Amazon Redshift: ParAccel in, costly appliances out

Summary: First Amazon invested in MPP vendor ParAccel. Now AWS uses ParAccel tech for Redshift. Is this the data warehouse disruption dream team?

TOPICS: Big Data

Back in July, Data Warehouse vendor ParAccel announced it had a new investor: Amazon.  Then yesterday, Amazon announced its new cloud Data Warehouse as a service offering: Redshift.  And, none too surprisingly, it turns out that Redshift is based on ParAccel’s technology.  I spoke to Rich Ghiossi and John Santaferraro, ParAccel’s VPs of Marketing and Solutions/Product Marketing, respectively, who explained some of the subtleties to me and helped me think through some others.

We don't need no stinkin' appliances
ParAccel takes a rather radical approach compared to other vendors in the Massively Parallel Processing (MPP) Data Warehouse category: the company designed its software to run on commodity hardware.  Most MPP vendors (including Teradata, HP/Vertica, IBM/Netezza, EMC/Greenplum and Microsoft) either sell their MPP products only in the form of appliances, or are owned by hardware or storage companies that may prefer to sell them that way.  Inside those MPP appliance cabinets, typically, lies a cluster of finely tuned server, storage and networking hardware.  It’s an optimized, high-performance approach to data warehousing.  It’s also expensive, and it keeps certain customers out.  ParAccel decouples MPP technology from expensive appliance hardware.

Down with false choices
Hadoop, of course, takes the commodity hardware approach as well.  And that likely accounts for its runaway popularity as a Big Data platform.  But MPP is big data technology too, as I’ve said many times before.

The problem with Hadoop, though, is that its native query mechanism is MapReduce code, rendering it incompatible with the massive product and skillset ecosystem around SQL.  Over the last several months, vendors such as Cloudera and Microsoft have sought a fusion of SQL and Hadoop.  Other vendors, like Rainstor and Hadapt, have been pursuing that fusion for a while.

Pure play
But why hybridize SQL with Hadoop, when MPP data warehouses that can handle Petabyte-scale big data workloads use SQL natively?  Chiefly, the reason has been that MPP carried the appliance barrier-to-entry, so you had to choose between SQL on an appliance and Hadoop on commodity hardware.  ParAccel smashed that dichotomy, but the company is still growing and so, for many, the dichotomy has stood.

But Amazon is attacking that dichotomy further, because now ParAccel-based, petabyte-scale MPP technology is elastic.  It’s available in the cloud, on-demand, running on a cluster sized according to your needs.  You don’t have to build the cluster, and you don’t have to provision the hardware.

Appliances only scale up to what’s inside them, and that may be a lot more than needed initially. As far as elasticity goes, that’s the worst of both worlds.  With Redshift, and these are Amazon’s own words, "Scaling a cluster to improve performance or increase capacity is simple and incurs no downtime."

This opens up all sorts of scenarios.  Amazon claims the cost of Redshift is under $1000 US per Terabyte, per year.  So many organizations could quite easily keep their core data warehouse in the cloud.  But Redshift seems to lend itself to ephemeral use too: why spin up an Elastic MapReduce Hadoop cluster to analyze your data when you can spin up an MPP data warehouse (that your existing BI tools can query) just as easily?

On-prem, and off
Of course, $1000/TB/year means you’ll be paying at least $1 million/year for a Petabyte data warehouse.  But when you factor in the hardware, storage, personnel/management, power and other costs of running such a large warehouse on premise, that ain’t so bad.  If you’re really working at Petabyte-scale, that number shouldn’t bother you.
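
For anyone who wants to check that back-of-the-envelope math at other sizes, here is a minimal sketch in Python.  It assumes a flat rate of $1000 per Terabyte per year (the figure Amazon cites as an upper bound) and decimal units, so 1 Petabyte = 1000 Terabytes.  The function and constant names are mine, purely for illustration; real Redshift pricing depends on node type and reservation term, which this ignores.

```python
# Rough annual cost of a cloud data warehouse at a flat per-Terabyte rate.
# Assumptions (not an official pricing model): $1000 per TB per year,
# and decimal units, i.e. 1 PB = 1000 TB.
PRICE_PER_TB_YEAR_USD = 1000
TB_PER_PB = 1000


def annual_cost_usd(terabytes, price_per_tb_year=PRICE_PER_TB_YEAR_USD):
    """Return the yearly cost in US dollars for a warehouse of the given size."""
    return terabytes * price_per_tb_year


if __name__ == "__main__":
    # Print the estimate for a small mart, a mid-size warehouse and a full Petabyte.
    for size_tb in (2, 100, TB_PER_PB):
        print(f"{size_tb:>5} TB -> ${annual_cost_usd(size_tb):,}/year")
```

At 1000 TB the function returns 1,000,000, which is the $1 million/year figure above; the same linear rate puts a 100 TB warehouse at $100,000/year.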

Does that mean on-premise MPP data warehouses are passé? I wouldn’t say so.  First, there’s the issue of bandwidth constraints on data movement that I cited in my news piece on Redshift yesterday.  But second, the full on-premise ParAccel product includes features like On-Demand Integration Services, extensibility, user-defined functions, embedded analytics and certain optimizations that Redshift doesn’t offer.

Shifting winds
This is definitely a case of "use the right tool for the right job."  But the appliance-shy, who have been trying to run their data warehouses on conventional, non-MPP relational databases and have found performance lacking, now have some choices, including the ability to try-before-they-buy by using Redshift in the cloud. 

And which conventional relational database might Amazon wish customers to "shift" that warehouse from?  Well, there’s a big one that uses a lot of red in its logo.  Just sayin’.

Talkback

7 comments
  • pretty cool play ...

    But what about Hive? Plus it seems like every BI tool maker is building Hadoop connectors as well (though TBH they really only pull results, and don't launch MR jobs). Tableau, Karmasphere, etc.

    I'm of the mindset that ML and automation will radically alter the BI space (almost out of existence), but who knows.
    mobile_manny
    • Hive MPP

      Hive works over MapReduce, so it's still a batch system, and not a relational database. Hive is just a SQL interface to Hadoop. Redshift is an MPP data warehouse.
      andrewbrust
  • Red >>> Shift.

    Well there’s a big one that uses a lot of red in its logo. Just sayin’.

    Does the name exemplify this desired shift for Amazon?
    sreesiv
  • Congratulations and Correction

    Congratulations to ParAccel; they have some great guys there, so I'm very pleased for them. You are, however, factually incorrect regarding HP/Vertica: it also installs on commodity hardware and has reference customers that run on Amazon EC2.
    htilabs
  • Big Data Warehouses in the Cloud

    Amazon’s Redshift announcement validates that enterprises are ready for cloud-based big data warehousing solutions. XtremeData, also available on Amazon as well as other clouds, is targeted for organizations that need a massively scalable DBMS solution for mixed read and write workloads, for example, with serious ELT. Redshift (a column-store licensed from ParAccel) is well-suited for read-only data marts of all sizes. The market is rapidly moving to a tipping point where the specialized solutions available on premise are becoming available on the cloud, Amazon and others.
    mlamble
  • Vertica correction

    >> Most MPP vendors (including Teradata, HP/Vertica, IBM/Netezza, EMC/Greenplum and Microsoft) sell their products only in the form of an appliance.

    Not true. Vertica runs on commodity x86 hardware. Also, btw, it offers true MPP scale out - no leader node, unlike ParAccel (Redshift).
    Paul Gupta
    • Vertica, Greenplum are not appliance-only

      Thank you for pointing this out. I have updated the text to be less categorical on that point.
      andrewbrust