Greenplum delivers fast petabyte-scale data loading

A few weeks ago I wrote about HP's embryonic data warehousing effort to take on Teradata, IBM, Oracle and others. Open source startup Greenplum, flush with $27 million in Series C funding (for a total of $57 million in VC investment), is also taking on the established enterprise data warehousing world.
Written by Dan Farber

The company announced general availability of its software, Greenplum Database 3.0 (G3), which is based on the open source Bizgres and PostgreSQL projects. Customers include Sun, Capgemini, AirTran, MLB.com, Frontier Airlines, Skype, iCrossing, Didit, VideoEgg and Comcast.

The company claims that its database, with a shared-nothing, parallel processing architecture running on commodity hardware, allows enterprises to load and access large data sets 10 to 100 times faster than competing products. G3 offers petabyte-scale loading, which the company said can load data at more than 4.5 terabytes per hour.
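
For a sense of scale, here is a back-of-the-envelope calculation of what that rate means for a full petabyte. It assumes decimal units (1 petabyte = 1,000 terabytes) and a sustained load rate of exactly the quoted 4.5 terabytes per hour; both are simplifying assumptions of mine, not figures from the company.

```python
# Rough load-time estimate at the quoted rate.
# Assumptions (not from Greenplum): decimal units and a constant rate.
PETABYTE_TB = 1000          # 1 PB expressed in TB (decimal)
QUOTED_RATE_TB_PER_HOUR = 4.5


def hours_to_load(data_tb: float, rate_tb_per_hour: float = QUOTED_RATE_TB_PER_HOUR) -> float:
    """Return the hours needed to load data_tb terabytes at a constant rate."""
    return data_tb / rate_tb_per_hour


hours = hours_to_load(PETABYTE_TB)
days = hours / 24
print(f"{hours:.1f} hours, about {days:.1f} days")
```

At that pace a petabyte would take roughly 222 hours, or a little over nine days of continuous loading.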

"It's intrinsically scalable," explained Luke Lonergan, Greenplum co-founder and CTO. "The load files are divided across nodes and loaded in parallel. As the data size grows, more CPUs are available to pull data in from the outside and converted by our engine in parallel. All the conversion is distributed out, so every CPU core is connected to outside and pulling in data."
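
The idea Lonergan describes can be illustrated with a toy sketch: split the input across workers and let each one convert its own share in parallel. This is not Greenplum's code. The function names, the round-robin split, the pipe-delimited row format, and the use of threads as stand-ins for separate nodes are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(rows, num_segments):
    """Deal rows round-robin into one chunk per segment (hypothetical scheme)."""
    return [rows[i::num_segments] for i in range(num_segments)]


def convert_chunk(chunk):
    """Parse raw 'id|amount' text lines into typed tuples for one segment."""
    parsed = []
    for line in chunk:
        user_id, amount = line.split("|")
        parsed.append((int(user_id), float(amount)))
    return parsed


def parallel_load(rows, num_segments=4):
    """Convert every chunk concurrently; threads stand in for separate nodes."""
    chunks = split_rows(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        # pool.map preserves chunk order, so result i belongs to segment i.
        return list(pool.map(convert_chunk, chunks))


raw = [f"{i}|{i * 1.5}" for i in range(10)]
segments = parallel_load(raw, num_segments=4)
for seg_id, seg_rows in enumerate(segments):
    print(seg_id, len(seg_rows))
```

Because no worker shares state with another, adding workers adds conversion capacity, which is the scaling property the quote points to. A real system would also have to handle network transfer, skewed data and failures, none of which this sketch attempts.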

G3 also includes embedded analytics, with native support for advanced parallel analytic functions, and support for external data streams, such as RSS feeds, Web pages and Web services. Other new enhancements include workload optimization, scan speed improvements and interoperability with major business intelligence analysis solutions.

Future enhancements include a distributed query dispatcher, point-in-time recovery, disaster recovery improvements, automated recovery, and embedded analytics working from major business intelligence solutions, Lonergan said.

With its open source, scale-out hardware approach, G3 database engine, funding and growing customer list, Greenplum should be able to put some heat on the incumbents, especially from a cost/performance standpoint.