Greenplum is probably the best-kept secret in the analytic database world. While Pivotal has been busy building its Cloud Foundry business, its Greenplum business has continued to thrive. The new release steps up the multi-workload capabilities that differentiate Greenplum. The next question: when will Pivotal launch a managed Greenplum cloud service?
When it was first spun out from EMC and VMware, Pivotal appeared to be a rather odd collection of businesses. The biggest parts of the business, Cloud Foundry and the Greenplum analytic database, looked like the ultimate odd couple: a platform-as-a-service cloud technology and an analytic database. We've always wondered about the synergies between the two areas and when or whether Pivotal would finally prove that the sum was greater than the parts. Yet the same was said when Dell used private equity to acquire EMC in an acquisition that piled up considerable debt. A year after the Dell-EMC deal closed, the combined business is on a run rate to throw off four times as much cash flow as it owes in debt service.
Much the same could be said for the Greenplum side of the Pivotal business. Under Pivotal's watch, it has operated in the shadow of Cloud Foundry, which drew the attention and likely the lion's share of investment. When we last left it, Pivotal had retrenched from the Hadoop business - first discontinuing its own platform, and then operating a partnership with Hortonworks, only to get eclipsed by IBM.
So when we recently took a fresh look at Pivotal's Greenplum business, we were quite surprised. OK, we still haven't found the synergy with Cloud Foundry, but guess what? The Greenplum business is doing quite fine, thank you. We estimate that Greenplum is roughly a $100 million business and solidly profitable. And surprisingly, for a business founded over 15 years ago, we estimate that it's currently growing at mid-double-digit annual rates. While the mother ship shined the spotlight on Cloud Foundry, the Greenplum installed base remained stubbornly loyal and continued putting more skin into the game.
Greenplum, another of the PostgreSQL-based databases, competes in the same market as Teradata, Exadata, and Redshift. Since Pivotal began open sourcing its product portfolio a couple of years back, Greenplum has, in effect, competed with the Teradatas of the world but at more Hadoop-like prices. Like its data warehousing rivals, Greenplum has steadily expanded beyond traditional SQL; it was one of the first data warehouses to embrace MapReduce, and it supports machine learning through the open source Apache MADlib project that it leads. And like most of its rivals, it has also made the database more extensible, accommodating a variety of data types beyond traditional relational structured data. And it has the checkbox Spark connector -- something that is becoming the norm for analytic databases.
None of these features is necessarily unique, but combined with the ability to scale, run highly complex SQL queries, and manage a variety of workloads, Greenplum's lower price points compared to the likes of Oracle and Teradata have proven attractive.
The new release, Greenplum 5, is being announced today. It further extends the database with support for text, geospatial, and JSON data. Admittedly, this capability essentially keeps up with the Joneses, in that most of Greenplum's rivals are also becoming more extensible.
The multi-workload capability sets the stage for a related enhancement: the ability to manage mixed workloads and apply "CPU fencing" where specific compute resources can be dedicated to specific workload types. While most analytic databases perform workload management, the ability to balance compute- and data (IOPS)-intensive loads has traditionally been confined to top-of-the-line systems from Teradata and Oracle.
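To make the idea of fencing concrete, here is a toy sketch of dedicating disjoint sets of cores to workload types by percentage share. It is only an illustration of the concept; the function name `fence_cores`, the workload labels, and the percentages are hypothetical and do not reflect how Greenplum implements or configures the feature.

```python
def fence_cores(total_cores, shares):
    """Toy model: assign disjoint core ID ranges to workload types.

    `shares` maps a workload name to a percentage of the cores.
    All names and numbers here are hypothetical.
    """
    if sum(shares.values()) > 100:
        raise ValueError("shares exceed 100 percent")
    allocation = {}
    next_core = 0
    for workload, percent in shares.items():
        count = total_cores * percent // 100  # whole cores only
        allocation[workload] = list(range(next_core, next_core + count))
        next_core += count
    return allocation


# Example: a 16-core host split across three assumed workload types.
plan = fence_cores(16, {"reporting": 50, "etl": 25, "ad_hoc": 25})
for workload, cores in plan.items():
    print(workload, cores)
```

Because each workload type gets its own range of cores, a burst of heavy batch work in this model cannot take CPU away from the cores reserved for interactive queries.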
Of course, with YARN, Hadoop also handles mixed workloads -- so at first blush, one could ask what's so special there. But Hadoop's ability to optimally handle interactive, batch, and streaming workloads on different parts of the cluster remains a work in progress because YARN only allocates resources and doesn't actively manage or optimize them.
For the new release, Greenplum has further tuned its query optimizer for highly complex sub-select operations and nested queries, and added the ability to convert correlated subqueries to more manageable join operations. That reinforces the fact that it competes with the Teradatas, not the Redshifts of the world. Hold that thought.
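For readers unfamiliar with that kind of rewrite, the sketch below shows the general idea: a correlated subquery and an equivalent join that return the same rows. It uses Python's built-in `sqlite3` module purely for convenience; the `orders` table and its data are made up, and this says nothing about how Greenplum's optimizer performs the transformation internally.

```python
import sqlite3

# In-memory database with a small, made-up table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES
        ('east', 100), ('east', 300),
        ('west', 50), ('west', 70), ('west', 90);
""")

# Correlated form: conceptually, the inner query depends on each outer row.
correlated = """
    SELECT o.id
    FROM orders o
    WHERE o.amount > (SELECT AVG(i.amount)
                      FROM orders i
                      WHERE i.region = o.region)
    ORDER BY o.id
"""

# Join form: compute the per-region average once, then join against it.
joined = """
    SELECT o.id
    FROM orders o
    JOIN (SELECT region, AVG(amount) AS avg_amount
          FROM orders
          GROUP BY region) r
      ON r.region = o.region
    WHERE o.amount > r.avg_amount
    ORDER BY o.id
"""

correlated_ids = [row[0] for row in conn.execute(correlated)]
joined_ids = [row[0] for row in conn.execute(joined)]
print(correlated_ids, joined_ids)  # both list orders above their region's average
```

The join form gives a planner more room to work with, since the per-region aggregate can be computed once and distributed like any other join input rather than evaluated row by row.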
Finally, Greenplum 5 adds certifications for multiple cloud providers. It has already been available as an Infrastructure-as-a-Service (IaaS) offering in the AWS marketplace, where you can either bring your own license or use on-demand pricing. And Greenplum has also been available for sister company VMware's vSphere and for OpenStack for private cloud deployments. With the new release, Azure certification is added, with Google Cloud coming soon. Multi-cloud capability will be critical: we expect it to become a front-burner issue for enterprises as they ramp up cloud deployment, and most will begin developing second-sourcing policies to avoid cloud vendor lock-in.
What's missing from Pivotal Greenplum is a managed cloud offering. Given Redshift's scale of reach and Greenplum's own positioning for highly complex analytics, we don't expect Greenplum to compete with Redshift head-on, but a managed cloud offering would expand its addressable market significantly. This is the vehicle by which more enterprises will be able to take advantage of big data analytics.
Ultimately Pivotal Greenplum should step up to the plate with a managed public cloud Greenplum service. But for starters, why not bootstrap a managed private cloud offering using -- you guessed it -- the Cloud Foundry infrastructure? Now that would finally put to rest nagging questions from pesky analysts like yours truly on where the synergy between Cloud Foundry and Greenplum actually rests.