Mike Olson, Cloudera's chief strategy officer, positions the company as a major vendor of enterprise data platforms based on open-source innovation.
There were a few elephants in the room at the March 21-22 Cloudera Analyst Conference in San Francisco. But between a blanket "no comment" about IPO rumors and non-disclosure demands around cloud plans -- even whether such plans exist, or not -- Cloudera execs managed to dance around two of those elephants.
The third elephant was, of course, Hadoop, which seems to be going through the proverbial trough of disillusionment. Some are stoking fear, uncertainty and doubt about the future of Hadoop. Signs of the herd shifting the focus off Hadoop include Cloudera and O'Reilly changing the name of Strata + Hadoop World to Strata Data. Even open-source zealot Hortonworks has rebranded its Hadoop Summit as DataWorks Summit, reflecting that company's diversification into streaming data with its Apache NiFI-based Hortonworks DataFlow platform.
At the Cloudera Analyst Conference, Chief Strategy Officer Mike Olson said that he couldn't wait for the day when people would stop describing his company as "a Hadoop software distributor" mentioned in the same breath with Hortonworks and MapR. Instead, Olson positioned the company as a major vendor of enterprise data platforms based on open-source innovation.
MapReduce (which is facing away), HDFS and other Hadoop components are outnumbered by other next-generation, open-source data management technologies, Olson said, and he noted that there are some customers who are just using Cloudera's distributed and supported Apache Spark on top of Amazon S3, without using any components of Hadoop.
Cloudera has recast its messaging accordingly. Where years ago the company's platform diagrams detailed the many open source components inside (currently about 26), Cloudera now presents a simplified diagram of three use-case-focused deployment options (shown below), all of which are built on the same "unified" platform.
Assuming Cloudera is seeing similar results, it's experiencing far healthier growth than any of the traditional data-management vendors. Whether you call it Hadoop and Spark or use a markety euphemism like next-generation data platform, the upside customers want is open source innovation, distributed scalability and lower cost than traditional commercial software.
As for the complexity of deploying and running such a platform on premises, there's no getting around the fact that it's challenging - despite all the things that Cloudera does to knit together all those open-source components. I see the latest additions to the distribution, Kudu and the Data Science Workbench, as very positive developments that add yet more utility and value to the platform. But they also contribute to total system complexity and sprawl. We don't seem to be seeing any components being deprecated to simplify the total platform.
Deploying Cloudera's software in the cloud at least gives you agility and infrastructure flexibility. That's the big reason why cloud deployment is the fastest-growing part of Cloudera's business. If and when Cloudera starts offering its own cloud services, it would be able to offer hybrid deployment options that cloud-only providers, like Amazon (EMR) and Google (DataProc) can't offer. And almost every software vendor embracing the cloud path also talks up cross-cloud support and avoidance of lock-in as differentiators compared to cloud-only options.
I have no doubt that Cloudera can live up to its name and succeed in the cloud. But as we've also seen many times, the shift to the cloud can be disruptive to a company's on-premises offerings. I suspect that's why we're currently seeing introductions like the Data Science Workbench. It's a safe bet. If and when Cloudera truly goes cloud, and if and when it becomes a public company, things will change and change quickly.
Related Reading:
Google Cloud Invests In Data Services, Scales Business
Spark Gets Faster for Streaming Analytics
MapR Ambition: Next-Generation Application Platform