Beyond the elephant in the room, Cloudera wants to talk to the business

Now that it offers specialized editions tailored for data scientists, data engineers, and BI users, what steps will Cloudera take next to broaden its appeal to the enterprise? And how will it approach the cloud?
Written by Tony Baer (dbInsight), Contributor

Cloudera has had an enviable position among the Hadoop crowd as the vendor with the earliest foothold and, still, the largest installed base. It also has the deepest pockets, courtesy of Intel's $740 million backing. The elephant in the room is that, even as Cloudera maintains it has enough in the bank to reach profitability, published but unsubstantiated rumors point to an impending IPO.

Cloudera is starting to pivot away from its engineering heritage as it picks new fights. In a meeting with analysts this week, Cloudera continued the theme kicked off at Strata, where Mike Olson declared, "If you want to do Data Science, don't be like Watson, be like Holmes." The genesis is Cloudera's perception that, globally, it is going up against IBM rather than its traditional elephant rivals. And while IBM is wrapping Apache Spark in a bear hug, Cloudera claims that the 400 production Spark implementations across its customer base give it a far larger machine learning footprint than Watson.

With a new marketing team in place, Cloudera is seeking to make the familiar transition of a product engineering company turning the corner toward targeting business decision makers. That's challenging given Hadoop's genesis: because the platform is a collection of open source projects (more than two dozen of them in Cloudera's distribution), the dialog has tended to get bogged down in taming the zoo before it can elevate to solutions.

Open source is a double-edged sword. Organizations implementing big data analytics very much want open source to avoid lock-in, yet they don't want the burden of taming these projects. That's the job for the Clouderas of the world.


Walking the Business Talk

In the analyst session, Cloudera talked the talk, featuring customers from financial services and digital business who rely on the platform for enterprise-critical use cases such as optimizing customer engagement and sorting risk. One of these customers, a global financial services giant that was among Cloudera's earliest, is standardizing on the platform as part of a long-term strategy to take out its tens of thousands of data warehouses.

Cloudera is not necessarily pursuing a strategy to uproot Teradata, Oracle, or Netezza, but, to its credit, over the past year it has reshaped its product lineup to go where its customers are. You can still get Enterprise Data Hub, but you can also get subset editions tailored to data science and engineering, analytic database, and operational database workloads. That's one step toward abstracting the zoo animals.


Land and Expand

For Hadoop players (Cloudera included), the initial sale is probably not where you're going to make your money. The sales cycle is usually long, and with the expectation of open source and commodity technology, the prices are not going to be on the level of incumbent databases. The margins come when customers renew and grow their clusters.

One way to expand is to go up the stack toward solutions, but Cloudera is clearly not Oracle. Instead, Cloudera's road runs through its core data management, governance, and security stack. Cloudera has been increasing its stickiness with additions like Kudu, an updatable storage engine that supports analytics on fast-changing data, where the overhead of working with HDFS and columnar file formats like Parquet introduces too much latency. Kudu gives Cloudera further inroads with the BI audience.

But even there, there are limits. Unlike MapR, Cloudera doesn't believe that Hadoop processing belongs out on the edge for IoT use cases. But if edge computing is outside Cloudera's wheelhouse, it needs a strategy for meeting the Ciscos of the world on the periphery. The likeliest path runs through Cisco frenemy (and Cloudera equity partner) Intel and its IoT solution partners.

There's still white space for Cloudera to address. For instance, Cloudera is using containers opportunistically, with Kubernetes acting as the orchestration engine for its new Data Science Workbench. Should it also take a cue from MapR, which has extended its core platform with container support, expanding its audience beyond data engineers and BI end users to enterprise applications and the developer community?

What about data integration for data lakes? Cloudera provides data cataloging, but for now leaves data preparation, master data, and lifecycle management to third parties. At this point, the strategy makes sense, as these are evolving markets and data lake adoption among the client base is still at an early stage. But as data lakes become more commonplace across the installed base, wouldn't Cloudera customers naturally expect the platform's footprint to encompass integration?

Cloudera has made some shrewd acquisitions that could also get customers asking for more. Gazzang, acquired in 2014, brought enterprise-grade encryption key management that, most importantly, goes beyond HDFS encryption with external key storage. Sense.io, acquired a year ago, provided the core technology for the Data Science Workbench, covered last week. We're still awaiting the true payoff from the Xplain.io acquisition of a couple of years back, which was aimed at SQL query optimization. But if Cloudera gets its cloud strategy off the ground, Xplain.io could provide the IP to help cloud customers put real dents in their monthly AWS or Azure bills.


Did we say cloud?

Cloudera customers who spoke before the analyst audience indicated that cloud was their long-term vision. For now, there remain issues with handling PII and similar sensitive data. All emphasized the need to control their own encryption, a pain point that Cloudera already addresses. Some anonymize the data, while others enforce strict limits on how long sensitive data may be kept off premises.

But the movement to cloud is akin to plate tectonics: it's happening, and it's not likely to abate anytime soon. Cloudera reported that cloud adoption among its customers grew almost geometrically over the past year; we've heard similar reports from other Hadoop providers.

We've gone on record that Hadoop's complexity will require managed cloud solutions to grow the installed base, and we've forecast that by year-end 2018, greenfield implementations in the cloud would hit the inflection point of over 50%. Our take from listening to Cloudera's customers is that our prediction may be a bit aggressive. But cloud is happening, and it will be hybrid: on-premises deployments are not going away anytime soon.

That makes for interesting choices over how and where to deploy Hadoop in the cloud.

If your organization plans a hybrid deployment on and off premises, you can go with the usual three suspects and get the same environment in both places. With Cloudera Enterprise (and Hortonworks Data Platform, for that matter), you also get data governance. The downside is that your IT organization must actively deploy, patch, and manage the cloud instances just as it does in the data center.

Alternatively, if your organization goes all-in on Hadoop in the cloud, there are the home-court offerings of AWS, Azure, and Google Cloud, which are fully managed, eliminating headaches like patching. And there's a third choice if your needs are highly specialized or project-driven: opt for a la carte machine learning and Spark services.

So the other elephant in the room is the opportunity awaiting Cloudera in the cloud: offering the best of both worlds with a managed service that has all the governance and security of its core platform. Will Cloudera take the bait?
