Hortonworks CEO Rob Bearden: Beware the Hadoop fragmentation

Hortonworks' chief talks about the run on Hadoop distributions, an "open core" strategy and crunching the big data numbers.
Written by Larry Dignan, Contributor

Hortonworks CEO Rob Bearden has a simple mission: Grow the Hadoop and big data markets with a heavy dash of open source and then the company's financial health will follow.

Amid current Hadoop developments (is there any company NOT launching a distribution with some value-added software?), Hortonworks stands out. Why? Hortonworks turns over its entire distribution to the Apache open source project.

Hortonworks, which was essentially hatched with the team inside Yahoo that popularized Hadoop in the first place, has also been on a bit of a tear this year. The company is expanding in Europe and rounding out its management team with new hires, and it has created a beta Hadoop distribution for Windows.

I caught up with Bearden, the former chief operating officer of both SpringSource and JBoss and an Oracle executive, to talk shop last week. Here's a look at the highlights from my chat:

On the Hortonworks strategy, Bearden noted that the general idea was to develop Hadoop's functions and contribute them to Apache. The company is building its Hadoop distribution and contributing all of it as open source. "We're building directly in the core trunk, productizing the package, doing QA and releasing," he said. "It's not an open core model." By open core, Bearden means a trend in Hadoop distributions in which open source components are bundled with proprietary software as the value-add. This open core method is being used by Pivotal/EMC, Cloudera and MapR.

Why wouldn't Hortonworks go open core? Bearden said the aim of Hortonworks is to grow the overall market pie for Hadoop. Besides, Hortonworks' revenue model revolves around support. A 100 percent open source distribution serves as a try-before-you-buy program for that support. "I think it’s important that we make the market function at scale fast," said Bearden. Specifically, he wants to create an open enterprise data platform that will grow the big data pie.

Is Bearden worried about fracturing Hadoop? In a word, yes. Bearden noted that IBM and EMC wouldn't mind splintering Hadoop. Why? Large enterprise IT players need to grab as much control of the new datasets (think big data) as possible. By grabbing more data under management, enterprise giants can sell more hardware, software and services. "It's important to keep fracture from this space," said Bearden. "The way to stop that fracture is to give enterprises what they want on an open platform." Bearden noted that the latest Hadoop distributions aren't aiming to fracture Hadoop directly, but splintering is "a side effect of what they want to do."

Characterizing the new Hadoop distributions is similar to picking apart a mashup. They mix open and proprietary software. Ultimately, these mashed-up Hadoop distributions could lead to lock-in since they aren't 100 percent open. Overall, Bearden said Hadoop will be fractured to some degree.

On support agreements, Bearden said that a significant majority of technology leaders want support with their Hadoop distribution even though Hortonworks' distro is on Apache.

What Hadoop can and can't do. Bearden said Hadoop is rock solid as an enterprise platform and storage layer for unstructured data. "It's reliable, predictable and stable," he said. "There's true reliability today for storage processing at scale." There need to be more tools for complex data management, but Bearden expects that functionality to arrive over the next year. Where Hadoop visions differ is real-time transaction processing. Bearden's take is that real-time processing is many years away, if ever. "I'd emphasize 'if ever,'" he said. "We don't view Hadoop being storage, processing of unstructured data and real time." Other companies behind distributions, notably Cloudera, see real-time processing as important. "Why recreate the wheel?" asked Bearden. Although trying to upend the likes of IBM, Teradata, Oracle and other data warehousing players may be interesting, it's unlikely that a small fry could compete. "I'd rather have my distro adopted and integrated seamlessly into their environment," said Bearden. For instance, Hortonworks and Teradata have a tight integration partnership. "It's not a Lego exchange and connectors," said Bearden of Hortonworks' partnership with Teradata. "We can show the management of data at every layer."

International expansion. Hortonworks recently expanded into Europe, the Middle East and Africa and is building out its infrastructure. Other international moves will come slowly. "We have to get it right in North America first," said Bearden. "We'll get it right close to home and then set up the infrastructure to follow the sun."
