Hortonworks results, Databricks release and more top the week's Big Data news

Financial results, an acquisition, strategic investments and new releases add up to a week where Big Data got serious

My colleagues here at ZDNet have been covering a lot of Big Data news this week. I wanted to recap, add a few items and take it all in.

The green elephant in the room
To start with, Hortonworks, a leading Hadoop distribution vendor, announced after-the-bell quarterly results on Wednesday that beat expectations. As Rachel King reported, Hortonworks posted revenue of $30.7M, up 154% from the year-ago quarter and ahead of Wall Street expectations of $23.29M. The company is still a loss-making venture, of course -- it reported a net loss of $42.3M, or $1 per share (80 cents per share on a non-GAAP basis), which was narrower than Wall Street expectations.

While it would still seem that Hortonworks is paying more to acquire a customer than it's getting back, the presumption is that gaining market share is key right now, and the dividends will come later. In order for that to work, Hadoop has to become more ingrained in Enterprise infrastructure, and more relevant to Enterprise users. There are many signs that the industry gets that.

Dial 1010Data for Acquisition
One sign of a maturing industry, or sub-industry, is consolidation, and we had some of that this week. As reported by Larry Dignan, Advance/Newhouse, the parent company of Conde Nast, acquired New York City-based in-memory data warehouse vendor 1010data. 1010data was in many ways an old-school company, with the technology emanating from the founders' consulting work on Wall Street years ago. That's a good pedigree, in my opinion, as that environment forces a focus on reliability and profit.

It's a bit out of step with the venture-funded gold-rush feeling in the Bay Area, though. Even the company name was old school: it was meant to sound similar to the plethora of US "dial-around" long-distance companies that emerged in the 90s, whose access codes consist of seven digits starting with 1010. The less-than-high-flying purchase price of $500M, coming from an old media company, is old school too. But it sure seems like Advance/Newhouse will be in a good position to become more modern and, almost proverbially, "data-driven." And that's a sign that the Big Data thing is for real.

Informatica, please
Another facet of consolidation involves the acquisition of pure play vendors by bigger, more general enterprise software companies. The acquisitions of Spotfire and Jaspersoft by Tibco are good examples. Or, digging further back, the acquisition of Ascential Software and its DataStage product by IBM in 2005. After such a round of acquisitions, there are usually one or two pure plays that remain independent.

One such company is Informatica. While that company did go public all the way back in 1999, it announced this week that it's closed its deal to go private again. As Larry Dignan reported yesterday, Informatica structured the deal with strategic investments from both Salesforce and Microsoft.

As the former CTO of a Microsoft Gold Partner focused on data platform solutions, I must admit I did a double-take when I saw that Microsoft was an investor. Informatica has been a competitor of Microsoft's in the data realm for some time, going head-to-head against the SQL Server Integration Services component, and now against Azure Data Factory, which entered general availability only yesterday.

But maybe that demonstrates more alignment than enmity. Perhaps Microsoft is interested in Informatica's expertise and IP, including in data quality and master data management, two areas where Microsoft's products have seen little adoption relative to many of its other data offerings. And since Informatica came out on top in the Gartner Magic Quadrant for Data Integration Tools published last week, the timing isn't bad either.

Amazon Aurora GAs
Microsoft wasn't the only company to GA a new cloud data product. Amazon Web Services GA'd its Aurora database offering last week, configuring it as a fifth platform choice for its Relational Database Service (RDS), alongside MySQL, PostgreSQL, Oracle and Microsoft SQL Server. AWS Chief Evangelist Jeff Barr made the announcement in a blog post on July 27th.

Since Aurora is itself based on MySQL, making both of them platform options on RDS might confuse customers a bit. Nonetheless, Amazon says "...Aurora can deliver 5x the price-performance of a traditional relational database when run on the same class of hardware." For customers who still want the "national brand," the straight-up MySQL option remains. For others, Amazon has created some tooling that looks to make migration from MySQL pretty seamless.

Databricks, by brick
Databricks, the SaaS offering of Apache Spark from the company led by the project's creators, announced its 2.0 release this week. My colleague Toby Wolpe reported the details on Wednesday. This release brings a form of access control to the platform via the developer "notebooks" that it uses as its interface. Notebooks can be made private, or shared with specific users, which effectively provides some access control over the data behind the notebook.

Other Databricks 2.0 features include support for the R programming language (no small thing there) and for multiple versions of Spark. The latter lets companies that have used Spark long enough to depend on more than one version work in the Databricks cloud nonetheless.

That last feature isn't "sexy." And that's a good thing. Because when companies start prioritizing features for complex deployments (instead of, say, for press releases) then it's clear customers are driving requirements.

This was a good week.