Facebook and Teradata on Apache Presto and the disruption of open source

In the nearly two years since going open source, Presto has grown from an internal Facebook project into a platform used by the likes of Airbnb, Dropbox and Netflix to process data more rapidly.

These days Facebook is known for its refined approach to data, but once upon a time, the social network found itself slowly drowning in a sea of it.

"A couple years ago at Facebook, we had a massive amount of data but the tools we were using were not adequate," explained Jay Tang, head of interactive analytics infrastructure at Facebook.

"So we started to develop a brand new SQL engine to process data faster."

What Tang is describing is the birth of Presto, Facebook's SQL query engine designed for low-latency interactive data analysis.

Presto was built to be faster than Hive, Facebook's other Hadoop query framework, and fills a similar role. Facebook still uses Presto heavily, running tens of thousands of queries a day against data stores of up to 300 petabytes.

"And then we decided we wanted to open source it to build a community around it -- it is really a paradigm shift of how data technology is being developed," Tang said. "Now we see more and more organizations outside of the Valley using Presto."

In the nearly two years since going open source, Presto has gained substantial enterprise traction. Through a slew of industry partnerships and the brain trust of the open-source community, it has grown from an internal Facebook project into a platform used by the likes of Airbnb, Dropbox and Netflix to process data more rapidly.

But the quest for total enterprise readiness is still a work in progress.

"I think we are getting pretty close, but I think in the next 12 months we will get to that Promised Land," said Justin Borgman, a Teradata VP and founder of Hadapt, the Hadoop-focused startup that Teradata acquired last summer.

Teradata officially put its weight behind Presto in July, and since then the company's involvement has centered on developing certified BI tools that will let Presto work simply and seamlessly within an enterprise.

For instance, this morning Teradata released a set of new drivers that handle the connection and the protocol for passing a query from an application to the database and returning the results.
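To make the driver's job more concrete, here is a hypothetical, simplified sketch of the loop a Presto client performs. It assumes Presto's HTTP client protocol as commonly documented: the SQL text is POSTed to `/v1/statement`, and the client keeps following each response's `nextUri` field, collecting any `data` rows, until no `nextUri` remains. This is not Teradata's driver code; the `run_query` function, the injected `transport`, and the `coordinator:8080` address are illustrative assumptions, and a real transport would also need to send headers such as `X-Presto-User`.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: takes (method, url, body) and returns the decoded JSON body.
# Injecting it lets the polling loop run without a live Presto server.
Transport = Callable[[str, str, Optional[str]], Dict[str, Any]]


def run_query(sql: str, base_url: str, transport: Transport) -> List[List[Any]]:
    """Submit `sql`, then follow nextUri links, collecting rows from every page."""
    rows: List[List[Any]] = []
    # The first request carries the query text itself.
    response = transport("POST", f"{base_url}/v1/statement", sql)
    while True:
        if "error" in response:
            # Surface a server-reported failure to the calling application.
            raise RuntimeError(response["error"].get("message", "query failed"))
        # A page may or may not contain a batch of result rows.
        rows.extend(response.get("data", []))
        next_uri = response.get("nextUri")
        if next_uri is None:
            return rows  # No nextUri: the result set is complete.
        response = transport("GET", next_uri, None)


if __name__ == "__main__":
    # Canned pages standing in for a server (hypothetical URLs).
    pages = {
        "http://coordinator:8080/v1/statement": {
            "nextUri": "http://coordinator:8080/v1/statement/q1/1"},
        "http://coordinator:8080/v1/statement/q1/1": {
            "data": [[1, "a"]], "nextUri": "http://coordinator:8080/v1/statement/q1/2"},
        "http://coordinator:8080/v1/statement/q1/2": {"data": [[2, "b"]]},
    }
    fake = lambda method, url, body: pages[url]
    print(run_query("SELECT id, name FROM t", "http://coordinator:8080", fake))
    # [[1, 'a'], [2, 'b']]
```

Keeping the transport separate from the paging loop mirrors how the work divides in practice: the network details (authentication, headers, TLS) change between deployments, while the follow-the-next-page logic stays the same.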

The drivers may look like a small technical detail, but they are a major step forward, said Borgman, because it is the combination of performance and functionality that will ultimately drive Presto's enterprise adoption.

"The performance aspect is already there -- that has been a key focus since the early days of Presto," Borgman said. "All we are trying to bring are these enterprise accessories for a typical enterprise customer."

Looking at the bigger picture, Tang says Presto is indicative of the new era of innovation sweeping the tech industry as a whole.

"If you look at all the key big data technology that is coming out of the high tech industry in the last five years, the vast majority came out of open source," Tang said. "Fifteen or 20 years ago everyone was buying proprietary systems and now there is the shift to open source by these big companies. It's really a cultural shift in the user community for these various tools."