LinkedIn's 'answer' to big data problems: Pinot

Pinot is already being rolled out to internal product management teams, and LinkedIn plans to eventually open source it to the public.


Big data presents a huge opportunity in the tech community and is routinely touted as such. But not many are keen to admit the pitfalls and problems in harnessing that power.

LinkedIn is opening up about its own big data challenges through the unveiling of its new analytics engine.

Dubbed Pinot, the web-scale real-time analytics engine was designed for monitoring, managing, and utilizing the massive quantities of data generated by multiple products across LinkedIn's budding empire of professional social and digital publishing products.

The roots for Pinot started to sprout roughly two years ago as LinkedIn found itself running up against a wall of data-driven roadblocks. Once work on Pinot got started, it took platform builders roughly eight months before the engine could actually be consumed for internal product use.

Before cultivating Pinot in-house, LinkedIn's engineering team said it was using a cocktail of generic storage systems, including Oracle and the distributed key-value store Project Voldemort.

LinkedIn engineer Praveen Neppalli Naga explained in a blog post that these weren't keeping pace with the rapidly growing flood of data being produced by a social network of more than 300 million members worldwide and counting.

Naga declared, "Pinot was born as an answer to our problems."

"LinkedIn data has a lot of depth and each dimension requires special treatment. We needed to build custom compression techniques to fit every dimension, in order to get optimal scan speed tradeoff vs. memory consumed. For example, each one of our members can have hundreds of skills and representing them per event is difficult. Similarly, groups that members belong to and companies they follow are some of the dimensions difficult to represent per event. We built Pinot with this difficult to index data in mind, but will save the details of the compression techniques for future posts."

Pinot now stands as the flagship data infrastructure for products such as "Who's Viewed Your Profile" and others that demand frequent and instant complex queries.

Pinot is currently available to internal product management teams for crunching analytics on ads reporting and paid premium products such as company profile follows. LinkedIn is also planning to eventually open source Pinot to the public.

Hints and notes of open source can already be found in Pinot. Naga highlighted that Pinot supports the Hadoop pipeline for bootstrapping and reconciliation as well as real-time data indexing from Kafka and Hadoop.
