Facebook opens up about new infrastructure project: Apache Giraph

Summary: The engineering team said it is moving ahead with Apache Giraph because it scales "at an incredibly high rate."


Facebook has unveiled its version of Apache Giraph, touted to be the social network's next big infrastructure project.

Initially launched in 2012, Apache Giraph is an open source project billed as able to unleash "the potential of structured datasets at a massive scale."

The engineering team added that it is moving ahead with Apache Giraph for analyzing Facebook's Social Graph because it scales "at an incredibly high rate."

For example, Facebook is touted to be able to cluster a monthly active user data set of one billion input vectors with 100 features into 10,000 centroids with k-means in less than 10 minutes per iteration.
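To put those numbers in perspective, a naive assignment step compares every input vector with every centroid across every feature, which at this scale is on the order of 10^9 x 10^4 x 100 = 10^15 coordinate operations per iteration. The following is a minimal single-machine sketch of one k-means iteration in plain Python. It is not Facebook's or Giraph's implementation; the function name `kmeans_iteration` and the tiny data set are illustrative assumptions.

```python
def kmeans_iteration(points, centroids):
    """Run one k-means iteration and return the updated centroids.

    points and centroids are sequences of equal-length numeric vectors.
    Hypothetical helper for illustration only.
    """
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]  # per-cluster coordinate sums
    counts = [0] * k                        # per-cluster point counts

    # Assignment step: attach each point to its nearest centroid
    # (squared Euclidean distance).
    for p in points:
        nearest = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]

    # Update step: move each centroid to the mean of its assigned points.
    # A centroid with no assigned points is left where it was.
    new_centroids = []
    for j in range(k):
        if counts[j] == 0:
            new_centroids.append(list(centroids[j]))
        else:
            new_centroids.append([s / counts[j] for s in sums[j]])
    return new_centroids


points = [(0, 0), (0, 2), (10, 10), (10, 12)]
centroids = [(0, 0), (10, 10)]
updated = kmeans_iteration(points, centroids)
```

In a distributed setting, the per-cluster sums and counts are the only values that need to be combined across workers, which is one reason the algorithm parallelizes well.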

Avery Ching, a software engineer at Facebook, explained further in a blog post that the team wanted "a programming framework to express a wide range of graph algorithms in a simple way and scale them to massive datasets."

We ended up choosing Giraph for several compelling reasons. Giraph directly interfaces with our internal version of HDFS (since Giraph is written in Java) and talks directly to Hive. Since Giraph runs as a MapReduce job, we can leverage our existing MapReduce (Corona) infrastructure stack with little operational overhead. With respect to performance, at the time of testing Giraph was faster than the other frameworks - much faster than Hive. Finally, Giraph’s graph-based API, inspired by Google’s Pregel and Leslie Valiant’s bulk synchronous parallel computing model, supports a wide array of graph applications in a way that is easy to understand. Giraph also adds several useful features on top of the basic Pregel model that are beyond the scope of this article, including master computation and composable computation.
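For readers unfamiliar with the Pregel-style, bulk synchronous parallel model mentioned above, here is a rough single-process sketch in plain Python. It does not use the Giraph API, and the function name `propagate_max` and the sample graph are illustrative assumptions. Each vertex processes the messages it received in the previous superstep, updates its value, and sends messages to its neighbors; the computation ends when no messages remain in flight.

```python
def propagate_max(graph, values):
    """Spread the largest vertex value through an undirected graph.

    graph maps each vertex to a list of its neighbors (every neighbor
    must also be a key); values maps each vertex to its initial value.
    Hypothetical helper for illustration only.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbors.
    inbox = {v: [] for v in graph}
    for v, neighbors in graph.items():
        for n in neighbors:
            inbox[n].append(values[v])

    # Later supersteps: only vertices that received messages do work.
    while any(inbox.values()):
        outbox = {v: [] for v in graph}
        for v, messages in inbox.items():
            if not messages:
                continue  # no messages: the vertex stays halted
            best = max(messages)
            if best > values[v]:
                values[v] = best
                for n in graph[v]:
                    outbox[n].append(best)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        # Synchronization barrier: messages sent in this superstep are
        # only visible in the next one.
        inbox = outbox
    return values


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
result = propagate_max(graph, {"a": 3, "b": 6, "c": 2})
```

In a real framework the per-vertex loop body runs in parallel across many workers, and the barrier between supersteps is what lets each vertex be written as simple, local logic.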

Giraph version 1.0.0 is already available to download through an Apache mirror.

Reps for the world's largest social network reiterated on Wednesday that graphs "are central to Facebook."

Facebook has stressed this for months, especially through a number of deep dive sessions with the media and engineering teams held at the company's Menlo Park headquarters.

For reference, the two main "graphs" are the Social Graph for people and their connections followed by the Open Graph, designed to enable developers to link objects in apps with user actions.

Chart via The Facebook Engineering Blog



Rachel King is a staff writer for CBS Interactive based in San Francisco, covering business and enterprise technology for ZDNet, CNET and SmartPlanet. She has previously worked for The Business Insider, FastCompany.com, CNN's San Francisco bureau and the U.S. Department of State. Rachel has also written for MainStreet.com, Irish Americ...
