Pinterest logs 20 terabytes of new data each day

Summary: The budding social network wants to remind people that underneath all of the DIY wedding tips, it, too, is a big data company.


Judging a book by its cover (or a social network by its interface), Pinterest might look like a simple repository for images of frilly white ballgowns and endless vegan casserole recipes.

But the budding social media company wants to make it clear that underneath it all, it, too, is a big data company. 

And like many enterprise tech stalwarts, Pinterest has demonstrated an interest of its own in open source, especially Hadoop.

Pinterest data engineer Mohammad Shahangian outlined the digital scrapbook's data infrastructure in a blog post on Thursday morning, highlighting how the Hadoop backbone surfaces relevant content and keeps the pinning momentum going:

Hadoop enables us to put the most relevant and recent content in front of users through features such as Related Pins, Guided Search, and image processing. It also powers thousands of daily metrics and allows us to put every user-facing change through rigorous experimentation and analysis.

In order to build big data applications quickly, we have evolved our single cluster Hadoop infrastructure into a ubiquitous self-serving platform.

Acknowledging that Hadoop is not "plug-and-play technology," Shahangian described further how Pinterest engineers have employed "a wide range of home-brewed, open source and proprietary solutions to meet each requirement" in building a personalized discovery engine.

Here's a snapshot of just how much data is being generated through that engine powering Pinterest:

  • Pinterest logs 20 terabytes of new data daily.
  • It stores approximately 10 petabytes of data in Amazon's Simple Storage Service (S3).
  • It runs six standing Hadoop clusters comprising more than 3,000 nodes.
  • Developers generate more than 20 billion log messages and process nearly a petabyte of data with Hadoop each day.
  • The current Hadoop setup (alongside some dabbling in managed Hadoop clusters) supports over 100 regular MapReduce users, who run more than 2,000 jobs daily through Qubole's web interface, ad-hoc jobs and scheduled workflows.
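To give a sense of what those daily MapReduce jobs look like, here is a minimal sketch of one: counting log messages by event type. The log format and event names are hypothetical, not taken from Pinterest's post; on a real cluster a job like this would typically run via Hadoop Streaming, which pipes input lines through a mapper, sorts by key, and feeds the grouped output to a reducer.

```python
from itertools import groupby

def mapper(line):
    """Emit (event_type, 1) for each tab-separated log line."""
    event_type = line.split("\t")[0]
    yield event_type, 1

def reducer(pairs):
    """Sum counts per event type; input must be sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)

# Simulate the Hadoop Streaming shuffle locally: map, sort, reduce.
logs = ["pin_create\tuser1", "repin\tuser2", "pin_create\tuser3"]
mapped = sorted(kv for line in logs for kv in mapper(line))
counts = dict(reducer(mapped))
print(counts)  # {'pin_create': 2, 'repin': 1}
```

The same mapper and reducer, written as stdin/stdout scripts, could be handed to Hadoop Streaming unchanged; the local sort stands in for the framework's shuffle phase.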

Although the San Francisco-headquartered business hasn't revealed official user counts, reports say that the platform serves between 40 million and 60 million monthly active users and counting.

But Shahangian touted that there are more than 30 billion pins on the site to date.

Image via Pinterest

Topics: Big Data, Data Management, Social Enterprise, Start-Ups, Developer


Rachel King is a staff writer for CBS Interactive based in San Francisco, covering business and enterprise technology for ZDNet, CNET and SmartPlanet. She has previously worked for The Business Insider, CNN's San Francisco bureau and the U.S. Department of State. Rachel has also written for Irish Americ...
