Facebook explains how 'TAO' serves social workloads, data requests

Summary: Facebook’s engineering team outlines the design of TAO, the social network's graph data store.


Facebook has added yet another piece to the puzzle that is its intricate data infrastructure, which serves more than a billion users worldwide.

This time the focus is on TAO, or “The Associations and Objects,” a core component of Facebook's data infrastructure that runs on a large collection of geographically scattered server clusters.


Already several years into production, TAO underpins the implementation of most of Facebook’s core features.

To give a sense of just how robust this system needs to be: TAO serves thousands of data types while handling more than a billion read requests and millions of write requests per second.

Facebook software engineer Mark Marchukov explained in a blog post on Tuesday how vast numbers of data sets on the world's largest social network are partitioned into hundreds of thousands of shards to make more efficient use of the server hardware.
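To make the idea of sharding concrete, here is a minimal sketch in Python. The article does not say how TAO assigns data to shards, so the hashing scheme below is purely an assumption for illustration, and the names `NUM_SHARDS`, `shard_for` and `server_for` are hypothetical.

```python
import hashlib
from typing import List

# Hypothetical shard count; the article only says "hundreds of thousands".
NUM_SHARDS = 100_000


def shard_for(object_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map an object id to a shard number using a stable hash (an assumption)."""
    digest = hashlib.sha256(str(object_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce to a shard.
    return int.from_bytes(digest[:8], "big") % num_shards


def server_for(shard: int, servers: List[str]) -> str:
    """Assign a shard to one of a small list of servers (round-robin by shard number)."""
    return servers[shard % len(servers)]
```

Because there are far more shards than servers, each server holds many shards, and shards can in principle be moved between machines to even out load.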

There are two tiers of caching clusters in each geographical region. Clients talk to the first tier, called followers. If a cache miss occurs on the follower, the follower attempts to fill its cache from a second tier, called a leader. Leaders talk directly to a MySQL cluster in that region. All TAO writes go through followers to leaders. Caches are updated as the reply to a successful write propagates back down the chain of clusters. Leaders are responsible for maintaining cache consistency within a region. They also act as secondary caches, with an option to cache objects and associations in Flash. Last but not least, they provide an additional safety net to protect the persistent store during planned or unplanned outages.
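The read and write paths described above can be sketched in a few lines of Python. This is a simplified illustration, not Facebook's implementation: the class names `Database`, `Leader` and `Follower` are hypothetical, a plain dictionary stands in for the MySQL cluster, and the messages a leader would send to keep other followers consistent are left out.

```python
from typing import Dict, Optional


class Database:
    """Stand-in for the regional MySQL cluster (a plain dict in this sketch)."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.reads = 0  # counts how often the persistent store is actually read

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)

    def put(self, key: str, value: str) -> None:
        self.rows[key] = value


class Leader:
    """Second cache tier: the only tier that talks to the database."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.cache: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        if key not in self.cache:
            value = self.db.get(key)
            if value is None:
                return None
            self.cache[key] = value
        return self.cache[key]

    def write(self, key: str, value: str) -> str:
        self.db.put(key, value)
        self.cache[key] = value  # leader cache is updated on a successful write
        return value


class Follower:
    """First cache tier: the tier that clients talk to."""

    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.cache: Dict[str, str] = {}

    def read(self, key: str) -> Optional[str]:
        if key not in self.cache:
            # Cache miss: fill from the leader, never from the database directly.
            value = self.leader.read(key)
            if value is None:
                return None
            self.cache[key] = value
        return self.cache[key]

    def write(self, key: str, value: str) -> str:
        # Writes go through the follower to the leader; the follower's cache
        # is updated as the reply comes back down the chain.
        stored = self.leader.write(key, value)
        self.cache[key] = stored
        return stored
```

In this sketch, a value written through one follower can be read through a second follower without touching the database at all, because the leader already holds it in its cache. That is the "safety net" role the quoted passage attributes to leaders.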

Marchukov pointed out that all of this also needs to happen very quickly, as a "data set must be retrieved and rendered on the fly in a few hundred milliseconds."

Every time any one of over a billion active users visits Facebook through a desktop browser or on a mobile device, they are presented with hundreds of pieces of information from the social graph. Users see News Feed stories; comments, likes, and shares for those stories; photos and check-ins from their friends -- the list goes on. The high degree of output customization, combined with a high update rate of a typical user’s News Feed, makes it impossible to generate the views presented to users ahead of time.

The nitty-gritty details about the design and implementation of TAO, as well as related APIs, are available on the Facebook Engineering blog now.

Image via The Facebook Engineering Blog


About

Rachel King is a staff writer for CBS Interactive based in San Francisco, covering business and enterprise technology for ZDNet, CNET and SmartPlanet. She has previously worked for The Business Insider, FastCompany.com, CNN's San Francisco bureau and the U.S. Department of State. Rachel has also written for MainStreet.com, Irish Americ...
