Facebook is building out its infrastructure to keep up with the 250,000 new users it adds each day. In an interview, Jonathan Heiliger, vice president of technical operations at Facebook, talks about the company's server purchasing strategy.
In a CIO Sessions interview with Dan Farber (see Dan's transcript), Heiliger acknowledged the scaling up challenge and said Facebook recently completed a server bake-off with the following three options:
After going through the RFP process, Facebook went with the second and third options. As with any commodity, there's a little futures gamesmanship involved: Heiliger noted that Facebook is trying to lock in future component price drops, which is possible when you buy servers in bulk.
Among the other notable odds and ends:
Heiliger noted that Facebook sticks with three server types in a homogeneous environment to boost buying power. "We, with a couple of exceptions, have three sort of main server types and run a fairly homogeneous environment which allows us obviously to then consolidate our buying power, it allows us to plan further in the future because we can buy servers on the cabinet basis rather than on an individual basis," he said.
On managing data at Facebook, Heiliger said that the social site's data needs differ from what Google would see. He said:
Google has a tremendous amount of information that they index and archive and present to users but fundamentally if you go to Google and type in a search for a “tiger” and I go to Google and type in a search for a “tiger” we’re going to see generally the same results so they’re presenting that same information to both of us. Facebook's a little different in that the context of our data is all social so when you look at your friends and their status updates and their photos and the notes they may have written you're going to see one set of data versus if I look at my friends. Those tend to be non-intersecting sets of data.
On scaling, Heiliger noted that the dynamic data has forced Facebook "to do a bunch of different things relative to caching and relative to federating all of that data up amongst thousands of different databases so that as a user requests all of that information we’re not [hitting] one particular server every time for different data."
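
To make the caching-plus-federation idea concrete, here is a minimal Python sketch of one common way to do it: hash each user ID to pick a database shard, and keep a cache in front of the shards. This is an illustration only, not Facebook's actual design. Heiliger does not describe the implementation, and every name here (`shard_for`, `FederatedStore`, `NUM_SHARDS`) is hypothetical, with in-memory dictionaries standing in for real databases and a real cache.

```python
import hashlib

NUM_SHARDS = 8  # hypothetical; the interview only says "thousands" of databases


def shard_for(user_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Map a user ID to a shard index using a stable hash."""
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class FederatedStore:
    """Toy stand-in for many databases behind a cache (assumed design)."""

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        # One dict per "database"; a real system would hold connections.
        self.shards = [{} for _ in range(num_shards)]
        self.cache = {}
        self.db_reads = 0  # counts lookups that had to go to a shard

    def put(self, user_id: int, profile: dict) -> None:
        shard = self.shards[shard_for(user_id, len(self.shards))]
        shard[user_id] = profile
        # Drop any cached copy so the next read sees the new value.
        self.cache.pop(user_id, None)

    def get(self, user_id: int):
        # Serve from the cache when possible.
        if user_id in self.cache:
            return self.cache[user_id]
        # Cache miss: read from the one shard that owns this user.
        self.db_reads += 1
        shard = self.shards[shard_for(user_id, len(self.shards))]
        value = shard.get(user_id)
        if value is not None:
            self.cache[user_id] = value
        return value
```

Because each user's data lives on exactly one shard and repeat reads come from the cache, requests for different users spread across the databases instead of landing on one server. A plain modulo hash like this forces most keys to move when the shard count changes; consistent hashing is a common alternative, though nothing in the interview says which approach Facebook uses.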