Facebook reveals the ranking system behind post search in Graph Search

Accumulating hundreds of terabytes of data daily is a daunting challenge for any company -- even Facebook.


Facebook has made a number of notable (and sometimes controversial) changes to its publishing and search features.

One that might not have garnered as much attention is the ability to search individual posts using Graph Search.

With more than one billion new posts published to the world's largest social network each day, the posts index now holds more than one trillion posts in total, amounting to hundreds of terabytes of data.

Accumulating that much data every day (and then turning it into useful information) is a daunting challenge for any company -- even Facebook.

But the Menlo Park, Calif.-based company is well known for building its own engineering systems from the ground up, with a growing datacenter presence worldwide, spanning from Oregon to Sweden.

Post search was originally conceived as a hackathon project two years ago. In a blog post on Thursday, Ashoat Tevosyan, an engineer on Facebook's search quality and ranking team, explained that most search queries return more, well, results than any user cares to navigate.

He also acknowledged that the general Facebook posts index is much larger than the other search indexes on the platform. The goal, therefore, is to develop algorithms that determine which results are the most relevant and rank them accordingly.

Tevosyan described the approach this way: "To surface content that is valuable and relevant to the user, we use two primary techniques: query rewriting and dynamic result scoring. Query rewriting happens before the execution of the query, and involves tacking on optional clauses to search queries that bias the posts we retrieve towards results that we think will be more valuable to the user. Result scoring involves sorting and selecting documents based on a number of ranking 'features,' each of which is based on the information available in the document data. In total, we currently calculate well over a hundred distinct ranking features that are combined with a ranking model to find the best results."
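The two techniques Tevosyan describes can be illustrated with a toy sketch in Python. This is not Facebook's implementation: the Post and Query types, the two optional clauses that rewrite_query adds, the four features and the hand-picked linear WEIGHTS are all hypothetical stand-ins (Facebook says it computes well over a hundred features and does not detail its ranking model in the post).

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

@dataclass
class Post:  # hypothetical document record
    post_id: int
    text: str
    author_id: int
    likes: int
    comments: int
    age_hours: float

@dataclass
class Query:
    required_terms: List[str]
    # Optional clauses never filter a post out; they only bias which posts get retrieved.
    optional_clauses: Dict[str, Callable[[Post], bool]] = field(default_factory=dict)

def rewrite_query(query: Query, friend_ids: Set[int]) -> Query:
    """Query rewriting before execution: tack on (hypothetical) optional clauses."""
    clauses = dict(query.optional_clauses)
    clauses["author_is_friend"] = lambda p: p.author_id in friend_ids
    clauses["posted_this_week"] = lambda p: p.age_hours <= 24 * 7
    return Query(list(query.required_terms), clauses)

def retrieve(index: List[Post], query: Query, limit: int) -> List[Post]:
    """Keep posts containing every required term, preferring those matching more optional clauses."""
    terms = [t.lower() for t in query.required_terms]
    candidates = [p for p in index if all(t in p.text.lower().split() for t in terms)]
    # Python's sort is stable (also with reverse=True), so ties keep index order.
    candidates.sort(key=lambda p: sum(c(p) for c in query.optional_clauses.values()), reverse=True)
    return candidates[:limit]

def features(post: Post, query: Query, friend_ids: Set[int]) -> Dict[str, float]:
    """A handful of hypothetical ranking features computed from the document data."""
    words = post.text.lower().split()
    hits = sum(words.count(t.lower()) for t in query.required_terms)
    return {
        "term_match": hits / max(len(words), 1),
        "engagement": math.log1p(post.likes + 2 * post.comments),
        "recency": 1.0 / (1.0 + post.age_hours / 24.0),
        "is_friend": 1.0 if post.author_id in friend_ids else 0.0,
    }

# Hypothetical linear model; a production system would learn its weights.
WEIGHTS = {"term_match": 2.0, "engagement": 0.5, "recency": 1.0, "is_friend": 1.5}

def score(feats: Dict[str, float]) -> float:
    return sum(WEIGHTS[name] * value for name, value in feats.items())

def search(index: List[Post], query: Query, friend_ids: Set[int],
           retrieve_limit: int = 100, top_k: int = 10) -> List[Post]:
    rewritten = rewrite_query(query, friend_ids)          # 1. query rewriting
    candidates = retrieve(index, rewritten, retrieve_limit)
    ranked = sorted(candidates,                            # 2. dynamic result scoring
                    key=lambda p: score(features(p, query, friend_ids)), reverse=True)
    return ranked[:top_k]

posts = [
    Post(1, "Great hackathon demo today", author_id=7, likes=3, comments=1, age_hours=2),
    Post(2, "hackathon recap and photos", author_id=99, likes=40, comments=10, age_hours=24 * 30),
    Post(3, "Lunch was good", author_id=7, likes=5, comments=0, age_hours=1),
]
results = search(posts, Query(["hackathon"]), friend_ids={7}, top_k=2)
print([p.post_id for p in results])  # -> [1, 2]
```

Keeping the stages separate mirrors the description above: the rewritten optional clauses only decide which candidates survive the retrieval limit, while the final order comes from the feature-based score.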

Tevosyan noted that the engineering team must also closely monitor the heavy workloads hitting its datacenters, given the high volume of traffic the site receives from desktop and mobile users.

Facebook has more than one billion users.

Facebook engineers have identified 70 different kinds of data for sorting and indexing, all housed in a production MySQL database.

Tevosyan revealed that a few dozen engineers work on the Graph Search team, and he hinted that more search capabilities and modifications will be added in the near future.