Facebook today explained how the link shim, a tool built by the company's Site Integrity Team in 2008, works to protect users. Every time a link is clicked on Facebook, the link shim checks that URL against the company's own internal list of malicious links, along with the lists of numerous external partners including McAfee, Google, Web of Trust, and Websense. The link shim was conceived and built by Facebook engineers Chris Putnam, Jordan Moncharmont, Clément Genzmer, Wanhong Xu, and "countless others."
If the link shim detects that a URL is malicious, Facebook displays an interstitial page (pictured above) before the browser actually requests the suspicious link. The link shim as a whole helps Facebook and its users in three different ways:
Spammy or malicious websites. Since the link shim checks URLs at click time rather than at display time, Facebook can block malicious content even when a link was only flagged after it was posted. In addition to its internal and external blacklists, Facebook uses advanced machine learning classifiers to evaluate the authenticity of the sender, among other inputs. Malicious URLs sent over e-mail are also blocked, since all links to non-facebook.com URLs in Facebook's e-mails are rewritten to go through the link shim first.
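To make the click-time idea concrete, here is a minimal Python sketch. The blocklist names and contents are hypothetical stand-ins (Facebook's internal list and its partners' feeds are not public), and a real system would also consult the machine learning classifiers mentioned above rather than doing a simple host lookup.

```python
from urllib.parse import urlsplit

# Hypothetical stand-ins for Facebook's internal list and partner feeds;
# the real lists and their formats are not public.
INTERNAL_BLOCKLIST = {"malware.example"}
PARTNER_BLOCKLISTS = {
    "partner_a": {"phish.example"},
    "partner_b": {"spam.example"},
}


def is_malicious(url: str) -> bool:
    """Return True if the URL's host appears on any blocklist."""
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    if host in INTERNAL_BLOCKLIST:
        return True
    return any(host in hosts for hosts in PARTNER_BLOCKLISTS.values())


def handle_click(url: str) -> str:
    """Decide, at the moment of the click, what the shim should do."""
    # The lookup runs now rather than when the link was rendered, so a host
    # added to a list after the link was posted is still caught.
    if is_malicious(url):
        return "interstitial"  # show the warning page instead of redirecting
    return "redirect"          # send the browser on to the destination
```

Because the lookup happens inside handle_click, adding a host to INTERNAL_BLOCKLIST immediately changes the outcome for links that were posted earlier, which is the advantage of checking at click time rather than display time.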
Privacy and identity. Because Facebook's own URLs can sometimes contain private information (for example, going to facebook.com/profile.php can redirect you to your vanity URL, such as facebook.com/emil.protalinski), Facebook sends many of its own URLs through the link shim. Without the shim, the destination site would see that URL in the referrer header. This lets the company hide personally identifying information, such as whose profile you were on when you clicked a link, from third-party sites.
External analytics. When you're on an HTTPS page and click a link to an HTTP page, the browser doesn't send a referrer header. Since website owners learn how people find their sites by looking at the referrer header, a significant percentage of clicks from Facebook would otherwise be recorded by the destination site as being of unknown origin. Once again, the link shim steps in: it is always served (anonymously) over HTTP, so the referrer still shows up as Facebook.
So, how does it actually work? The link shim is an endpoint accessible at facebook.com/l.php or facebook.com/l/, which takes two parameters: the redirect URL and a user-specific hash. The hash isn't needed for the redirect itself to work, but it is necessary to close an open redirector security hole: without it, anyone could craft a facebook.com link that bounces visitors to a malicious site, exploiting the trust people place in the facebook.com domain. Facebook therefore generates a user-specific hash for each link shim URL so that when the person loads the link shim URL, the company can check that the hash is valid for him or her.
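Facebook did not say how the hash is computed. One common way to build this kind of protection, sketched below in Python, is an HMAC over the user ID and the target URL, keyed with a server-side secret. The secret, the u and h parameter names, and the truncation to 16 hex characters are all assumptions for illustration, not Facebook's published scheme.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a secret known only to the server (never hard-code one in practice).
SERVER_SECRET = b"hypothetical-server-secret"


def link_hash(user_id: int, target: str) -> str:
    """HMAC-SHA256 over (user, destination), truncated to 16 hex chars."""
    msg = f"{user_id}:{target}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()[:16]


def make_shim_url(user_id: int, target: str) -> str:
    """Wrap a destination URL in a link shim URL bound to one user."""
    # The parameter names "u" (redirect URL) and "h" (hash) are illustrative.
    query = urlencode({"u": target, "h": link_hash(user_id, target)})
    return f"https://www.facebook.com/l.php?{query}"


def check_shim_url(user_id: int, shim_url: str) -> bool:
    """Return True only if the hash was issued to this user for this target."""
    params = parse_qs(urlsplit(shim_url).query)
    target = params.get("u", [""])[0]
    supplied = params.get("h", [""])[0]
    expected = link_hash(user_id, target)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(supplied.encode(), expected.encode())
```

A forged link that pairs a valid hash with a different destination, or a link generated for another user, fails the check, which is what keeps the endpoint from acting as an open redirector.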
The hash parameter is also randomized to prevent external parties from identifying which pages a given person accessed on their site by matching up link shim hashes. In other words, one person has many different valid hashes at any given time and is likely to get a unique hash for each click he or she makes. As a result, it's practically impossible for an external site to determine who you are, or even whether you're the same person who clicked another link an hour ago.
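Randomization can be layered on top of a keyed hash. In this hypothetical Python sketch, each hash embeds a fresh random nonce, so repeated clicks by the same person on the same link produce unrelated-looking values that the server can still verify. The "nonce.mac" token format and the secret are assumptions, not Facebook's actual design.

```python
import hashlib
import hmac
import secrets

# Assumption: a secret known only to the server.
SERVER_SECRET = b"hypothetical-server-secret"


def _mac(user_id: int, target: str, nonce: str) -> str:
    """Keyed MAC over user, destination and nonce (64 bits kept)."""
    msg = f"{user_id}:{target}:{nonce}".encode()
    return hmac.new(SERVER_SECRET, msg, hashlib.sha256).hexdigest()[:16]


def new_hash(user_id: int, target: str) -> str:
    """Issue a fresh hash for one click; the nonce differs every time."""
    nonce = secrets.token_hex(8)  # random, so it reveals nothing about the user
    return f"{nonce}.{_mac(user_id, target, nonce)}"


def verify_hash(user_id: int, target: str, token: str) -> bool:
    """Accept any hash this server issued to this user for this target."""
    nonce, sep, mac = token.partition(".")
    if not sep:
        return False  # malformed token
    expected = _mac(user_id, target, nonce)
    return hmac.compare_digest(mac.encode(), expected.encode())
```

An outside site that sees two of these values cannot tell whether they came from the same person, because without the secret the nonce and MAC look random; only the server can recompute the MAC and confirm the hash belongs to the clicking user.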