Microsoft researchers follow Web spam money trail

Using a homegrown tool called Fiddler, researchers at Microsoft have come up with a system to track the money that flows from big-name advertisers to search engine spammers.
Written by Ryan Naraine, Contributor

The methodology, created by Microsoft Research in partnership with the University of California, Davis, has already uncovered a complex scheme in which a small group using false doorway pages is able to profit by redirecting traffic passed from search engines in one direction and sending advertisements acquired from syndicators in the opposite direction. (More at the New York Times.)

According to a research paper released by Microsoft, a "five-layer, double-funnel model" can be used to pick apart the end-to-end redirection spam and analyze the layers to follow the money trail.

The five layers (and findings) explained:

Layer #1 (Fake doorway sites) -- Doorway domains at Google's free Blogger (blogspot.com) site had an order of magnitude more spam appearances in top search results than other hosting domains in both benchmarks, and were responsible for about one in every four spam appearances (22% and 29% in the two benchmarks, respectively). In addition, at least three in every four unique blogspot URLs that appeared in the top-50 results for commercial queries were spam (77% and 75%). The researchers also found that over 60% of the unique .info URLs in the search results investigated were spam, an order of magnitude higher than the spam percentage for .com URLs.

Layer #2 (Redirection domains) -- The researchers found that the spammer domain topsearch10.com was behind over 1,000 spam appearances in both benchmarks, and that the IP block where it resided hosted multiple major redirection domains that were collectively responsible for 22-25% of all spam appearances. The majority of the top redirection domains were syndication-based, serving text-based ads-portal pages.

Layer #3 (The aggregators) -- Two IP blocks ~ and ~ appeared to be responsible for funneling an overwhelmingly large percentage of spam-ads clickthrough traffic. In the study, the researchers collected over 100,000 spam ads that were associated with these two IP blocks, including many ads served by non-redirection spammers. These two IP blocks occupy the “bottleneck” of the spam double-funnel and may prove to be the best layer for attacking the search spam problem.

Layer #4 (The syndicators) -- The study found that a handful of ad syndicators appeared to serve as the middlemen connecting advertisers with the majority of the spammers. In particular, the top three syndicators were involved in 59-68% of the spam-ads clickthrough redirection chains sampled. By serving ads on a large number of low-quality spam pages at potentially lower prices, these syndicators could become major competitors to the mainstream advertising companies that serve some of the same advertisers’ ads on search-result pages and other high-quality, non-spam pages.

Layer #5 (The advertisers) -- The study showed that even ads from well-known websites -- bizrate.com, shopping.com, dealtime.com, and shopzilla.com -- had a significant presence on spam pages. "Ultimately, it is advertisers' money that is funding the search spam industry, which is increasingly cluttering the web with low quality content and reducing web users' productivity. By exposing the end-to-end search spamming activities, we hope to educate users not to click spam links and spam ads, and to encourage advertisers to scrutinize those syndicators and traffic affiliates who are profiting from spam traffic at the expense of the long-term health of the web," the researchers explained.
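
To make the "double-funnel" shape concrete, here is a minimal Python sketch of how redirection chains might be tallied layer by layer. It is not the researchers' tool or method: the layer names, the function names (tally_layers, top_share), the assumption that each chain has exactly one URL per layer, and the placeholder .example hosts are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical layer labels, loosely following the five layers described above.
LAYERS = ["doorway", "redirection", "aggregator", "syndicator", "advertiser"]

# Made-up sample chains (placeholder .example hosts, not real data).
# Each chain lists one URL per layer, ordered from doorway page to advertiser.
SAMPLE_CHAINS = [
    ["http://d1.example/p", "http://r1.example/", "http://agg.example/",
     "http://s1.example/", "http://adv1.example/"],
    ["http://d2.example/p", "http://r1.example/", "http://agg.example/",
     "http://s2.example/", "http://adv2.example/"],
    ["http://d3.example/p", "http://r2.example/", "http://agg.example/",
     "http://s1.example/", "http://adv3.example/"],
]


def host(url):
    """Return the lowercased host name of a URL ('' if it has none)."""
    return (urlparse(url).hostname or "").lower()


def tally_layers(chains):
    """Count how often each host appears at each layer.

    Assumption: every chain has exactly one URL per layer. Chains of any
    other length are skipped rather than guessed at.
    """
    counts = {layer: Counter() for layer in LAYERS}
    for chain in chains:
        if len(chain) != len(LAYERS):
            continue
        for layer, url in zip(LAYERS, chain):
            counts[layer][host(url)] += 1
    return counts


def top_share(counter, n=1):
    """Fraction of all appearances covered by the n most common hosts."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counter.most_common(n)) / total


if __name__ == "__main__":
    counts = tally_layers(SAMPLE_CHAINS)
    for layer in LAYERS:
        print(layer, len(counts[layer]), "distinct hosts,",
              "top host share:", round(top_share(counts[layer]), 2))
```

In this toy data, many distinct doorway hosts and many distinct advertiser hosts sit at the two wide ends, while a single aggregator host carries every chain. That narrowing in the middle is the kind of bottleneck the article describes at Layer #3.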

The project has been dubbed Strider Search Ranger and is the work of the research team at Microsoft that created the HoneyMonkey exploit detection system and URL Tracer, a system to track large-scale domain squatters.
