Google in search of solution for 'web spam' problem

Why are users increasingly finding themselves sifting through a growing number of junk sites to locate what they were looking for?
Written by Tuan Nguyen, Contributor

Lately, it seems like Google's search engine has been racking up more complaints than precise search results.

There was a time when the term "googling" was synonymous with finding. But as some companies have figured out how to game Google's not-so-mysterious-anymore search algorithm, users have increasingly found themselves sifting through a growing number of junk sites to locate what they were looking for.

Many of these low-quality junk sites, labeled as "webspam," are published by content farms such as Demand Media, which publishes the ubiquitous eHow learning pages. The sites are designed to appear higher up in Google's page rankings by exploiting the search engine's penchant for certain keyword, phrase, and linking patterns, the same signals that once helped it fetch more satisfactory results than other search portals. Some sites outright plagiarize content in hopes of having searchers land on their page rather than on the original source.

On TechCrunch, Vivek Wadhwa, the Director of Research at the Center for Entrepreneurship and Research Commercialization at Duke University, describes how searching for information on Google has become ever more frustrating.

"Google has become a jungle: a tropical paradise for spammers and marketers. Almost every search takes you to websites that want you to click on links that make them money, or to sponsored sites that make Google money."

In a recent blog post, Google's principal search engineer Matt Cutts acknowledged the discontent in cyberland and said that steps are being taken to prevent junk sites from cluttering up searches.

Here are some of the interventions he mentioned that were being integrated into the Google search engine:

  • A redesigned document-level classifier to keep spam from showing up higher in search results by detecting repetitive "spammy" words and phrases that are often automatically generated to drive more traffic.
  • Enhanced detection of hacked sites that led to last year's rise in spam sites.
  • Additional improvements that can better detect sites that copy content and sites with low levels of original content.
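
Google has not published how these detectors work, so the following is only a toy sketch of the two general ideas in the list above: scoring how often a page repeats the same phrases, and measuring how much of one page's wording also appears in another. The function names, the phrase lengths, and the sample strings are all illustrative assumptions, not anything taken from Cutts's post.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def repetition_score(text, n=2):
    """Fraction of n-word phrases that repeat an earlier phrase.

    0.0 means every phrase is unique; higher values mean the page
    keeps repeating the same phrases (hypothetical spam signal).
    """
    words = tokens(text)
    phrases = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not phrases:
        return 0.0
    counts = Counter(phrases)
    repeats = sum(count - 1 for count in counts.values())
    return repeats / len(phrases)


def shingles(text, n=3):
    """Set of overlapping n-word sequences found in the text."""
    words = tokens(text)
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(text_a, text_b, n=3):
    """Jaccard similarity of the two texts' n-word shingle sets."""
    a, b = shingles(text_a, n), shingles(text_b, n)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    stuffed = "cheap pills cheap pills cheap pills cheap pills"
    natural = "the quick brown fox jumps over the lazy dog"
    print(repetition_score(stuffed) > repetition_score(natural))
    print(overlap(natural, natural))
```

A real system would combine many more signals and learn its thresholds from labeled data; this sketch only shows why keyword-stuffed or copied pages are distinguishable in principle.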

Some have suggested that Google's algorithm be tweaked to favor sites that reliably publish higher-quality content. Content farms, however, present a unique dilemma in that some of the content they produce doesn't always fall neatly into the category of "spam." For instance, Demand Media floods the internet with articles, videos and other content created by low-paid authors who may or may not be providing useful and original material.

Also, modifying the algorithm to favor certain sites would likely tilt page rankings toward established media outlets and away from the little guy who just might have something of value to offer. And with search engine optimization turning into such a competitive sport, many dot-coms now have a dedicated SEO team to help modify web pages so they climb up Google's page rankings.

Which raises the question: Can programmed search algorithms still be trusted to help us do more finding and less searching?

This post was originally published on Smartplanet.com
