Between the Lines

Larry Dignan, Andrew Nusca and Rachel King

Can Google's algorithm do subjective quality calls?

By | March 2, 2011, 6:29am PST

Summary: The fallout from Google’s algorithm change has become apparent. Will Google increasingly have to make editorial calls about site quality?

“When we try to address challenges like this (site quality) we try to do it in an algorithmic way. There may be one-off situations where for legal reasons or what have you we will intervene manually. But our fundamental approach is to take an algorithmic approach and try to solve it from a technology standpoint.”

Neal Mohan, vice president of product management at Google, speaking Feb. 28 at the Morgan Stanley Technology, Media and Telecommunications Conference.

It has been almost a week since Google flipped the switch on its algorithm, which is now designed to weed out low-quality and less useful Web sites. Sounds good in theory. As I noted before, however, the Google algorithm switch is a slippery slope.

How slippery?

The big questions in all of this boil down to the following:

  • What’s the unassailable definition of quality?
  • Is an algorithm capable of making a subjective decision (one man’s spam is another man’s good read)?
  • And do we trust Google to be judge and jury via an algorithm we know nothing about?

I’d argue that algorithms won’t be able to make subjective judgments well, which means Google will increasingly need to make editorial calls. Ryan Singel at Wired noted that Cult of Mac’s traffic bounced back after Google obviously did something to give the site its juice back. Is this the best way to go about this?

There will be more sites complaining about Google’s algorithm change and the search giant will probably make a few “one-off” exceptions. The inflection point comes when Google has to make multiple “one-off” calls. Ultimately, we’ve outsourced the quality call to Google.

Disclosure

Larry Dignan

Larry Dignan has nothing to disclose. He doesn’t hold investments in the technology companies he covers.

Biography

Larry Dignan

Larry Dignan is Editor in Chief of ZDNet and SmartPlanet as well as Editorial Director of ZDNet's sister site TechRepublic. He was most recently Executive Editor of News and Blogs at ZDNet. Prior to that he was executive news editor at eWeek and news editor at Baseline. He also served as the East Coast news editor and finance editor at CNET News.com. Larry has covered the technology and financial services industry since 1995, publishing articles in WallStreetWeek.com, Inter@ctive Week, The New York Times, and Financial Planning magazine. He's a graduate of the Columbia School of Journalism and the University of Delaware.

For daily updates, follow Larry on Twitter.

8 Comments

The whole approach is backward.
WilErz 2nd Mar 2011
The whole approach of crawling the web and algorithmically identifying content is fundamentally backward. What we need is a 'Semantic Web' with verifiability, created by combining metadata from relevant authorities (for IP addresses, domain names, websites, business licences, residence registration, etc.), and supplemented with voluntary metadata that isn't fully verifiable (e.g. identifying the topic of an essay). Ideally, the whole system should be decentralised, with metadata about domain name ownership, for example, coming directly from domain name registrars.

I suspect Google are opposed to any substantial moves towards a 'web of data' akin to a collection of relational databases, because it would effectively make their one significantly profitable business irrelevant. With a wealth of semantic data to draw on, the need for giant data centres sending out web crawlers and running highly tuned search algorithms to identify content would largely disappear. If finding pages or data on the web were as easy as finding a book in a library (where information like topic, title, author, language, year of publication, place of publication, publisher, number of pages, etc. can all be used in a query), why would anyone bother using Google?
Now we're getting to the meat of the matter: Google's effort to define what quality content is brings to light the ongoing problem with much of the Internet. I'm a trained journalist with experience as an online editor, and it took several years of classes and work experience to develop my judgment on what is news. And, by the way, what is news to one community is garbage to another. I stand with the editors when I say that you cannot automate what we do. Perhaps a new rating system could be agreed upon in which site visitors help make those decisions.
Any algorithm change is essentially a zero-sum game (or is that not a good assumption?). Perhaps this wouldn't be news if Google hadn't stated that the aim was to remove content farms, but had instead stated the converse, i.e. to promote sites that do X, Y and Z better (with X, Y and Z being the things they say you should do in their webmaster advice pages)...

I'm sorry to hear that Mahalo (whoever they are) has lost rankings, but wouldn't it be interesting to balance this with which sites are the big winners from the change? (zero-sum game etc.)

Just thinking aloud...
Better but not perfect
guihombre 2nd Mar 2011
Seems to be better, but still not perfect.

Deep searches are always a pain due to these blended scraper sites (they clone and blend multiple feeds to avoid a duplicate penalty).

I can't find a good example just now, but if you search for:
["On the nose I get aromas of creamy vanilla, toasted oak, lemon, and some tropical fruit notes like papaya and guava"]

You'll see how automated these are. The guy posted the article on 24th Feb, and almost immediately it was cloned on sites like answersagency.com.

That's not a big problem in itself, but it is when you deep search and these scraped clones appear above the real content site.
Just found another scraper
guihombre 2nd Mar 2011
Jeez, even the big guys are doing it. I just found yet another scraper site scraping content, this time one of the big guys: Bing scraping Google content and pretending to have a search algorithm! (Ducks.)
Hey, Nice MS Spin!
John Zern 2nd Mar 2011
@guihombre
Wow! You managed to drag Microsoft into a Google story.

Nice job!
You have to admit, there are a lot of content farms and niche blogs people create just to make money. Associated Content is one example: they used to pay people a dollar or two to write articles, but 90% of it is crap! Same with eHow, about.com, etc.
@Hasam1991 about.com? I like it. I guess your experience is different. The two sections I've used have a ton of original content and good user interaction.