Google PageRank: Biased and fundamentally flawed?

Is the infamous Google PageRank anti-democratic? Jakob Nielsen says it like it is, not how people want to believe it is.
Written by Donna Bogatin, Contributor

Nielsen’s “Search Engines as Leeches on the Web” post from earlier in the year is a must-read, hype-cutting antidote to search engine (Google) worship:

Search engines extract too much of the Web's value, leaving too little for the websites that actually create the content. Liberation from search dependency is a strategic imperative for both websites and software vendors.

I worry that search engines are sucking out too much of the Web's value, acting as leeches on companies that create the very source materials the search engines index.

His most recent alert, “Participation Inequality,” echoes what I have been saying at this Digital Micro-Markets Blog since last spring: the Web 2.0 Social Web is plagued by what I call the “Social Freeloader” phenomenon. I have put it forth in several posts, including “Social freeloaders: Is there a collective wisdom and can the Web obtain it?”, “Digg’s 8 million ‘social freeloaders’” and “So many video sharing sites, so few video uploaders”:

Wikipedia’s “small core community” that does the vast majority of the work reflects the extremely low ratio of contributing users to non-contributing users throughout the new social Web that relies on user contributions for its content.

From Wikipedia to del.icio.us, and from YouTube to Riya, both not-for-profit endeavors and purely commercial enterprises are staking their entire existence on user-generated content that is unreliable, inconsistent and difficult to come by.

The average YouTube user is watching the content, not generating it, “while more than 35 million videos are viewed daily, only 35,000 are uploaded” and at Riya photo search, “searchers outnumber the uploaders…20 to 1.”

Perhaps the social Web will come to be known for its freeloaders, rather than its uploaders.

I include what I identify as the “Social Freeloader” phenomenon as one of my “Web 2.0 Top Five Social Risks”:

SOCIAL FREELOADERS: User Generated Content Sustainability

All of the Web 2.0 Social Web properties which rely on users to contribute content are faced with the “Social Freeloaders” phenomenon.

As in the “real world,” interactions within social communities on the Web are dominated by an extremely small, self-selected minority of active, vocal participants.

Currently YouTube is reporting a video upload to video view ratio of one-tenth of one percent. YouTube says it is “empowering” users to “become the broadcasters of tomorrow.” YouTube’s sustainability, however, is dependent upon generating a higher ratio of user broadcasters contributing today.
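The “one-tenth of one percent” figure can be checked against the daily numbers quoted earlier (35,000 uploads against more than 35 million views). The short sketch below does only that arithmetic; the function name is made up for illustration, and because the view count is given as “more than” 35 million, the result is best read as an approximate upper bound.

```python
def upload_to_view_ratio(uploads_per_day, views_per_day):
    """Return uploads as a fraction of views (hypothetical helper)."""
    return uploads_per_day / views_per_day


# Figures as quoted in this post: 35,000 uploads, 35 million views per day.
ratio = upload_to_view_ratio(35_000, 35_000_000)

# Format the fraction as a percentage with one decimal place.
print(f"Upload-to-view ratio: {ratio:.1%}")
```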

Nielsen’s “Participation Inequality” puts forth:

All large-scale, multi-user communities and online social networks that rely on users to contribute content or build services share one property: most users don't participate very much. Often, they simply lurk in the background. In contrast, a tiny minority of users usually accounts for a disproportionately large amount of the content and other system activity. This phenomenon of participation inequality was first studied in depth by Will Hill in the early '90s, when he worked down the hall from me at Bell Communications Research.

Nielsen cites five “downsides” of participation inequality; one involves “Search”:

Search. Search engine results pages (SERP) are mainly sorted based on how many other sites link to each destination. When 0.1% of users do most of the linking, we risk having search relevance get ever more out of whack with what's useful for the remaining 99.9% of users.

Does participation inequality make Google’s infamous “PageRank” “out of whack with what's useful” for the overwhelming majority of users?

Google describes its PageRank:

The heart of our software is PageRank, a system for ranking web pages developed by our founders Larry Page and Sergey Brin at Stanford University. And while we have dozens of engineers working to improve every aspect of Google on a daily basis, PageRank continues to provide the basis for all of our web search tools.

PageRank Explained

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves "important" weigh more heavily and help to make other pages "important."

Important, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don't match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page's content (and the content of the pages linking to it) to determine if it's a good match for your query.
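To make the “link as a vote” description concrete, here is a minimal sketch of the simplified PageRank calculation as it is commonly described in public, not Google's production system. The four-page toy web, the damping factor of 0.85, the iteration count and the handling of pages with no outgoing links are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated vote passing.

    links maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally "important".
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to,
                # so a vote from a high-ranked page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # vote evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: D links out, but nobody links to D.
toy_web = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

In this toy web, pages A and B each receive exactly one inbound link, yet A ends up ranked well above B because its single link comes from C, the most linked-to page. That is the weighting of “important” votes the quoted description refers to, and it is also the mechanism by which already well-linked pages pass their standing along.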

Google portrays its PageRank as reinforcing the “democratic” Web.

In reality, however, Google PageRank is little more than a Google-created formula that ends up further entrenching Web pages it has deemed, in its infinite wisdom, to be “important.”

