Google search PageRank excludes relevant Websites

UPDATE: Matt Cutts says Google is "improving our analysis of the link structure of the Web" and has "begun minimizing the impact of many Googlebombs. Now we will typically return commentary, discussions, and articles about the Googlebombs instead."

Is all now well in our PageRank-determined Web world? Hardly.

Google remains typically vague about the wider ramifications of its cherry-picking of what it characterizes as "bad search results" or "relevance problems."

MOREOVER, at its core, PageRank remains fundamentally flawed, as I put forth below.

JANUARY 13, 2007: Matt Cutts, Google’s apparent external point man for the “search community,” has posted a Google 2007 weather forecast for Google’s all-powerful PageRank.

Cutts’ public “infrastructure status” report on his “personal blog” begins with a bit of secretive panache:

"I’ll briefly cover the things that I know are going on. The executive summary is that things are relatively quiet."

Cutts then explains how the “data push that updates PageRank” is underway. Despite Cutts’ seemingly detailed exposé so that “hard-core data center watchers…don’t get confused,” he liberally includes “outsider” disclaimers such as “I believe we’ve changed that,” “I think we’re going to change the ‘filetype:’ operator” and “I believe it’s probably due to that data push...”

Cutts describes his role at Google as head of “Google's Webspam team,” but disclaims Google responsibility for the Google infrastructure status report that he posts:

The views expressed on these pages are mine alone and not those of my employer.

Cutts titles his blog “Gadgets, Google & SEO.” Google issues he has addressed of late include Google “tips” (see “Does Google play fair?”) and the “erotic Pretty Dumb Things blog Google search snafu” (“Is Google a public service?”).

Cutts’ 2007 weather forecast unequivocally states:

The main determinant of whether a url is in our main web index or in the supplemental index is PageRank.

In “Is Google a public service?” I put forth that many view Google as a public service and expect it to operate as such. I likened expectations about Google SERP rankings to expectations about U.S. Government entitlements.

With a public depending upon Google SERP rankings, and by extension upon Google PageRank, the import of Cutts’ Google forecast is significant.

Google touts its “breakthrough” PageRank philosophy as the reason why “users have come to trust Google as a source of objective information untainted”:

Traditional search engines rely heavily on how often a word appears on a web page. Google uses PageRank to examine the entire link structure of the web and determine which pages are most important. PageRank interprets a link from Page A to Page B as a vote for Page B by Page A. PageRank then assesses a page's importance by the number of votes it receives. PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. Important pages receive a higher PageRank and appear at the top of the search results.

Google garners more than 50% of searches performed in the U.S.

Google told me it is “possible that people equate the Google search box with their browser window. Google is often used as the gateway to the Internet” (see “Google Q & A on Google Zeitgeist: Exclusive”).

I have asked, “Is Google ‘The Internet’?”

If Google is indeed “the gateway to the Internet” for a large number of people, it wields significant power in its determination of “important pages” to “appear at the top of the search results.”

Google espouses an untaintable, machine-driven “perfection” of its search engine:

There is no human involvement or manipulation of results.

Google neglects to recognize its money-making ad serving purpose in its corporate mission statement (see “Google search kingdom: Benevolent or despotic?”); Google also neglects to recognize the human foundations of its PageRank formulation.

Two particular humans, co-founders Larry Page and Sergey Brin, originated Google’s PageRank filter aimed at “organizing the world’s information.” Contrary to Google’s lofty mission statement, however, PageRank does not ensure that the “world’s information” is “universally accessible and useful.”

Why not? The PageRank concept and Google’s specific implementation result in arbitrary, pre-determined exclusions and/or low rankings of “relevant” Web pages within Google SERPs.

At its core, PageRank is fundamentally flawed. By requiring that Web pages have inbound links from third-party Web sites, the PageRank-based algorithm may result in automatic exclusion of the most relevant pages for a given query simply because no other Web sites have linked to them.
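To make the point concrete, here is a minimal sketch of the textbook PageRank iteration as it is commonly described in public; it is not Google’s production code, which is not public. The damping factor of 0.85, the iteration count, and the four page names are assumptions chosen purely for illustration. The “orphan” page links out to others, but nothing links to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over a dict of page -> outbound links.

    Illustrative only: damping and iterations are assumed values.
    """
    # Collect every page that appears as a source or as a link target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its damped rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page Web; "orphan" has no inbound links.
links = {
    "hub": ["popular", "other"],
    "popular": ["hub"],
    "other": ["popular"],
    "orphan": ["popular"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.4f}")
```

In this toy graph the orphan page never rises above the baseline share of (1 - damping) / n, however relevant its content might be to a query, because no “votes” ever reach it.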

Google’s PageRank = “I am linked to, therefore I am.”

Page and Brin’s core assumption that a Web page cannot be the most “relevant” if no third-party Web site links to it is not a defensible position, philosophically or scientifically.

Google’s “sandbox” also may result in automatic exclusion of the most relevant pages for a given query. Google’s exclusionary sandbox rationale is based on arbitrary, human-derived notions of “aging.”

If Google were indeed a public service, its sandbox could theoretically be disallowed as age discrimination.



