Would you trust Google to decide what is fact and what is not?

A Google research project is looking at ways to rank pages based on the accuracy of facts on each page. Could this ignite a firestorm of political and religious disagreement?
Written by David Gewirtz, Senior Contributing Editor

I live in Central Florida. This place has an odd mix of very conservative and very liberal residents, which means that conversations at social gatherings can degenerate into partisan rhetoric incredibly quickly. I'm not a fan of parties, political or social.

If you separate out your own biases, watching people spout their opinions on what they consider concrete facts can prove to be illuminating (if disturbing).

A single mention of the president will, within a short time, lead to an argument about whether he was born in Hawaii or Kenya. A comment about the nice weather we're having will, within minutes, lead to a disagreement about whether we're experiencing global warming, climate change or nothing at all. A mention of a doctor's visit will degenerate into a heated discussion about the merits or perils of vaccines. And on and on and on.

It's like we live in parallel universes, where the facts that are "truth" to one group are completely different from the facts that are "truth" to another.

Frankly, I'm convinced all my neighbors (and just about everyone in Florida) are insane, but that's just my personal representational system for dealing with living in a location populated by individuals with strong opinions.

The contentious nature of facts

Facts are mutable. To a scientist like me, even making that statement causes my soul pain, but if you gather in a room filled with Floridians (or other groups of contentious Americans), the truthiness of that statement soon becomes apparent.

Science has ways of proving or disproving information to create a body of knowledge we call "fact." Many people, however, are completely immune to the reality as observed by science.

This whole discussion is politically fraught. For example, if I were to say "to rationally-minded people," the science-minded folks would self-identify. But so would the birther crowd, the anti-global warming crowd, and even the anti-vaccers.

On one hand, some of us might look at the others of us and consider them full-goose bozo. Certainly that seems the case with the anti-vaccers, who seem determined for some reason to bring the measles virus back from the dead.

But here's the thing. While medical science in America is very capable, we also know we can't fully trust the medical community. Report after report brings us stories of greed, corruption, and mistakes made by doctors, insurers, and hospitals.

We also can't fully trust any statements about "fact" made by our government because most are carefully crafted by spin doctors to support one agenda or another.

The point is, while you and I might agree upon some facts, and while some of us may see those facts as indisputable, large bodies of other individuals won't. In fact, while one group of us may think what another group believes is lunacy, the reverse is also often true.

Google's Knowledge-Based Trust

I bring up this long discussion about facts because of a very interesting research paper from Google, "Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources" (PDF).

In it, Google researchers Xin Luna Dong, Evgeniy Gabrilovich, Kevin Murphy, Van Dang, Wilko Horn, Camillo Lugaresi, Shaohua Sun, and Wei Zhang discuss the value differences between exogenous and endogenous signals when ranking Web pages.

Google has long used exogenous signals to determine page ranking. Exogenous means "relating to external factors," and as we all know, Google relies on incoming links from other Web pages to determine page rank.

There are, of course, flaws with this approach. First, although Google has done its best, the entire SEO industry exists to try to game the signals that raise page rank. Second, while Google gives primacy to popular pages, it doesn't raise up lesser known pages that might, in fact, contain better, higher-quality information.

This team of Google researchers is trying to improve all that. They're using Google's Knowledge Vault, another experimental technology -- in this case a knowledge base of billions of facts automatically gathered from Web pages around the Internet (paywall PDF).

The idea behind the Knowledge-Based Trust project is to establish a trustworthiness rating for individual Web pages based not on popularity, but on the factual accuracy of the information presented. This trustworthiness rating would be another signal fed into Google's ranking system to determine which pages rank higher in search results.
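To make the "another signal" idea concrete, here is a minimal sketch of how a trust rating might be blended with a popularity score. Everything in it is an assumption for illustration: the function names, the 0-to-1 scales, the weighting, and the example pages are all hypothetical, and nothing here reflects how Google's ranking system actually combines signals.

```python
def blended_score(popularity: float, trust: float, trust_weight: float = 0.3) -> float:
    """Weighted average of two signals, each assumed to lie in [0, 1].

    trust_weight is a hypothetical knob: 0 ignores trust entirely,
    1 ignores popularity entirely.
    """
    if not 0.0 <= trust_weight <= 1.0:
        raise ValueError("trust_weight must be between 0 and 1")
    return (1.0 - trust_weight) * popularity + trust_weight * trust


def rank(pages: dict, trust_weight: float) -> list:
    """Return page names ordered from highest to lowest blended score."""
    return sorted(
        pages,
        key=lambda name: blended_score(
            pages[name]["popularity"], pages[name]["trust"], trust_weight
        ),
        reverse=True,
    )


# Made-up pages with made-up scores, purely for illustration.
PAGES = {
    "popular-but-sloppy.example": {"popularity": 0.9, "trust": 0.2},
    "obscure-but-accurate.example": {"popularity": 0.5, "trust": 0.95},
}

if __name__ == "__main__":
    print(rank(PAGES, trust_weight=0.0))  # popularity only
    print(rank(PAGES, trust_weight=0.5))  # popularity and trust weighted equally
```

With the trust weight at zero, the popular page wins. Once trust counts for half the score, the lesser-known but more accurate page moves to the top, which is the effect the project is after.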

This is a powerful idea with a great deal of potential. Rather than Web developers gaming the system with messy SEO schemes, pages containing good, solid information would be elevated in the search rankings, providing searchers with much more useful and correct information.

The Web as echo chamber

It also has the potential of reducing the echo-chamber nature of the Web, where one popular post is picked up and repeated over and over, until the Internet as a whole considers whatever that post originally said to be fact, even though no one can actually cite the originating source of whatever assertion is being presented.

There are, of course, obvious problems -- especially where belief intersects fact. Let's take the example the researchers bravely used: President Obama's country of birth. While there is overwhelming evidence that the president was born in the United States, there are still those out there who believe he was born in Kenya. The president himself, not particularly helpfully, even used the Kenya story as the punchline of a joke at the Gridiron Club.

In their study, Google's researchers weighted pages containing falsehoods (like Kenya as Obama's birthplace) as lower in terms of trustworthiness compared to pages containing facts. This would raise pages with greater accuracy in our search results over those containing falsehoods, meaning claims that the Knowledge Vault considered incorrect.
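As a rough illustration of that weighting, the sketch below scores a page by the fraction of its checkable claims that match a reference knowledge base. This is a deliberately naive simplification and not the researchers' method. The tiny `KNOWLEDGE_BASE` dictionary, the `trust_score` function, and the (subject, attribute, value) claim format are all hypothetical stand-ins.

```python
# Hypothetical stand-in for a knowledge base:
# (subject, attribute) -> the value the knowledge base accepts.
KNOWLEDGE_BASE = {
    ("Barack Obama", "birthplace"): "United States",
}


def trust_score(claims, knowledge_base=KNOWLEDGE_BASE):
    """Return the fraction of checkable claims that match the knowledge base.

    claims is a list of (subject, attribute, value) tuples assumed to have
    been extracted from a page. Claims the knowledge base knows nothing
    about are skipped. Returns None when no claim can be checked.
    """
    checkable = [
        (subject, attribute, value)
        for subject, attribute, value in claims
        if (subject, attribute) in knowledge_base
    ]
    if not checkable:
        return None
    correct = sum(
        1
        for subject, attribute, value in checkable
        if knowledge_base[(subject, attribute)] == value
    )
    return correct / len(checkable)


if __name__ == "__main__":
    accurate_page = [("Barack Obama", "birthplace", "United States")]
    inaccurate_page = [("Barack Obama", "birthplace", "Kenya")]
    print(trust_score(accurate_page))
    print(trust_score(inaccurate_page))
```

Note that a scorer this simple cannot tell a page that asserts a claim from one that merely reports or quotes it. Both look the same once the claim has been extracted.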

The risks of a central facts authority

Of course, there are some very serious issues with this approach. The first and most obvious is that not everyone believes the same facts -- and for some topics (like religion, politics, and whether there's any even marginally justifiable reason for vanilla to exist when we have chocolate), there's really no way to make a case everyone will accept.

The second concern is illustrated by the OnPolitics article I linked to above. That article reported on the president's speech to the Gridiron Club, and stated that the president joked about Kenya as his birthplace. Because this page contains a fact that the Knowledge Vault considered wrong, it would be pushed down in the results, even though it's actually newsworthy, because the president himself was the one riffing on the incorrect fact. There's no guarantee that fact-based ranking would understand and account for such nuance.

Here's another example. Let's say that as a writer, I'm writing an article about the mistaken factoids upon which the climate-change deniers are basing their beliefs. I would need to be able to search for that information and not have Google, thinking it's doing me a favor, only surface articles it thinks are more accurate. Its search for accuracy might be directly at odds with the goals of my research, simply because, once again, it doesn't understand the nuance or reasoning of the search I might be undertaking.

So the question becomes whether most individuals across all socioeconomic and educational groupings would trust Google to serve up Web pages based on its algorithmic interpretation of facts. Speaking as a computer scientist with an advanced degree, I'm more likely to trust Google, because my background and education are quite similar to those of many at Google. Many studies have shown that those with similar educational backgrounds tend to share belief systems to at least some degree.

But I'm not representative of all America. And more to the point, what happens if Google itself starts to show an agenda in the facts it surfaces? This is big stuff because, as we know, school systems have been going to battle for years over whether evolution is taught in schools along with creationism. While I firmly believe in evolution, as supported by strong and (to most scientists) indisputable scientific evidence, there are still a lot of Americans who would disagree.

There is no doubt that Google has tremendous power in terms of what we read and find when we're searching for answers. But is there a potentially dangerous Orwellian component to Google using its interpretation of truth to weight Web pages, or is it a brilliant method that will improve search result quality for all of us?

I also question whether this is a battle Google wants to get into. Once Americans start battling over belief systems, all semblance of rationality and fair play seems to go by the wayside. If Google starts delivering search results based on its determination of truth, will it find itself in the middle of a holy war?

Google's researchers themselves acknowledge that their work has a long way to go before it is ready for prime time. I encourage them to continue the work, because better search results benefit all of us. But I also encourage them to keep in mind the eventual conflicts over belief systems and explore how it might be necessary to weigh both facts and widely-held beliefs and somehow balance the two.

After all, a search engine is not a tool for determining the truth. It's a tool for finding out what a wide variety of people are saying, some of which may be truth, some of which may be completely nutso-bozo, but all of it is the Web. We searchers need to see both what we might agree with and what we might dislike. It's a big, wonderful, insane world out there and we should be able to wallow in all of it, in order to find the personal truths each and every one of us seeks.
