Peter Norvig on Google's mistrust of machine learning

Peter Norvig has been at Google for a long time now. Until recently he was the Director of Search Quality, the person who made sure that when you submitted a query, you usually got what you wanted. He has since moved into the Director of Research role, but is currently taking time off to update his textbook, "Artificial Intelligence: A Modern Approach".

Peter recently sat down with Anand Rajaraman to discuss several things, including search quality at Google -- for a great read, check out this article. Anand explains that Google's search algorithm consists of an offline phase and an online phase. The offline phase is the time-consuming, query-independent process of discovering and then tagging web pages; the online phase happens at the time of search.

The online, query-dependent phase appears to be made-to-order for machine learning algorithms. Tons of training data (both from usage and from the armies of "raters" employed by Google), and a manageable number of signals (200) -- these fit the supervised learning paradigm well, bringing into play an array of ML algorithms from simple regression methods to Support Vector Machines.

This setup is perfect for machine learning. Throw in the expert "raters" that Google pays to sift through search results, and you have machine learning just waiting to happen. Researchers at Google have reached the point where a machine-learned model is as good as, or better than, the hand-crafted algorithm that currently ranks results from Google's giant index in real time when a user enters a query.
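To make the supervised setup concrete, here is a minimal Python sketch of a pointwise learned ranker. Everything in it is an assumption made for illustration: the three signal names, the hidden weights standing in for rater judgments, and the helper names `score`, `train_linear_ranker` and `rank` are hypothetical, and none of it reflects Google's actual signals or method. It fits a plain linear model with stochastic gradient descent, which sits at the "simple regression" end of the range of algorithms mentioned above.

```python
import random

# Hypothetical signals for illustration; the real system reportedly uses about 200.
SIGNALS = ["title_match", "link_score", "freshness"]

def score(weights, signals):
    """Weighted sum of a page's signal values."""
    return sum(w * s for w, s in zip(weights, signals))

def train_linear_ranker(examples, n_signals, epochs=50, lr=0.1):
    """Fit linear weights to (signals, rating) pairs by SGD on squared error."""
    weights = [0.0] * n_signals
    for _ in range(epochs):
        for signals, rating in examples:
            error = score(weights, signals) - rating
            weights = [w - lr * error * s for w, s in zip(weights, signals)]
    return weights

def rank(weights, pages):
    """Return page names ordered from highest to lowest predicted score."""
    return sorted(pages, key=lambda name: score(weights, pages[name]), reverse=True)

# Synthetic "rater" labels: an assumed hidden weighting, unknown to the learner.
random.seed(0)
HIDDEN_WEIGHTS = [0.6, 0.3, 0.1]
training = []
for _ in range(200):
    signals = [random.random() for _ in SIGNALS]
    training.append((signals, score(HIDDEN_WEIGHTS, signals)))

learned = train_linear_ranker(training, len(SIGNALS))

# Three made-up candidate pages for one query, each described by its signals.
pages = {
    "page_a": [0.9, 0.2, 0.1],
    "page_b": [0.1, 0.9, 0.9],
    "page_c": [0.5, 0.5, 0.5],
}
ranking = rank(learned, pages)
```

Because the synthetic ratings are noise-free, the learned weights end up very close to the hidden ones, and `ranking` orders the candidate pages the same way the hidden weighting would. Real rater data would be noisy and far larger, which is where the heavier methods come in.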

So why isn't Google using this machine-learned model in its search engine? Peter suggests two reasons. The first is that the engineers who hand-built the current algorithm don't believe a machine could do better. The second, as Anand says, is more interesting: Google worries that machine-learned models may suffer "catastrophic errors on searches that look very different from the training data".
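The worry about inputs that look nothing like the training data can be illustrated with a toy example. The sketch below is not a model of Google's ranking; the single "keyword density" signal and the `true_relevance` function are assumptions invented for illustration. A line is fitted by ordinary least squares to data where relevance grows with the signal, then asked about an input far outside that range, where the assumed true relevance has collapsed (think keyword stuffing).

```python
def true_relevance(density):
    """Assumed ground truth: rises up to 1.0, then falls to zero (keyword stuffing)."""
    if density <= 1.0:
        return density
    return max(0.0, 2.0 - density)

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x

# Training data only covers densities between 0 and 1.
train_x = [i / 20 for i in range(21)]
train_y = [true_relevance(x) for x in train_x]
slope, intercept = fit_line(train_x, train_y)

def predict(density):
    """Relevance predicted by the fitted line."""
    return slope * density + intercept

# Compare the model with the assumed truth inside and outside the training range.
in_range_error = abs(predict(0.5) - true_relevance(0.5))
out_of_range_error = abs(predict(5.0) - true_relevance(5.0))
```

Inside the training range the fitted line is essentially exact, while at a density of 5.0 it confidently predicts high relevance for a page whose assumed true relevance is zero. A hand-crafted rule can be given an explicit guard for such cases; a learned model only knows what its training data showed it.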

If they are indeed testing this model, it would be very nice to see it offered as a "search experiment". What do you think? Will Google's concerns about machine learning keep it from ever becoming the engine that drives their search results?

Talkback

2 comments
  • I can understand Peter Norvig's (or Google's)

    reluctance - search is Google's core operation, and "catastrophic errors on searches" are not something they could afford. Still, it would be fascinating if Google were, after appropriate testing among volunteer guinea pigs (I offer myself as one), to release a Google "Beta" using a "machine-learned" model, along the lines of the YouTube beta which users of that service may choose. But, as mentioned above, search is far more central to Google than YouTube, important as the latter may be....

    Henri
    mhenriday
  • RE: Peter Norvig on Google's mistrust of machine learning

    Training data-vs-real data mismatch is partly what created the recent housing crisis, so I'd say Google is appropriately cautious. (There was recently a cool This American Life episode delving into the root causes of the crisis -- as always, freely available at thislife.org ).
    dal2010@...