Peter Norvig has been at Google for a long time now. Until recently he was the Director of Search Quality, the person who made sure that every time you submitted a query, you usually got what you wanted. He has since moved into the Director of Research role, but is currently taking time off to update his textbook, "Artificial Intelligence: A Modern Approach".
Peter recently sat down with Anand Rajaraman to discuss several topics, including search quality at Google; for a great read, check out this article. Anand explains that Google's search algorithm has two phases, one offline and one online. The time-consuming work of discovering and then tagging webpages is done offline and is query-independent, while the online phase happens at the time of search.
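To make the offline/online split concrete, here is a minimal toy sketch, not a description of Google's actual system. It assumes a hypothetical `build_index` that runs ahead of time (query-independent) and a hypothetical `search` that runs per query, combining simple term matches with a per-page `static_scores` value that we assume was also computed offline (think of it as a stand-in for something like a link-based quality score). All names and data are illustrative.

```python
from collections import defaultdict

# Offline, query-independent phase: build an inverted index once, ahead of time.
def build_index(pages):
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index

# Online, query-dependent phase: runs for every query that arrives.
def search(query, index, static_scores, pages):
    terms = query.lower().split()
    # Candidate pages contain at least one query term.
    candidates = set().union(*(index.get(t, set()) for t in terms))

    def score(page_id):
        words = pages[page_id].lower().split()
        term_hits = sum(words.count(t) for t in terms)      # query-dependent part
        return term_hits + static_scores.get(page_id, 0.0)  # plus precomputed offline score

    # Highest score first; ties broken by page id so the output is deterministic.
    return sorted(candidates, key=lambda pid: (-score(pid), pid))

pages = {"a": "machine learning for search", "b": "search quality at google", "c": "cooking recipes"}
static_scores = {"a": 0.5, "b": 0.9, "c": 0.1}  # hypothetical offline scores
index = build_index(pages)
print(search("machine search", index, static_scores, pages))  # ['a', 'b']
```

The point of the split is that the expensive index construction is paid once, so the per-query work only touches the few candidate pages that match.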
The online, query-dependent phase appears to be made-to-order for machine learning algorithms. Tons of training data (both from usage and from the armies of "raters" employed by Google) and a manageable number of signals (200) fit the supervised learning paradigm well, bringing into play an array of ML algorithms from simple regression methods to Support Vector Machines.
In other words, this setup is ideal for machine learning. Add the expert "raters" that Google pays to sift through search results, and you have machine learning just waiting to happen. Researchers at Google have reached the point where a machine-learned model performs as well as, or better than, the hand-crafted algorithm that currently ranks results from Google's giant index in real time whenever a user enters a query.
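As a rough illustration of the "simple regression methods" end of that spectrum, the sketch below fits a linear scoring function to rater grades with batch gradient descent on mean squared error, then ranks pages by predicted relevance. Everything here is an assumption for illustration: the three signals, the grades, and the helper names `fit_linear_ranker` and `rank` are hypothetical, and a real system would use far more signals and likely a more sophisticated learner.

```python
def fit_linear_ranker(features, grades, lr=0.1, epochs=1000):
    """Batch gradient descent on mean squared error (no bias term).

    features: one tuple of signal values per (query, page) pair.
    grades:   the relevance grade a human rater gave that pair.
    """
    n, d = len(features), len(features[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, y in zip(features, grades):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(d):
                grad[j] += 2.0 * err * x[j] / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def rank(pages, w):
    """Sort page ids by predicted relevance, highest first."""
    def predict(x):
        return sum(wi * xi for wi, xi in zip(w, x))
    return sorted(pages, key=lambda pid: predict(pages[pid]), reverse=True)

# Toy training set: three hypothetical signals per pair, with made-up rater grades.
X = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (0, 1, 1), (1, 0, 1), (1, 1, 1)]
grades = [1.0, 2.0, 0.5, 3.0, 2.5, 1.5, 3.5]
w = fit_linear_ranker(X, grades)

pages = {"p1": (0.9, 0.1, 0.0), "p2": (0.1, 0.9, 0.2), "p3": (0.0, 0.0, 1.0)}
print(rank(pages, w))  # ['p2', 'p1', 'p3']
```

The toy grades were chosen to be exactly linear in the signals, so the learned weights converge to about (1.0, 2.0, 0.5); real rater data would be noisier.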
So why isn't Google using this machine-learned model in its search engine? Peter suggests two reasons. The first is that the engineers who hand-crafted the current algorithm don't believe a machine could do better. The second, as Anand puts it, is more interesting: Google worries that machine-learned models may suffer "catastrophic errors on searches that look very different from the training data".
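Here is a toy, one-signal illustration of that failure mode, built on made-up assumptions rather than anything known about Google's models. We assume the true relevance saturates once a signal reaches 1, that the training data only covers signal values from 0.1 to 1.0, and that the model is a least-squares line through the origin. The model is perfect on the training range, yet wildly wrong on an input unlike anything it has seen. The hypothetical `out_of_range` helper is only a crude way to flag such inputs, not a description of any real safeguard.

```python
def fit_slope(xs, ys):
    """Least-squares slope through the origin: w = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def true_relevance(x):
    # Hypothetical ground truth: relevance saturates once the signal reaches 1.
    return min(x, 1.0)

# Training data only covers signal values in [0.1, 1.0].
train_x = [i / 10 for i in range(1, 11)]
train_y = [true_relevance(x) for x in train_x]
w = fit_slope(train_x, train_y)

def out_of_range(x, xs=train_x):
    """Crude novelty check: is x outside the range seen during training?"""
    return x < min(xs) or x > max(xs)

print(w * 0.5, true_relevance(0.5))                       # 0.5 0.5
print(w * 50.0, true_relevance(50.0), out_of_range(50.0))  # 50.0 1.0 True
```

On the training range the fit looks flawless, which is exactly why the error at 50.0 would not show up in evaluation on data resembling the training set.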
If Google is indeed testing this model, it would be very nice to see it offered as a "search experiment". What do you think? Will Google's concerns about machine learning keep it from ever becoming the engine that drives the results on its search engine?