Summary: Beating the rush of press releases likely to flood inboxes during next week's Semantic Technology Conference, Powerset today announced the public availability of a service that adds a whole new dimension to searching for information from Wikipedia.

Beating the rush of press releases likely to flood inboxes during next week's Semantic Technology Conference, Powerset today announced the public availability of a service that adds a whole new dimension to searching for information from Wikipedia.

Whilst much of the functionality unveiled today has been visible to those granted access to the company's Powerlabs for some time, the Powerset team has clearly been busy optimising code and ensuring that the various components work together much better.

Powerset sets much store by its ability to 'read and extract meaning' both from a user's query and from the resources being searched. Although today's beta is targeted at articles from Wikipedia (and some of Wikipedia's facts stored in more structured form inside Freebase), Powerset's ambitions are obviously larger. The company that has been described, more than once, as a potential 'Google killer' is hardly likely to stop at a couple of million pages from a big online encyclopaedia. Indeed, Dan Farber's post over the weekend suggests that Powerset may not be the only company with loftier ambitions for this technology. Guidewire Group senior analyst Carla Thompson was clearly impressed, commenting:

"In many ways, [Powerset's application] defines a new search category altogether and brings the power of semantics to mass consumers. Once users have experienced the usability and deep relevance of Powerset-enabled Wikipedia search, I imagine they’ll start demanding this level of intelligence from all content channels."

Powerset co-founder Lorenzo Thione is also quoted in the company's press release:

"Our first product has only touched the surface of what our technology will allow. Our team of computational linguists, computer scientists and engineers, together with the PARC technology we licensed, has allowed us to develop a solid platform to begin to change the way people consume content."

Scott Prevost, Powerset's Director of Product, continued the theme:

"We have focused on making Powerset able to read and understand documents on the web as part of a broader vision to change the way people interact with technology. This first product will make people’s search experiences on Wikipedia and Freebase easier, more natural and more relevant."

The roots of Powerset's capabilities stretch back into computing prehistory, with Powerset CTO Barney Pell painting a compelling picture of Xerox PARC 35 years ago. Alongside better-known outputs such as the Graphical User Interface (GUI) and the mouse, Pell points out that PARC alumni such as Ron Kaplan also postulated the value of a 'Conversational User Interface,' or CUI, and that this work evolved to become PARC's ongoing Natural Language Processing activity.

Pell argues that a lack of compute power has traditionally hindered the effective application of these natural language processing approaches to real-time problem solving, but he asserts that this has now changed. Speaking of his time at Mayfield prior to founding Powerset, Pell says that he was examining the broad trends driving search and came to the conclusion that:

"Moore's Law would suggest that computing power is now sufficient to do natural language search."

Pell founded Powerset in 2005, and the company has subsequently licensed and enhanced some of that PARC research. Today's announcement sees some of the fruits of that work made available for all to use, and doubtless marks the beginning of the next stage of the Powerset journey as the team works to understand how its technology scales under real-world conditions, and how the Web-using community unlearns its hard-won keyword search strategies.

Remembering the early promises (and crushing disappointment) of Ask Jeeves, that early example of a search engine that entreated us to communicate with it 'naturally', I was keen to press Pell on the notion that millions of Web users will transform their search behaviour in order to reap the sorts of benefits that Powerset claims. He suggests that most users want their computers to be 'better companions,' understanding the user, their context and their requirements. He also asserts that:

"Natural language technology is central to the future of Search."

"People have no problem forming natural language queries."

There will always, Pell suggests, be 'a value' in keyword-based search. The key benefit of Powerset does not seem to be offering users the ability to type their queries as sentences. Rather, it is the additional meaning that the system is then able to glean from the sentence and the way that it has been formed; meaning that enables something approaching understanding; understanding that then enables a more relevant answer to be returned.

Unlike some of the other natural language players in the market today, Powerset focusses upon 'linguistic' rather than 'ontological' knowledge; the application relies upon understanding the structure of language in preference to building a large database of terms and synonyms. This approach requires brute force computation, but has the benefit of enabling an application to answer queries such as "What did the FDA approve?" without needing to know what the FDA is, without needing to understand the detail of a government's approval processes, and without having to understand all of the drugs, foods and more that might comprise the answer.
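To make the 'linguistic' idea a little more concrete, here is a toy sketch in Python. It is emphatically not Powerset's implementation: the `Fact` type, the `FACTS` list, the `answer` function and the placeholder items ("drug A" and so on) are all hypothetical, and a real system would derive such subject/verb/object triples with a full parser rather than have them written out by hand. The point is only that a question like "What did the FDA approve?" can be answered by matching grammatical roles, with nothing anywhere explaining what the FDA is.

```python
from typing import NamedTuple, Optional


class Fact(NamedTuple):
    """A (subject, verb, object) triple, as a parser might extract from a sentence."""
    subject: str
    verb: str
    obj: str


# Hypothetical, hand-written stand-ins for parser output. The items are placeholders.
FACTS = [
    Fact("the FDA", "approve", "drug A"),
    Fact("the FDA", "approve", "food additive B"),
    Fact("the Senate", "approve", "treaty C"),
    Fact("the FDA", "reject", "drug D"),
]


def answer(subject: Optional[str], verb: str, obj: Optional[str]) -> list:
    """Return every fact that fits the pattern; None marks the slot being asked about."""
    def matches(pattern, value):
        return pattern is None or pattern.lower() == value.lower()

    return [
        f for f in FACTS
        if matches(subject, f.subject) and matches(verb, f.verb) and matches(obj, f.obj)
    ]


# "What did the FDA approve?" fixes the subject and verb and leaves the object open.
approved = [f.obj for f in answer("the FDA", "approve", None)]
print(approved)
```

Note that the lookup never consults a definition of "the FDA" or of approval; it relies purely on who did what to what, which is the trade-off the paragraph above describes.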

Equally impressive, a vague query such as "what disease killed a politician?" sees the system deduce (from Freebase) that Benjamin Harris was a politician, work out (from WordNet) that influenza and pneumonia are diseases, and apply its linguistic knowledge to recognise that 'killed' and 'died' relate meaningfully. None of the key words in my search occurs in the result snippet or in the article.
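The chain of deductions in that example can be mimicked with a few lookup tables. The sketch below is my own illustration under loud assumptions: `ENTITY_TYPES` merely stands in for Freebase, `HYPERNYMS` for WordNet, and `INVERSE_VERBS` for the linguistic knowledge that 'X killed Y' and 'Y died of X' describe the same event; the stored triples are invented for the purpose and are not quoted from Wikipedia. None of these names comes from Powerset.

```python
# Hypothetical stand-in for Freebase: what kind of thing is this entity?
ENTITY_TYPES = {"Benjamin Harris": "politician"}

# Hypothetical stand-in for WordNet: what broader category does this word fall under?
HYPERNYMS = {"influenza": "disease", "pneumonia": "disease"}

# 'X kill Y' and 'Y die of X' describe the same event with the roles swapped.
INVERSE_VERBS = {"kill": "die of"}

# Invented (subject, verb, object) triples, as if extracted from article text.
TRIPLES = [
    ("Benjamin Harris", "die of", "pneumonia"),
    ("Benjamin Harris", "write", "a pamphlet"),
]


def type_of(term):
    """Look the term up in either table; returns None if it is unknown."""
    return ENTITY_TYPES.get(term) or HYPERNYMS.get(term)


def find(agent_type, verb, patient_type):
    """Find (agent, patient) pairs for '<agent_type> <verb> <patient_type>'."""
    inverse = INVERSE_VERBS.get(verb)
    hits = []
    for subj, v, obj in TRIPLES:
        # Direct form: the agent is the grammatical subject.
        if v == verb and type_of(subj) == agent_type and type_of(obj) == patient_type:
            hits.append((subj, obj))
        # Inverse form: the agent appears as the grammatical object.
        elif v == inverse and type_of(obj) == agent_type and type_of(subj) == patient_type:
            hits.append((obj, subj))
    return hits


# "what disease killed a politician?"
result = find("disease", "kill", "politician")
print(result)
```

As in the real search, none of the query words ('disease', 'killed', 'politician') appears in the triple that ends up matching.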

In longer Wikipedia articles, Powerset leverages additional structure from Freebase and offers a number of user interface tweaks to make finding the answer you require more straightforward.

Digging into an article such as this one on Tom Cruise, for example, offers a tag cloud of key 'Factz' and a navigable breakdown of the article's structure. Asking 'does he have children' very quickly scrolls to the relevant section of the article and highlights the answer.

Powerset is clearly a powerful application, and judicious use of the tools can sometimes lend remarkable insight into a large pool of factual data. I am probably not alone in sometimes struggling to ask the 'right' question to trigger various elements of Powerset magic, but would be the first to recognise that old-time keyword search crafters such as myself form an ever-smaller minority of Web users; those less programmed to think in terms of that Google search box may well adapt more quickly. I do suspect, though, that the profusion of boxes in which to type, drop-downs with which to fiddle, and transient interface elements to chase (or whose absence to ponder, when the search wasn't 'right' to trigger them) will be toned down in future releases. When all the interface elements and my query were in sync, it was a compelling experience. But I often had to think harder than I'd like in order to work out which gizmo to click or type inside in order to move to the next stage in my navigation. Of course, with complex things going on under the hood we do have to expect a little more complexity than the vanilla search box when driving the thing. All in all, then, a great start... with plenty of opportunity to enhance and enrich still further, both with the existing data sets and by adding new ones.

With a $12.5 million Series A round back in 2006, back-of-the-envelope scribbles would suggest a Series B (or some other injection of capital) cannot be far away. Rumours of interest from Redmond certainly won't do that process any harm...

As with so many of these new entrants, we're left wondering whether the free shop window display is the point (presumably supported by the ubiquitous AdWords), or a carefully calculated loss-leader intended to drive corporate customers to pay for the application of these technologies to their own data morass. Only time, presumably, will tell. I, for one, would be fascinated to see the enterprise-friendly offspring of a Microsoft-driven marriage between FAST and Powerset.

