
Proof that the search for "great search" isn't over just yet

Written by David Berlind

Anybody who has ever used a search site like Google or Yahoo knows that there's room for improvement in search. But just how much? Is the room that's left only for incremental enhancements, or might we still see some quantum leaps? Until last week, when I got a demonstration of the work being done in Sun's Labs, I was thinking "incremental." But now, I'm thinking quantum leap. And the really strange thing is that Sun isn't even thought of as a search company. After all, when you think about quantum leaps in search, the expectation is that if anyone can deliver such a "leap," then that anyone is probably Google. But Sun? If you told me before last week that Sun would be hatching some cool search technologies in its labs, I would have declared it bunk. But I would have been wrong. Dead wrong.

It's not like Sun wasn't already shipping search technologies. As it turns out, Sun ships a search engine with its portal server and Web server products. Now, in its Labs, Sun is trying to figure out how to take its existing search products to the next level. According to Sun's research, 85 percent of the information produced by businesses is of the sort that inspired search engines like Google's and Yahoo's in the first place: unstructured information. But the other key bit of data that Sun cites is that, on average, "information workers" spend 25 percent of their time looking for information. In other words, searching.

So, quality search isn't just about which search giant can win over the most Internet users. To businesses with a lot of information workers, any technological advancements that can whittle that 25 percent figure down to 20, 15, 10 or even 5 percent mean that, respectively, those workers can spend 5, 10, 15 or even 20 percent more of their time on tasks that contribute more directly (more directly than searching) to competitive advantage. In fact, freeing up time to focus on those activities that contribute to competitive advantage -- long-hand for what I'm going to start calling "competitive productivity" (versus plain ole' "productivity") -- is more in vogue than it has ever been. The subject routinely comes up in the context of outsourcing, where passing off commodity tasks and automation (ones that run little chance of differentiating one business from the next) to "specialists" that make a business out of providing certain deliverables makes far more sense than insourcing the provision of those same deliverables.

For example, for most companies, using the browser-based Salesforce.com (or something like it) and letting its CEO Marc Benioff worry about system reliability, software updates, and security makes far more sense than bearing the burden of those headaches yourself. And, although there are some studies (often commissioned by the providers of insourced solutions) that try to demonstrate how outsourcing to application service providers like Salesforce.com might cost more in the long run, the question really comes back to where the majority of your employees' time should be spent: On things like running sales force automation or human resource management systems that ultimately don't serve as major differentiators from one business to the next? Or on the things that are clear opportunities to differentiate? Automated things aren't the only opportunities for outsourcing. If you haven't seen Mechanical Turk -- Amazon's service for matching buyers and providers of things more manual -- give it a try.

So, there's no question in my mind that, if you can focus more of your information workers' time on competitive productivity, you should have an easier time achieving your business' goals. And that's apparently where Sun's head is at when it comes to search. If I had to boil down what I saw during Sun's presentation, the advancements would fall into two categories that can't necessarily be dissociated from each other. The first of these is relevance of the results. The second is user interface. We've all seen the sorts of results that Google and Yahoo spit back at you when you visit either site to search for something. The problem, from a productivity point of view, is that there are so many results to weed through and they're not necessarily organized in a fashion that makes them very navigable. Suppose, for example, the information you were looking for is actually there in the search results, but buried 10 pages down. Just clicking through and scanning that many pages alone is unproductive (competitively unproductive too).

The science of search needs to focus on how to get the relevant information closer to the surface. Today, the "surface" is often thought of as being the first page of search results. So, not surprisingly, a lot of search science is dedicated to relevance. In other words, how to make sure the most relevant links appear at the top, on the first page. The folks at Sun's Labs are clearly concerned with relevancy. According to Sun's Steve Green:

Google gets you a list of documents. We look for the answer to a question in a document. With Google, you're guessing what words someone would have used in a document to get the documents that you're looking for. Some people are good guessers. Some are not. In my house, I'm the better guesser.

It's so true. How many times have you seen someone else searching in futility for something, trying all sorts of search terms, and then you walk up and find it on the first try? Most people who can do this know something about how the answer being sought is often framed, and how search engines think. In some cases, they're experts on search syntax. For example, with most search engines, using the minus sign (or hyphen) before a search term instructs the search engine to exclude documents that mention that term.
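To make the minus-sign syntax concrete, here is a minimal sketch in Python of how an exclusion operator could be applied to a handful of documents. It is not how Google, Yahoo, or Sun implement it; the function names `parse_query` and `search` and the sample documents are assumptions made up for illustration.

```python
def parse_query(query):
    """Split a query into required terms and excluded (minus-prefixed) terms."""
    required, excluded = [], []
    for token in query.lower().split():
        if token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])  # drop the leading minus sign
        else:
            required.append(token)
    return required, excluded


def search(documents, query):
    """Return documents that contain every required term and no excluded term."""
    required, excluded = parse_query(query)
    hits = []
    for doc in documents:
        words = set(doc.lower().split())
        has_required = all(term in words for term in required)
        has_excluded = any(term in words for term in excluded)
        if has_required and not has_excluded:
            hits.append(doc)
    return hits


# Hypothetical documents, invented for this example.
docs = [
    "jaguar speed on the open savanna",
    "jaguar car top speed review",
    "leopard habitat and range",
]

print(search(docs, "jaguar -car"))
```

A good guesser who knows this syntax can drop the car reviews in one query instead of paging past them.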

In an effort to help users "competitively produce" an answer, Sun's Labs have come up with a linguistic analysis technology that the company's researchers affectionately refer to as the "Blurbalyzer." The example given cited the way Amazon recommends books on its Web site today. If you've ever shopped on Amazon, then you've probably seen the feature that says "Customers who bought this item also bought" along with a list of items. The feature seems to suggest that, theoretically, if you liked The Da Vinci Code by Dan Brown, then, based on what other buyers of The Da Vinci Code also bought, you might want to read Michael Baigent's Holy Blood, Holy Grail. This search feature relies heavily on the social nature of the data that Amazon keeps in its database. You're essentially relying on what others have done and, in some ways, the feature can't help but self-perpetuate its own recommendations.

If, for example, it suggests Holy Blood, Holy Grail to buyers of The Da Vinci Code and those customers act on that recommendation, which in turn helps Holy Blood, Holy Grail stay on the list of recommended reading, does that mean you'll like the book just because you liked The Da Vinci Code? It's hard to say, but Sun thinks there's a better way based on a linguistic analysis of the reviews that people have written for The Da Vinci Code and other books. Copyright prevents Sun from digitizing, storing, and surfacing the full text of a book, but reviews are already in a digital format on the Web that Blurbalyzer can probe. In the demonstration, after digesting reviews of The Da Vinci Code and comparing their language with the language found in other reviews on Amazon's site, Blurbalyzer turned up Paul Christopher's Michelangelo's Notebook. Reviews for both The Da Vinci Code and Michelangelo's Notebook had similar references to religious mystery with ties to old Europe, the Louvre, symbologists, and the Priory of Sion. Yet neither book appears on Amazon's list of recommended reading for the other.
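Sun did not explain how Blurbalyzer works internally, so the following Python sketch is only a stand-in for the general idea of matching books by the language of their reviews. It swaps in a plain bag-of-words cosine similarity, which is far cruder than real linguistic analysis. The stopword list, the function names, and the review snippets (labeled "Book A" through "Book C") are all invented for illustration and are not real Amazon reviews.

```python
import math
import re
from collections import Counter

# A tiny, assumed stopword list; a real system would use something richer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "with", "to", "is", "about"}


def term_counts(text):
    """Count the non-stopword terms in a piece of review text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(target, reviews):
    """Return the title whose review text is closest to the target's review text."""
    target_vec = term_counts(reviews[target])
    scores = {
        title: cosine_similarity(target_vec, term_counts(text))
        for title, text in reviews.items()
        if title != target
    }
    return max(scores, key=scores.get)


# Made-up review snippets, one per hypothetical book.
reviews = {
    "Book A": "a religious mystery with symbols hidden in old European art",
    "Book B": "a mystery about religious symbols and a notebook from old Europe",
    "Book C": "a cookbook of quick weeknight dinners",
}

print(most_similar("Book A", reviews))
```

The point of the design is that nothing here depends on who bought what: two books are linked only because their reviews use similar words, which is how a pairing can surface even when the purchase-based list never shows it.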

So far so good? Right. But wait, there's more. Maybe something like Blurbalyzer can discover relevant information based on its linguistics approach. But is that what you really want? To be fair, other approaches to relevancy may actually return better results in certain contexts. At some point, the end-user always ends up making the final decision about what's most relevant. In the name of competitive productivity, it's up to the search experience to make end-users as productive as possible...helping them to find that proverbial straight line that's the shortest distance between two points. Enter clusters (and eventually, the aforementioned "surface").

In the world of search, clustering is a technique that keeps similar items in close proximity to each other. If, prior to searching for something, a search engine has an idea of what documents are similar to each other in terms of their content, it can "cluster" them. Once clustered, a hit on any one document in a cluster could very well turn up the others. But now comes the surface problem. Let's say, because of its content, a single document ends up being a part of eight distinct clusters. Most people doing search wouldn't get to see those clusters. In other words, they cannot visualize the eight clusters and then pick which one is closest to the vein of information they're seeking. So, regardless of what technique (linguistics or otherwise) is used to drive the formation of these clusters, there still exists the problem of surfacing that structure (one that has been wrapped around what is mostly unstructured data).
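As a rough illustration of a document belonging to several clusters at once, the Python sketch below groups documents by shared topic labels and then, for a single hit, lists every cluster that document sits in along with its neighbors. Treating a cluster as "all documents with the same topic label" is an assumption made to keep the example small; the document names and topics are hypothetical, and this is not Sun's clustering method.

```python
from collections import defaultdict


def build_clusters(doc_topics):
    """Group documents into clusters, one cluster per topic label."""
    clusters = defaultdict(set)
    for doc, topics in doc_topics.items():
        for topic in topics:
            clusters[topic].add(doc)
    return clusters


def clusters_for(doc, doc_topics, clusters):
    """For a hit on `doc`, map each cluster it belongs to onto the other members."""
    return {
        topic: sorted(clusters[topic] - {doc})
        for topic in sorted(doc_topics[doc])
    }


# Hypothetical documents and topic labels, invented for this example.
doc_topics = {
    "doc1": {"art", "religion"},
    "doc2": {"art"},
    "doc3": {"religion", "history"},
    "doc4": {"history"},
}

clusters = build_clusters(doc_topics)
print(clusters_for("doc1", doc_topics, clusters))
```

A flat list of results hides this structure: a hit on `doc1` would normally appear as one line, with no hint that it opens onto two different neighborhoods.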

Unless of course, you change the nature of the surface. For example, instead of a list of text, how about a three-dimensional translucent sphere (something similar to this crappy photograph that I took of one while it was being displayed on a projection screen):


In this photo, it doesn't look very 3D-esque. But trust me, on the screen, it had a great 3D feel and, if I recall correctly, the sphere could be "grabbed" and rotated.

More importantly, the sphere changes our notion of what the surface is. With this visual representation, we can see each of the clusters (imagine each bubble as an individual search result and the colors representing clusters). Then, imagine how, as the mouse passes over each bubble, something pops up like a book or a CD cover. In relatively short order, with a few mouse-overs, the end-user could easily get a sense of what each cluster is about and then zero in on that cluster. It's sort of like being able to advance to the 20th page of Google's search results (and knowing in advance that the cluster you're looking for actually starts on the 20th page).

Once you have an idea of what each bubble represents (from the mouse-overs), double-clicking on one of them could take you to the search result. Or, imagine if you encircled a cluster with a selection tool (like what comes with graphics programs) and double-clicked on that, yielding some Google or Yahoo-like search results with headlines and blurbs for just that cluster (blurbs based on the discoveries of the linguistics technology -- not just simple text hits). Sun didn't show that particular feature. But it would be child's play for the person who coded this 3D sphere to come up with that.

Two paragraphs ago, my mention of the phrase "CD cover" was a deliberate plant. Scouring text and applying some linguistics technology almost sounds par for the course in terms of what search can do now and where it should be heading. The 3D sphere also seems like a natural evolutionary step given where we've seen user interfaces in today's operating systems going. But text isn't the only content Sun can probe in a unique way. It thinks it has figured out how to probe music as well. In other words, instead of studying the linguistic quality of the content, it studies its audio properties. The net result is that the clusters shown in a sphere are clusters of similar music rather than similar bodies of text. And not just a cluster of classical music here, blues there (rock would be positioned relatively close to blues) and jazz on the other side. Within the general classical cluster might be the various genres of classical music such as classical guitar music (this was demonstrated).

Searching wasn't all that the technology was capable of. With audio as the context, we saw a demonstration of how, in connect-the-dots fashion, the small bubbles could be strung together to form a playlist. The technology shown was even capable of automatically plotting a path (a playlist) from a quiet song to one with more energy if, say, you wanted a playlist for your workout that started off mellow and worked up to something with more energy.
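Sun didn't say how the path between songs is plotted, so here is one guess at the general shape of it, sketched in Python. Each song is assumed to be a point described by two made-up audio features, with energy as the first; the sketch greedily hops to the nearest song that is more energetic than the current one until it reaches the target. The song names, feature values, and the `build_playlist` function are all hypothetical.

```python
import math


def build_playlist(songs, start, end):
    """Greedily walk from `start` to `end`, always moving to the closest
    song that has more energy than the current one (and no more than `end`)."""
    playlist = [start]
    current = start
    while current != end:
        current_energy = songs[current][0]
        end_energy = songs[end][0]
        candidates = [
            name
            for name, features in songs.items()
            if current_energy < features[0] <= end_energy and name not in playlist
        ]
        if not candidates:
            break  # no way to keep climbing; return what we have
        # Nearest neighbor in feature space among the more energetic songs.
        current = min(candidates, key=lambda n: math.dist(songs[current], songs[n]))
        playlist.append(current)
    return playlist


# Hypothetical songs as (energy, brightness) points; values are invented.
songs = {
    "lullaby": (0.1, 0.2),
    "ballad": (0.3, 0.3),
    "folk": (0.35, 0.9),
    "pop": (0.6, 0.5),
    "anthem": (0.9, 0.6),
}

print(build_playlist(songs, "lullaby", "anthem"))
```

Because every hop goes to a nearby song with strictly more energy, the playlist ramps up gradually rather than jumping straight from the mellow end of the sphere to the loud one, and songs that sit far off the path (like `folk` here) get skipped.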

Relevance, "surface," and overall user experience. After seeing the demonstrations of what Sun has in the works in its Labs, my next visit to a search engine was, well, dull and not terribly productive. Perhaps there's hope and room for improvement in search, after all.

Disclosure: In the spirit of media transparency, I want to disclose that in addition to my day job at ZDNet, I’m also a co-organizer of Mashup Camp, Mashup University, and Startup Camp. Yahoo, Google, and Sun, all of which are mentioned in this story, were sponsors of one or more of those events. For more information on my involvement with these and other events, see the special disclosure page that I’ve prepared and published here on ZDNet.
