Autonomy CEO says tags don't work



TOPICS: Autonomy

Mike Lynch, the CEO of Autonomy, the UK enterprise search company, was in town this week (video is coming). I used to meet Mr Lynch regularly when I was at the Financial Times, back when the dotcom boom was in full swing.

The dotbomb hit Autonomy along with tens of thousands of other companies, and Mr Lynch's profile suffered too. So it was good to catch up with him; we joked that the "bubble" was back (it's not, of course).

We chatted about trends in Autonomy's market, which is the handling of unstructured data, estimated at about 87 per cent of all data.

Autonomy's technology uses statistics to find correlations among data, documents, video, and audio. It doesn't need artificial intelligence to understand how pieces of data are connected; it calculates the probability that they are related.
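To make "the probability that they are related" concrete, here is a minimal sketch of one standard statistical measure of association, pointwise mutual information (PMI), computed from how often terms appear in the same documents. This is an illustrative swap-in: the article does not describe Autonomy's actual algorithm, and the function name `pmi_scores` and the toy documents are hypothetical.

```python
import math
from itertools import combinations


def pmi_scores(documents):
    """Pointwise mutual information for every pair of terms that co-occur.

    Each document is reduced to a set of lowercase words, so P(t) is the
    fraction of documents containing t and P(a, b) the fraction containing both.
    PMI(a, b) = log2(P(a, b) / (P(a) * P(b))): positive when a and b appear
    together more often than chance, 0 when they are independent.
    """
    n = len(documents)
    doc_sets = [set(doc.lower().split()) for doc in documents]
    term_counts = {}
    pair_counts = {}
    for terms in doc_sets:
        for term in terms:
            term_counts[term] = term_counts.get(term, 0) + 1
        # Sorting makes each pair key canonical: ("a", "b"), never ("b", "a").
        for a, b in combinations(sorted(terms), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    scores = {}
    for (a, b), joint in pair_counts.items():
        p_ab = joint / n
        p_a = term_counts[a] / n
        p_b = term_counts[b] / n
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


if __name__ == "__main__":
    # Hypothetical toy corpus: "data" appears everywhere, so it carries no signal.
    docs = [
        "bayes probability data",
        "bayes probability data",
        "video copyright data",
        "video copyright data",
    ]
    scores = pmi_scores(docs)
    print(scores[("bayes", "probability")])  # 1.0: always seen together
    print(scores[("bayes", "data")])  # 0.0: "data" is in every document
```

No keywords or tags are supplied: the association falls out of the counts alone. Pairs that never co-occur are simply absent from the result, since their PMI would be negative infinity.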

Autonomy uses Bayesian probability, named after Thomas Bayes, an 18th-century British Presbyterian minister. The advantage of this approach is that Autonomy can find stuff without being given any keywords, tags, or taxonomy in advance; it can determine the taxonomy on the fly.
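Bayesian methods rest on Bayes' theorem: the probability of a topic given some words is proportional to the probability of those words given the topic, times the prior probability of the topic. As a hedged illustration only, the sketch below is a textbook multinomial naive Bayes classifier with add-one smoothing. It is not Autonomy's algorithm, and unlike the on-the-fly taxonomy described above it needs labelled training examples; the names `train` and `classify` and the sample data are hypothetical.

```python
import math
from collections import Counter


def train(labelled_docs):
    """Estimate class priors and per-class word counts.

    labelled_docs is a list of (text, label) pairs.
    """
    label_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in labelled_docs:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    total = sum(label_counts.values())
    priors = {label: count / total for label, count in label_counts.items()}
    return priors, word_counts, vocab


def classify(text, priors, word_counts, vocab):
    """Return the label with the highest posterior, via Bayes' theorem.

    log P(label | words) = log P(label) + sum of log P(word | label) + const,
    assuming words are independent given the label (the "naive" part).
    """
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        counts = word_counts[label]
        # Add-one (Laplace) smoothing so unseen words don't zero out the product.
        denominator = sum(counts.values()) + len(vocab)
        score = math.log(prior)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Hypothetical labelled examples.
    training = [
        ("bayes probability inference", "statistics"),
        ("probability distribution sample", "statistics"),
        ("video copyright upload", "media"),
        ("audio video track", "media"),
    ]
    priors, counts, vocab = train(training)
    print(classify("probability of related data", priors, counts, vocab))  # statistics
    print(classify("copyright video", priors, counts, vocab))  # media
```

Working in log space avoids floating-point underflow when many small word probabilities are multiplied together.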

During a presentation, Mr Lynch slammed the popular practice of tagging web content and said that it won't help organize information. He quoted Metacrap, an essay by the science fiction writer Cory Doctorow. "Tags don't work because people lie, they are lazy, and they use different tags. And there is a huge amount of information that will never be tagged."

I agree, and I resent having to tag everything (see: Is search broken?). But tagging does work to some extent, and in some applications.

Autonomy's technology could be used to improve "tagging." Often it is not clear which tags to apply, and technology such as Autonomy's could help identify the appropriate ones. For example, are the Technorati tags at the end of this post the right ones to use to associate it with other, similar posts?
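One way software could help pick tags, sketched here under stated assumptions: compare a new post with posts that are already tagged, and let the most similar ones vote for their tags. This uses plain cosine similarity over word counts as a stand-in; it is not Autonomy's or Technorati's technology, and `suggest_tags`, its parameters `k` and `max_tags`, and the sample posts are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def suggest_tags(new_text, tagged_posts, k=2, max_tags=3):
    """Suggest tags for new_text from the tags of its k most similar posts.

    tagged_posts is a list of (text, tags) pairs. Each neighbour votes for its
    tags with a weight equal to its similarity; zero-weight tags are dropped.
    """
    query = Counter(new_text.lower().split())
    neighbours = sorted(
        ((cosine(query, Counter(text.lower().split())), tags)
         for text, tags in tagged_posts),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, tags in neighbours[:k]:
        for tag in tags:
            votes[tag] += similarity
    return [tag for tag, weight in votes.most_common(max_tags) if weight > 0]


if __name__ == "__main__":
    # Hypothetical, previously tagged posts.
    posts = [
        ("autonomy bayesian search unstructured data", ["search", "autonomy"]),
        ("tagging metadata technorati folksonomy", ["tagging", "web2.0"]),
        ("video copyright youtube upload", ["video", "copyright"]),
    ]
    print(suggest_tags("autonomy search tags", posts))
```

Here the query word "tags" does not match "tagging" in the second post, a small instance of the people-use-different-tags problem Mr Lynch raised; a real system would need stemming or a statistical model to bridge such variants.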


Earlier this week, Autonomy introduced a new product called Virage ACID (Automatic Copyright Infringement Detection), which uses its technology to search through video images and automatically detect copyrighted videos.

Because it is totally independent of media format, ACID can detect not only whether distributed video infringes copyright, but also whether someone has uploaded legitimate video overlaid with audio ripped from a copyrighted video or audio track.

Seems like it was designed for YouTube...


» Is search broken? | Tom Foremski: IMHO





  • Tagging that worked.

    The best tagging that I've seen and used worked because people didn't think of it as tagging.

    Usenet newsgroups.

    People think of a Usenet newsgroup as a forum, something that acts like a place, but it isn't, really... it's just an agreed-upon tag. Because it's agreed upon, it works pretty well. Not perfectly, but there are tools to filter out or block people who abuse the group name system.

    Where it falls down is scalability. Once it gets big enough, there are too many people who have no interest in it except what they can get out of it. All the problems you mentioned, like people lying (spamming) or not bothering to tag (inappropriate crossposts, not changing crossposts when the subject changes, and so on), are there.

    But for all that, it still works better than random make-it-up-as-you-go-along tagging.
    • Agreed-upon tagging

      The Open Group has the Universal Data Element Framework initiative underway.
  • what a load of crap!

    I've met Lynch... I've heard his "over 86%" pitch... heck, our shop even uses Autonomy, and it is garbage! The software fails inexplicably, and you need to buy a quad-Xeon machine with 16GB of RAM to run it effectively... and that's for around 100GB of data, very little in the petabyte-plus world of enterprise computing!

    Your article doesn't describe Autonomy or Bayesian inference well enough to do either justice, but here goes nothing: the premise of Autonomy is that it infers relationships among words from how close together they appear. By removing common words such as "the," "as," "a," and "for," it discerns the probability that words are related, and that's just it: it is only probability.

    Problem is, the more data you feed it, the more words will intersect with others. Therefore, as the occurrence of words increases, Autonomy's effectiveness decreases.

    Autonomy and every other search provider are bogus. Still, the technology is "better than nothing," and nothing is the only alternative presented so far.

    HOWEVER!! And I stress this "however"... I STRONGLY disagree with Mr. Lynch's assertion that tagging doesn't work. In fact, I believe that with practice and time, user collaboration will enhance the visibility of key information, and while tagging in its current form may not be the way to go, some form of collaborative, user-driven search will ultimately dominate all search engines and finally give us good results every time, not just some of the time, or not at all.