Mike Lynch, the CEO of Autonomy, the UK enterprise search company, was in town this week (video is coming). I used to meet with Mr Lynch regularly when I was at the Financial Times and the dotcom boom was in full swing.
The dotbomb hit Autonomy along with tens of thousands of other companies, and Mr Lynch's visibility suffered too. So it was good to catch up with him, and we joked that the "bubble" was back (it's not, of course).
We chatted about some of the trends in Autonomy's space: how to deal with unstructured data, which is estimated to make up about 87 per cent of all data.
Autonomy's technology uses statistics to find correlations between data, documents, video, and audio. It doesn't need artificial intelligence to understand the connections between pieces of data; instead, it calculates the probability that they are related.
Autonomy uses Bayesian probability, named after Thomas Bayes, an 18th-century British Presbyterian minister. The advantage of this approach is that Autonomy can find stuff without knowing any keywords, tags, or taxonomy; it can determine the taxonomy on the fly.
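Autonomy's actual algorithms are proprietary, so the following is only a minimal sketch of the general idea, assuming a textbook technique swapped in for illustration: each existing document is treated as its own small probabilistic word model (no tags, no keyword lists), and Bayes' rule with a uniform prior turns "how likely is this query under each document's model" into "how probable is it that the query relates to each document". The function names `tokenize` and `posterior_related` and the smoothing parameter `alpha` are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Crude tokenizer for illustration: lowercase words, punctuation stripped.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]

def posterior_related(query, documents, alpha=1.0):
    """Return P(document | query) for each document via Bayes' rule.

    Each document is its own unigram word model with additive smoothing
    (alpha); the prior over documents is uniform, so no tags are needed.
    """
    q = tokenize(query)
    vocab = set(q)
    models = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        vocab.update(counts)
        models.append(counts)
    v = len(vocab)
    log_likelihoods = []
    for counts in models:
        total = sum(counts.values())
        # log P(query | document), treating query words as independent
        ll = sum(math.log((counts[w] + alpha) / (total + alpha * v)) for w in q)
        log_likelihoods.append(ll)
    # Uniform prior: posterior is proportional to likelihood; normalise stably.
    m = max(log_likelihoods)
    weights = [math.exp(ll - m) for ll in log_likelihoods]
    z = sum(weights)
    return [w / z for w in weights]

docs = [
    "The striker scored a late goal in the football match",
    "Central bank raises interest rates to curb inflation",
    "Goalkeeper saves penalty as football team wins the match",
]
probs = posterior_related("football match goal", docs)
print([round(p, 3) for p in probs])
```

Here the two football stories come out far more probable than the banking story, even though nobody told the program that any of these documents were "about" football; the grouping falls out of word statistics alone.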
During a presentation, Mr Lynch slammed the popular practice of tagging web content and said that it won't help to organize information. Mr Lynch quoted an essay by Cory Doctorow, the science fiction writer, titled "Metacrap." "Tags don't work because people lie, they are lazy, and they use different tags. And there is a huge amount of information that will never be tagged."
I agree, and I resent the work involved in having to tag everything (see: Is search broken?). But tagging does work to some extent, in some applications.
Autonomy's technology could be used to improve "tagging." Often it is not clear which tags to apply, and technology such as Autonomy's could help identify the appropriate ones. For example, are the Technorati tags at the end of this post the right ones to use to associate this post with other, similar posts?
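How might probabilistic matching suggest tags? One minimal sketch, assuming a standard multinomial naive Bayes model rather than anything Autonomy has described: treat every tag as a class, learn its word distribution from posts that already carry it, then rank tags for a new, untagged post by how probable its words are under each tag's model. The names `suggest_tags` and `tagged_posts` and the sample posts are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Crude tokenizer for illustration: lowercase words, punctuation stripped.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]

def suggest_tags(new_post, tagged_posts, top_n=2, alpha=1.0):
    """Rank candidate tags for new_post with a multinomial naive Bayes model.

    tagged_posts is a list of (text, [tags]) pairs; each tag's word
    distribution is learned from all the posts that carry it.
    """
    word_counts = defaultdict(Counter)  # tag -> word frequencies
    tag_freq = Counter()                # tag -> number of posts using it
    vocab = set()
    for text, tags in tagged_posts:
        words = tokenize(text)
        vocab.update(words)
        for tag in tags:
            word_counts[tag].update(words)
            tag_freq[tag] += 1
    v = len(vocab) + 1  # +1 reserves probability mass for unseen words
    n_posts = len(tagged_posts)
    scores = {}
    for tag, counts in word_counts.items():
        total = sum(counts.values())
        # Start from how often the tag is used, then add word log-likelihoods.
        score = math.log(tag_freq[tag] / n_posts)
        for w in tokenize(new_post):
            score += math.log((counts[w] + alpha) / (total + alpha * v))
        scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

posts = [
    ("Autonomy uses Bayesian statistics for enterprise search", ["search", "autonomy"]),
    ("Google improves its web search ranking", ["search", "google"]),
    ("YouTube faces copyright complaints over uploaded video", ["video", "copyright"]),
]
suggestions = suggest_tags("New enterprise search software from Autonomy", posts)
print(suggestions)
```

The point is not the particular model but the workflow: the writer still gets tags, but the software proposes them from the text, so laziness and inconsistent vocabulary matter less.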
Earlier this week, Autonomy introduced a new product called Virage ACID (Automatic Copyright Infringement Detection), which uses its technology to search through video images and automatically detect copyrighted videos.
Because it is totally independent of media format, ACID can detect not only whether distributed video infringes copyright, but also whether someone has uploaded audio ripped from a copyrighted video, or a copyrighted audio track overlaid on otherwise legitimate video.
Seems like it was designed for YouTube...