The printing press was a pretty pivotal invention, challenging the artificial limitations on the dissemination of ideas maintained by scriptoria and opening the floodgates to the vibrant philosophical, social and technological innovations of the Renaissance, Reformation, and beyond.
From those early innovations, we entered a long period in which more and more of the advances in thought and practice were reported via papers printed in scholarly or professional journals: journals that were, for most people, too specialised and expensive to be accessed anywhere other than in a library. The number of journals grew, and various tools emerged to help us find the papers we needed. These tools tended to divide along subject or publisher lines, forcing the searcher to have prior knowledge of the journals (and their publishers) most likely to be of use. Various attempts were made to offer solutions capable of searching across more than one of these databases, but these were usually hampered by an unwillingness on the part of the publishers to share sufficient data to drive any really useful searches. We only need to glance at the rather daunting lists of resources maintained by a University Library to see how far from ideal the current model is, with its emphasis upon the container (the journal) rather than the content (the article).
Having spent much of my own time at University finding excuses to avoid dealing with the wilfully (well, so it seemed!) obtuse way in which e-resources were carved up, I was of course interested when offered the opportunity to learn more about a 'better way.'
Rafael Sidi, VP Product Development in the engineering and technology division at scientific publisher Elsevier, and Jens Tellefsen, VP Marketing & Product Strategy at semantic indexing company NetBase, spent some time on the phone introducing me to their new joint venture: illumin8.
Their press release from last month (.doc file) describes the proposition:
"illumin8 combines search and semantic indexing technologies to distill deep meaning, purpose and insight from the vast amount of Elsevier’s full-text content, scientific abstracts from 4,000 publishers, patents and billions of web pages. This research tool extracts and analyzes solutions, which are then categorized under organizations, products, technologies, approaches, and experts. Illumin8 is designed to go beyond simple keyword search, quickly finding and extracting crisp summarized answers and interrelationships that are semantically related to the context of the search query.
In addition to finding solutions from 5 billion web pages, millions of patents and Elsevier’s premium scientific content, illumin8 users will be able to easily access the full-text of Elsevier journals if they have an online subscription to the journal through ScienceDirect, the world’s largest online platform of science, technical and medical (STM) content."
This is a useful example of matching a huge body of content with some interesting technology, and making the value locked up inside that content more visible to subscribing customers as a result.
Quoting from the press release again,
"Booz Allen estimates that the Global Innovation 1000 companies spent $447 billion on R&D initiatives in 2006."
That's an awful lot of investment, paying for an awful lot of effort... so even incremental improvements to the manner in which those researchers locate information to support their decision-making are of great value.
I'd need to really use it for a while to be sure, of course, but NetBase's technology does appear to work well with the information Elsevier has on their own resources, improving both precision and recall in pulling back results that align well with the searches we tried.
Fundamentally, though, this is a market crying out for more cooperation. illumin8 does have access to brief records of some 33 million items in 15,000 journals published by 4,000 publishers other than Elsevier, but it works best with richer information, and that is something it (currently) only has for Elsevier's own journal and patent content. I did ask Rafael about this, and he said that they were open to approaches from other publishers who might wish to see their content appear more richly inside illumin8. It will be interesting to see the extent to which these competitors of Elsevier are happy to consider putting 'their' content inside a competitor's product.
As value propositions continue to evolve, the possibility of being found and used increasingly outweighs the notional revenue to be lost by lowering barriers to the (currently revenue-generating) metadata about content: metadata that people use solely to find and evaluate the more valuable content in the first place.
Can content owners such as the large academic publishers evolve, and does technology such as this offer ways to manage and navigate the data once it's available?