The Linked Content Economy: Thomson Reuters Open Calais Toolkit to Create More Intelligent Applications


Summary: Thomson Reuters today announced Calais 4.0, 'a web service that uses natural language processing technology to semantically tag text that is input to the service.'


Thomson Reuters today announced Calais 4.0, 'a web service that uses natural language processing technology to semantically tag text that is input to the service. The tags are delivered to the user who can then incorporate them into other applications - for search, news aggregation, blogs, catalogs, you name it.'
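As a rough illustration of the round trip the quote describes - post text, get semantic tags back - here is a minimal Python sketch. The endpoint URL, header names, and response shape are assumptions based on the Calais REST interface of the time, not a verified client; the sample response at the bottom is invented for illustration.

```python
import json
import urllib.request

# Assumed OpenCalais REST endpoint of the era - treat as illustrative.
CALAIS_ENDPOINT = "http://api.opencalais.com/tag/rs/enrich"

def tag_text(text, api_key):
    """Submit raw text to the Calais service and return the parsed JSON reply."""
    req = urllib.request.Request(
        CALAIS_ENDPOINT,
        data=text.encode("utf-8"),
        headers={
            "x-calais-licenseID": api_key,  # assumed header name
            "Content-Type": "text/raw",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

def extract_entities(calais_json):
    """Pull (type, name) pairs out of a Calais-style response dictionary."""
    return sorted(
        (obj["_type"], obj["name"])
        for obj in calais_json.values()
        if isinstance(obj, dict) and "_type" in obj and "name" in obj
    )

# A hypothetical response fragment, for illustration only:
sample = {
    "http://d.opencalais.com/er/company/1": {"_type": "Company", "name": "Thomson Reuters"},
    "http://d.opencalais.com/er/geo/1": {"_type": "City", "name": "New York"},
    "doc": {"info": {"docId": "..."}},  # bookkeeping entries carry no entity tag
}
print(extract_entities(sample))
# → [('City', 'New York'), ('Company', 'Thomson Reuters')]
```

The point is simply that the service returns structured tags you can feed into whatever application you like, rather than a finished product.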

My ZDNet colleague Paul Miller, whose semantic knowledge vastly exceeds mine, has covered the fundamentals of today's release here and here, and the video above provides a basic overview of the Calais proposition for users.

I talked with Tom Tague of Calais last month about progress and this upcoming release, and about Thomson Reuters' desire to be the semantic plumbing for the planet. As the largest business data analyst in the world, Thomson Reuters adds value through rich semantic contextual search of its information gold mine, a driving force to further monetize its content. (The Flash intro on the main Thomson Reuters site gives an idea of the depth of content on offer.)

So what's in it for you as a potential implementer? The Open Calais plumbing aims to provide rich contextual linking of information, which ultimately makes search results much more relevant and valuable.

In a utopian world, your multiple shared drives or database silos would all be connected in a single giant graph data structure of interconnected nodes.

Adopting the free Open Calais API allows you to federate all your content; all your information would be available in an interlinked data cloud.
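To make "interlinked data cloud" a little more concrete, here is a toy model under my own assumptions (all silo names, record IDs, and tags below are invented): records from separate silos become connected the moment they share a semantic tag.

```python
from collections import defaultdict

class TagGraph:
    """Toy federation: a tag links every (silo, record) pair that carries it."""

    def __init__(self):
        self.edges = defaultdict(set)  # tag -> set of (silo, record_id)

    def add(self, silo, record_id, tags):
        for tag in tags:
            self.edges[tag].add((silo, record_id))

    def related(self, silo, record_id):
        """Records in ANY silo sharing at least one tag with the given record."""
        hits = set()
        for members in self.edges.values():
            if (silo, record_id) in members:
                hits |= members
        hits.discard((silo, record_id))
        return sorted(hits)

graph = TagGraph()
graph.add("crm", "acct-17", ["Thomson Reuters", "acquisition"])
graph.add("news", "story-9", ["Thomson Reuters", "ClearForest"])
graph.add("wiki", "page-3", ["ClearForest"])

# The CRM account and the news story are linked via the shared tag,
# even though they live in different silos.
print(graph.related("crm", "acct-17"))
# → [('news', 'story-9')]
```

A real deployment would use RDF triples and URIs rather than an in-memory dictionary, but the shape of the idea - one graph spanning formerly disconnected stores - is the same.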

In 2007 Thomson Reuters bought ClearForest, a text analytics company whose text-driven business intelligence solutions supply a bridge between unstructured text and your enterprise data.

This is now the underpinning of Open Calais; if an enterprise wants to keep its entire information cloud behind its firewall, ClearForest could be a foundation.

The enterprise goal would be to manageably commingle private and public data within acceptable security boundaries. This could include streaming in, for example, financial information or a pharmaceutical taxonomy from Freebase or a similar source, and mashing it up with your private data and information.

Long-term content interoperability is a Calais strategic objective, and the team is encouraging wide-scale adoption by publishers, bloggers, software providers, content managers and developers, with lots of hand-holding on their new Drupal and Open Calais powered website.

From a business perspective, Thomson Reuters clearly gets the open business model, moving away from a walled garden of content to a metered, on-demand content delivery model. Instead of you visiting them for information, they will deliver what you need into your environment...

Launching this latest industrial-strength technology release to a rapidly growing developer base is a hugely encouraging development from Thomson Reuters, a major player in the media world.

The connected data ecosphere just got a lot richer and it will be fascinating to see how the other key players in the space coalesce to create what I believe will transform both the search and collaboration worlds.

Topics: CXO, Data Centers, Data Management, Enterprise Software, Software, IT Employment


Oliver Marks & Associates provides seasoned, technology agnostic independent consulting guidance to companies on effective Digital Enterprise Transformation business strategy, tactics, infrastructure & technology decisions, roll out and enduring use models and management.



  • The index is The Semantic Plumbing for the Planet.

    Posted on January 14th, 2009 by Oliver Marks @ 9:51 am

    The limitation is the silo-in, silo-out nature of this service. Capturing and storing each user's closed-network use of the service's "named entity" structure fails to semantically aggregate and integrate existing and emerging classes of source content across all users of the service.

    How does this "named entity tagging" enable an enterprise user's competitiveness? Each user still lacks meaning related to their own global competitive landscape. Each is at risk when not contextually aware of minute-to-minute feeds of global events and data and their "contextual inference" on enterprise tactics and strategy.

    Each existing and emerging class of source content has been uniquely expressed in some digital format, and each continues to evolve. The grass-roots innovators are not linked to any standards, but push whatever seems to be working to achieve their unique desired result. No vendor library of syntactic format-recognition code can ever scale in this creatively exploding semantic digital-object format environment. Each grass-roots innovator has its own unique interpretive knowledge base (ontology) that they alone use to bridge the chasm between the rich message contained and a useful digital representation of their (semantic) message.

    However, ontologies stay put; they are uniquely local and should not integrate or aggregate.

    Being in command of actionable, globally interpreted, contextually relevant information is the next generation of competitive success. Ontology is the contextual-meaning-aware, interpretive community of practice's tool enabling this next generation.

    Ontology enables "n-dimensional tagging" to populate n-dimensional graphical representations of the meaning intended to be disclosed by the author or creator of each informational object's content.

    The nature of a user attaining a personal competitive advantage is being "semantically aware" of all timely-available contextual information important to that individual user. Such an abstract view is "structure-less," never within the network schema itself, but rather a powerful, graph-based, semantically and contextually aware index engine.

    Practitioners within the field of ontology can be found along a spectrum of constructs, from fuzzy matching to rigid modeling. Structure is the core of the model-driven "semantic interoperability" community (think EDI, "rigid" electronic data interchange). Jarg's work takes the "fuzzy approach": the active thesaurus enriching the meaning of what "all human senses perceive" within a unique community of practice.

    Through our lens, ontologies stay put; they do not integrate or aggregate. Ontology in Jarg's fuzzy world is always a dynamic local knowledge base, often with other structures within, such as DB schemas, non-textual object content and media content's metadata. It is a falsehood, in my view, to approach aggregating and integrating existing ontologies. Their sole purpose is to continually enrich the graphical representation (abstraction) of the meaningful teaching within all that community's knowledge objects.

    Ontology is truly the localized, interpretive "current understanding tool" used to parse out (and graph) the detailed fragments of contextual meaning within the data, knowledge, objects and media held dear by that community of practice. Ontology is groomed and nourished at the grass roots, encompassing the active thesaurus of "all the human senses" perceived by each community member, and is invalid beyond that community. Their meaningful graph fragments co-exist within the powerful semantic distributed index alongside the fragments of all other communities of practice. That is the magic enabling the next generation of search and intelligent agents to serve the need for competitive advantage of each economic and social individual and organization.

    So, I hope that you have an image of ontology as a localized reference of meaning - an interpretive reference base and not the actual data, knowledge, objects and media itself. Think of the interpretive brain (as ontology) of each country's ambassador during a United Nations general assembly debate. Each understands an abstract (graph) of the contextual meaning of only their country's assets and multi-faceted objectives, and must interpret those (graph fragments) when responding to any (graph fragments) of another ambassador's question.

    Being "semantically aware" of all timely-available contextual information important to you means that you cannot expect to command the network bandwidth to directly query each individual global data source or listen in on all channels of live media feeds. Being "semantically aware" of all available contextually relevant information, important to you, requires a "structure-less," abstracted, contextually aware view over all accessible information sources.

    The solution is a graph-fragment index, which is your ambassador; the index's graph engine understands your needs and can effectively go to all the other ambassadors in the distributed index to discover exactly what is available within all (graphed) information sources to fulfill your detailed requests, then re-order it all according to what is most contextually relevant for you.

    Enterprise competitiveness is at risk when not contextually aware of minute-to-minute feeds of global events and data, and their "contextual inference" on enterprise tactics and strategy - as is an individual's professional career prospects.

    It's being in command of a "context-aware fragment cloud" (Jarg's index), where your persistent, index-resident queries (agents, routing search) are dynamically serving up your personal information needs in real time. The competitive advantage is only realized when you become "always contextually situation-aware."

    Ontology is local; Jarg's context-aware fragment index engine is global. Together they enable situation-aware competitive advantage: "The Semantic Enterprise Advantage."
  • RE: The Linked Content Economy: Thomson Reuters Open Calais Toolkit to Create More Intelligent Applications

    Just a quick update. Today we've increased the daily transaction allowance for OpenCalais to 50,000 transactions per day - a 25% increase over our previous daily limit of 40,000.

    Of course, OpenCalais continues to be offered at no charge for commercial or non-commercial use.

    The OpenCalais team