Microsoft has released Concept Graph, a database of words linked to millions of concepts that it's using to help machines grasp meanings the way humans can when interpreting a sentence.
As Microsoft notes, one thing that separates humans from machines when it comes to understanding a sentence is that humans have knowledge about concepts, such as dates, people, and animals.
Humans can also conceptualize, knowing for example that cats are animals or that a birthday is a significant date for a person.
Machines lack these qualities, which is why a computer could read the phrase "animals other than dogs such as cats" as meaning either that "cats are animals" or that "cats are dogs". Humans, who have learned that cats aren't dogs, would find the second interpretation improbable.
Concept Graph is the tool Microsoft thinks can give machines a human-like ability to parse such sentences without being confused by the ambiguities of everyday language. Or, as Microsoft puts it, Concept Graph aims to give machines "common-sense computing capabilities" and an awareness of a human's mental world, which is underpinned by the concepts it has mapped to text entities.
The Concept Graph release opens up Microsoft's Probase graph of concepts, which it has been developing since 2010 and imbuing with "knowledge" captured from billions of webpages and several years of human web searches.
In 2012 it had 2.7 million concepts backed by 1.68 billion webpages. Today, Concept Graph has 5.4 million concepts, which Microsoft boasts is more than other public knowledge bases offer, such as ResearchCyc's Cyc database of 120,000 concepts and Google's deprecated Freebase.
Google, of course, has its Knowledge Graph, which has grown from 18 billion facts and connections covering 570 million entities in 2012 to 70 billion facts, and which helps power search suggestions and machine translation.
Each concept in Microsoft's Concept Graph is also backed by sub-concepts, a set of attributes, and relationships, such as the connection between "apple" and "Newton".
Microsoft is releasing Concept Graph to developers in three phases, alongside its Concept Tagging, or conceptualization, model, which is meant to help machines understand human communication.
The tool maps words to semantic concepts, with probabilities that may depend on context. Microsoft sees potential for the toolset in search, automatic question answering, online advertising, recommendation systems, and AI systems.
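To make "probabilities that may depend on context" concrete, here is a minimal sketch, assuming a concept is scored by how often it co-occurs with a term, normalized into P(concept | term), and then re-weighted by a context signal. The counts, the `ISA_COUNTS` table, and the `conceptualize` and `conceptualize_in_context` helpers are all hypothetical illustrations, not Microsoft's data, model, or API.

```python
from collections import Counter

# Hypothetical co-occurrence counts between a term and candidate concepts.
# These numbers are invented for illustration only.
ISA_COUNTS = {
    "microsoft": Counter({"company": 120, "software company": 80, "os vendor": 40}),
    "apple": Counter({"fruit": 90, "company": 70}),
}


def conceptualize(term: str) -> dict[str, float]:
    """Return P(concept | term) by normalizing the hypothetical counts."""
    counts = ISA_COUNTS.get(term.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # unknown term: no concepts
    return {concept: n / total for concept, n in counts.most_common()}


def conceptualize_in_context(term: str, context_weights: dict[str, float]) -> dict[str, float]:
    """Re-weight P(concept | term) by hypothetical context weights, then renormalize."""
    base = conceptualize(term)
    scored = {c: p * context_weights.get(c, 1.0) for c, p in base.items()}
    total = sum(scored.values())
    return {c: s / total for c, s in scored.items()} if total else {}


if __name__ == "__main__":
    print(conceptualize("Microsoft"))
    # A context about earnings might boost "company" for the ambiguous term "apple".
    print(conceptualize_in_context("apple", {"company": 3.0}))
```

Without context, "apple" leans toward "fruit" in these made-up counts; boosting "company" flips the ranking, which is the kind of ambiguity the article describes.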
"Conceptualization maps instances or short texts into a large auto-learned concept space, which is a vector space, with human-level concept reasoning. It can be treated as both human-understandable and machine-understandable text embedding. Thus it provides us the capability of text-concept tagging, short-text semantic similarity computation for text understanding," Microsoft explains.
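The idea of treating concepts as a vector space for short-text similarity can be sketched as follows, assuming each short text is represented by summing the concept distributions of the terms it contains and two texts are compared by cosine similarity. The `CONCEPTS` table and helper names are hypothetical, and this simple bag-of-concepts approach is a stand-in, not Microsoft's actual conceptualization model.

```python
import math
from collections import Counter

# Hypothetical term -> concept probabilities, invented for illustration.
CONCEPTS = {
    "microsoft": {"company": 0.5, "software company": 0.3, "os vendor": 0.2},
    "google": {"company": 0.6, "search engine": 0.4},
    "cat": {"animal": 0.7, "pet": 0.3},
    "dog": {"animal": 0.6, "pet": 0.4},
}


def text_vector(text: str) -> Counter:
    """Sum the concept distributions of known terms in a short text."""
    vec = Counter()
    for token in text.lower().split():
        for concept, p in CONCEPTS.get(token, {}).items():
            vec[concept] += p
    return vec


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse concept vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similarity(text1: str, text2: str) -> float:
    return cosine(text_vector(text1), text_vector(text2))


if __name__ == "__main__":
    print(similarity("Microsoft", "Google"))  # share the "company" concept
    print(similarity("cat", "dog"))           # share "animal" and "pet"
    print(similarity("Microsoft", "cat"))     # no shared concepts
```

Because the vector dimensions are named concepts rather than opaque numbers, a reader can see why two texts scored as similar, which is the "human-understandable and machine-understandable" property the quote refers to.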
The first release can handle a single instance, such as the term "Microsoft", which it links to concepts such as company, software company, and OS vendor. The second release will add context, and the third will add short-text conceptualization.