Knowledge graph evolution: Platforms that speak your language
Knowledge graphs are among the most important technologies for the 2020s. Here is how they are evolving, with vendors and standards bodies listening, and platforms becoming fluent in many query languages
This may come as a shock if you first encountered knowledge graphs in Gartner's hype cycles and trend reports, or in the extensive coverage they have been getting lately. But here it is: knowledge graph technology is about 20 years old. That, however, does not mean it's stagnating -- on the contrary.
First, let's quickly recap those 20 years of history. What we call Knowledge Graphs today was largely initiated by none other than Tim Berners-Lee. Berners-Lee, who is also credited as the inventor of the web, published his Semantic Web manifesto in Scientific American in 2001. The core concepts for Knowledge Graphs were laid out there.
The Semantic Web manifesto was in many ways ahead of its time. Looking back today, we can see some parts of it going strong, while others have faded. Building on a foundation of standards for interoperability, such as Unicode, URIs, and RDF, the core of the vision has always been semantics: instilling meaning in web content.
The Semantic Web got a bad name for being academic, and some technical choices, such as XML, did not quite work out. The thing is, however, that crawling and categorizing content on the web is a very hard problem to solve without semantics and metadata. This is why Google adopted the technology in 2010, by acquiring Metaweb.
In 2012, the term Knowledge Graph was introduced. A very successful rebranding indeed, and that's not all we have Google to thank for. Google employs key people in the domain and is the driving force behind schema.org. Schema.org is the core of Google's knowledge graph. It is, unsurprisingly, a schema.
Knowledge graphs and schemas are foundationally bound. While not all knowledge graphs are as big as Google's, every one of them is based on a schema. Knowledge graph neophytes do not always realize this, but whether it's implicit or explicit, there's always a schema. Which brings us to the point.
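To make the idea of an explicit schema concrete, here is a minimal, hypothetical example in Turtle, describing an organization using terms from the schema.org vocabulary. The entity names and URIs are illustrative, not taken from any real dataset:

```turtle
@prefix schema: <https://schema.org/> .

# A tiny knowledge graph fragment: the schema.org vocabulary
# supplies the types (Organization, Person) and properties
# (name, founder) -- that shared vocabulary is the schema.
<https://example.com/#org> a schema:Organization ;
    schema:name "Example Corp" ;
    schema:founder <https://example.com/#jane> .

<https://example.com/#jane> a schema:Person ;
    schema:name "Jane Doe" .
```

Even when no such file is written down, a knowledge graph still commits to some set of types and properties; that is the implicit schema.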
Knowledge graphs and graph databases
Knowledge graphs can be stored in any back end, from files to relational databases or document stores. But since they are, well, graphs, it does make sense to store them in a graph database. This greatly facilitates storage and retrieval, as graph databases offer specialized structures, APIs, and query languages tailored for graphs.
In addition, many graph databases today offer a lot more than just a store for data. They come packaged with algorithms for graph analytics, visualization capabilities, machine learning features, and development environments. They have essentially grown from databases to platforms. But there is further nuance here.
Graph databases come in two main flavors, depending on which graph model they support: Property graph and RDF. In general, RDF graph databases emphasize semantics and interoperability, while property graph databases emphasize ease of use and performance.
When it comes to knowledge graphs, RDF graph databases are a natural match. It's not impossible to build knowledge graphs on top of property graph databases. Usually, however, this means learning knowledge management fundamentals the hard way and re-implementing relevant features. Lessons never come for free, but building on platforms centered around knowledge management helps.
A key element to bridge the gap is something called RDF* (RDF star). RDF* is a proposal to standardize a modeling construct for RDF graphs, namely the addition of properties to edges. Although this is possible in RDF, there is no standard way of doing it. Standardizing it would not only help interoperability with property graphs but also interoperability among RDF graphs.
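A sketch of what this looks like in practice, with illustrative entity names. The first approach, standard RDF reification, is one of the pre-existing workarounds; the second uses the RDF* quoted-triple syntax:

```turtle
@prefix : <http://example.org/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# A plain RDF triple: an edge with no room for properties
:alice :knows :bob .

# Pre-RDF* workaround: standard reification -- verbose, and
# handled a little differently by every toolchain
:stmt1 a rdf:Statement ;
    rdf:subject :alice ;
    rdf:predicate :knows ;
    rdf:object :bob ;
    :since 2010 .

# RDF* (Turtle* syntax): the quoted triple itself is the subject,
# so the property attaches directly to the edge
<< :alice :knows :bob >> :since 2010 .
```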
From secret handshakes to RDF stars
As Steve Sarsfield, VP of Product at Cambridge Semantics, put it: before RDF*, if people wanted to use edge properties in RDF graphs, they had to rely on secret handshakes. This is not ideal, especially considering that one of the key advantages of the RDF stack is standardization and interoperability.
In the wake of the W3C initiative, a couple of RDF graph database vendors went ahead and implemented RDF*. Cambridge Semantics is one of them. Its AnzoGraph database supports RDF*, as well as SPARQL*. SPARQL is the standard query language for RDF, and SPARQL* is its extension that works with RDF*.
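To show how the two fit together, here is a minimal SPARQL* query against the kind of annotated edge shown earlier (the entity names are again illustrative):

```sparql
PREFIX : <http://example.org/>

# SPARQL* extends triple patterns with quoted triples, so an
# edge and its annotation can be matched in a single pattern:
# find everyone :alice knows, and since when
SELECT ?person ?since WHERE {
  << :alice :knows ?person >> :since ?since .
}
```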
Cambridge Semantics recently unveiled AnzoGraph DB Version 2, and when discussing the release with Sarsfield, we wondered what the experience from the field has been. Are people asking for RDF*? Has it helped adoption? Bridging the gap with property graphs has also enabled AnzoGraph to get an implementation of Cypher, the most popular language for querying property graphs, underway.
Sarsfield noted that it's still relatively early days for knowledge graph adoption. As such, many of the organizations that use AnzoGraph tend to have highly skilled people on board. For them, switching between data models and query languages is not much of an issue. For mainstream adoption, however, this is important.
Stardog is another RDF graph database vendor that has implemented RDF*. Mike Grove, Stardog co-founder and VP of Engineering, said this has been in the works for a while, and they are very excited about it. Stardog started working on the plumbing as part of the Stardog 7 development effort, and they were very happy to be able to ship the feature.
Regarding its reception, Grove noted that what people wanted was a more user-friendly way to have edge properties: "Neo4j obviously got this right. RDF* does a fantastic job of bringing the same ease of use to semantic graphs." He went on to add that customers are excited, and many are already working on integrating it into their applications.
Technically, RDF* and SPARQL* are not yet standardized. Both were introduced by Olaf Hartig, a researcher at Linköping University. When we inquired about their status, Hartig noted that while there have been delays, he hopes the standardization process will pick up speed soon.
For knowledge graph platforms, too, GraphQL is a plus
Both Sarsfield and Grove noted that they expect RDF* to boost knowledge graph adoption. Implementation is key, and having early adopters and real-world usage may also catalyze the standardization process. Sarsfield and Grove expressed their support for the process, as well as the need to get the word out.
RDF* can make a difference, but it's not the only thing going on in the knowledge graph world. As knowledge graphs entail several layers and can be a central piece of infrastructure for organizations, graph databases are growing into platforms.
AnzoGraph started as part of the Anzo platform before becoming a product in its own right. Stardog also touts its product as a platform, emphasizing features such as visualization and virtualization built around the graph database core.
Another RDF graph database vendor, Ontotext, recently announced a new version of its own platform. An interesting feature that Stardog's and Ontotext's platforms share is support for GraphQL. Unfortunately, GraphQL's name is misleading. As if there was not enough confusion already around the word graph: GraphQL is not a graph query language, but a query language for APIs.
As Stardog put it, more developers know and are learning GraphQL than all the graph query languages combined. Ontotext on its part put together a rather elaborate post on the use of GraphQL in its platform. Whichever way you approach it, however, GraphQL makes lots of sense for accessing services built around database platforms.
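What makes GraphQL approachable is that the shape of the query mirrors the shape of the JSON result. A hypothetical query against a knowledge graph service might look like this; the field names are made up for illustration and depend entirely on the schema a given platform exposes:

```graphql
# Ask for a person and the organization they work for.
# The response is a JSON object with exactly this nesting,
# which is why application developers find GraphQL easy to pick up.
{
  person(name: "Jane Doe") {
    name
    worksFor {
      name
    }
  }
}
```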
GraphQL plus variants
Stardog reports GraphQL success within its customer base. Grove mentioned that one of the big Silicon Valley tech companies exclusively uses GraphQL to interact with Stardog. Both Grove and Jem Rayfield, Ontotext's Chief Architect, agree that GraphQL can work well in some cases, but that by its very design its expressiveness is quite limited.
One way to address this is to combine GraphQL with RDF. URIs are global identifiers, which can denote concepts from shared vocabularies such as schema.org or other ontologies. This seems like a natural fit, and one that both Grove and Rayfield agree has potential. There is another working group set up to align RDF and GraphQL, although it does not look like it's moving very fast.
Knowledge graphs in the 2020s: We speak your language
It seems we are moving towards a new status quo. If NoSQL stands for Not Only SQL, we could call this NoSPARQL -- Not Only SPARQL. SPARQL remains the language of choice for taking full advantage of knowledge graph capabilities: it doubles as an API, its expressiveness is beyond what GraphQL can attain, and its federated query and data integration capabilities are unique.
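Federated querying is worth a quick illustration. SPARQL 1.1's SERVICE keyword lets a query delegate part of its pattern to a remote endpoint and join the results with local data; the sketch below queries DBpedia's public endpoint:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# The SERVICE clause sends this pattern to the remote DBpedia
# endpoint; a local store evaluating the query would join the
# remote results with any local patterns added alongside.
SELECT ?city ?population WHERE {
  SERVICE <https://dbpedia.org/sparql> {
    ?city a dbo:City ;
          dbo:populationTotal ?population .
  }
}
LIMIT 10
```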
But vendors seem set to meet users where they are, be it GraphQL or any other language. Even SQL. As Stardog's Grove put it: "We've always strived to bring our technology to the users. GraphQL was a step in that plan. Supporting SQL is the next step in that journey, not because SQL is better than GraphQL, but because of what that support enables."
Eventually, even natural language support could be an option. "No matter how you feel about SQL, SPARQL, GraphQL, or any other query syntax/language, natural language is just better. Why ask someone to learn an esoteric syntax when they can just simply type?" said Grove.
We don't know whether conversational knowledge graphs are something everyone would be comfortable with. What we do know is that having more options is a good thing, and exciting times are ahead. Stay tuned as we keep exploring the years of the graph.