After more than a year of stagnation, the 'Linked Open Data cloud diagram' found in so many presentations and blog posts is back. It's bigger, it's better, and it points to the continued growth of this disparate community.
Working with Anja Jentzsch of Freie Universität Berlin, Richard Cyganiak of Galway's Digital Enterprise Research Institute (DERI) has returned to the diagram he first drew in 2007, and brought it right up to date. The diagram now represents 203 data sets which, together, comprise over 25 billion RDF statements and around 395 million links between them. Not bad for something that began life as a small academic exercise, but there's a very long way still to go.
The goal of the W3C SWEO Linking Open Data community project is to extend the Web with a data commons by publishing various open data sets as RDF on the Web and by setting RDF links between data items from different data sources.
From those early days, when researchers 'found' data sitting on the web and set about transforming it into RDF by themselves, the initiative has grown significantly. Governments, media properties from the BBC to the New York Times, and retailers such as Best Buy have been amongst those to set about plugging their own data into the increasingly rich network.
Links between these data sets are established when one draws upon concepts defined in another. GeoNames, for example, is seen by many as a useful resource to draw upon in describing places. Rather than fuzzily talking about 'Paris,' I might make the statement unambiguous by referring to one specific Paris rather than another. For a dataset (describing works of art, for example) in which location is not key, it's far easier to let someone else worry about recording where Paris is, what country it's in, how big it is, etc. than to do it all myself.
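To make that concrete, here is a minimal sketch in Python of what such a link looks like as a single RDF statement. The artwork URI and the 'locatedIn' property are hypothetical (example.org is a placeholder namespace), and the numeric GeoNames identifiers are illustrative and should be checked against GeoNames before being relied upon.

```python
# Hypothetical URIs for a small art dataset; example.org is a placeholder.
ARTWORK = "http://example.org/artwork/42"
LOCATED_IN = "http://example.org/vocab/locatedIn"

# GeoNames-style URIs. The numeric ids are illustrative assumptions,
# intended to stand for Paris, France and Paris, Texas respectively.
PARIS_FRANCE = "http://sws.geonames.org/2988507/"
PARIS_TEXAS = "http://sws.geonames.org/4717560/"

# One statement (subject, predicate, object): the artwork is located in
# one specific Paris, identified by URI rather than by an ambiguous name.
triples = [(ARTWORK, LOCATED_IN, PARIS_FRANCE)]


def to_ntriples(statements):
    """Serialise URI-only statements in N-Triples style, one per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in statements)


def outbound_links(statements, local_prefix):
    """Return statements whose object lives outside the local namespace."""
    return [
        t for t in statements
        if t[0].startswith(local_prefix) and not t[2].startswith(local_prefix)
    ]


print(to_ntriples(triples))
```

Calling outbound_links(triples, "http://example.org/") returns the one statement above, because its object points into another data set. Statements of that kind are what the lines on the cloud diagram count.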
This concept of linking from one resource to another is clearly key, but the relative dearth of connections on the diagram shows that there is still some way to go. DBpedia remains disproportionately important, and whilst it's unlikely that every resource will ever meaningfully connect to every other resource, some more connections would be valuable.
The latest iteration of the cloud diagram is based upon information that the community was invited to record in the CKAN data repository, which should make data collection and update easier and more accurate. In principle, future versions of the diagram could be generated programmatically as the data changes.
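As a rough illustration of what generating the diagram programmatically might involve, the sketch below turns a list of data set records into Graphviz DOT text and counts inbound links per data set. The record layout (a 'name' plus a 'links' mapping of target name to link count) is an assumption made for this example, not the actual CKAN schema, and the sample names and counts are invented.

```python
# Invented sample records. The layout is an assumption for illustration,
# not the structure actually stored in CKAN.
SAMPLE = [
    {"name": "dataset-a", "links": {"hub": 1200}},
    {"name": "dataset-b", "links": {"hub": 300, "dataset-a": 50}},
    {"name": "hub", "links": {}},
]


def to_dot(records):
    """Build Graphviz DOT text: one node per data set, one edge per link set."""
    lines = ["digraph lod_cloud {"]
    for rec in records:
        lines.append(f'  "{rec["name"]}";')
    for rec in records:
        for target, count in sorted(rec["links"].items()):
            lines.append(f'  "{rec["name"]}" -> "{target}" [label="{count}"];')
    lines.append("}")
    return "\n".join(lines)


def inbound_counts(records):
    """Count how many other data sets link to each data set."""
    counts = {rec["name"]: 0 for rec in records}
    for rec in records:
        for target in rec["links"]:
            counts[target] = counts.get(target, 0) + 1
    return counts


print(to_dot(SAMPLE))
print(inbound_counts(SAMPLE))
```

The DOT text can be handed to a layout tool to draw the picture, and the inbound counts give a crude measure of how heavily the cloud leans on any one data set.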
This month's revision of the Linked Data cloud is a useful illustration of progress within the community. Much remains to be done, both in clarifying licenses and in collecting even more data. There is also work to do in adequately explaining what people might find. The diagram does link through to descriptions of each resource, but the 'explanations' are all too often only meaningful to people knowledgeable enough not to need an explanation in the first place!
'This repository contains data from JISC, who fund research and infrastructure in the UK.'
What data? About what sort of thing?
'Airport data from Our Airports published as RDF'
What sort of data? Where in the world?