Tim Berners-Lee talks cranberry sauce and Linked Data in New York City

Written by Paul Miller, Contributor

Sir Tim Berners-Lee took to the stage in New York City last night, to deliver the final keynote of the day at JupiterMedia's new semantic web event, Linked Data Planet.

The ballroom of the Roosevelt Hotel was certainly busier than earlier in the day, as a smattering of press and members of the New York Semantic Web Meetup joined delegates at the two-day conference to hear Berners-Lee speak.

In many ways, his essential argument (unsurprisingly) was a reprise of themes explored in conversation with me earlier this year, expanded upon in Beijing in April and at the recent Rensselaer debate.

In a clarification of language, Berners-Lee stressed the important distinction between 'Linked Data' and 'Linked Open Data,' noting that the terms are often - incorrectly and confusingly - used interchangeably.

Linked Data, he contended, is data made visible using open standards in a way that conforms to his earlier Principles document:

  • "Use URIs as names for things
  • Use HTTP URIs so that people can look up those names
  • When someone looks up a URI, provide useful information
  • Include links to other URIs, so that they can discover more things."

    In essence, he argues that data should be made available in such a way that each individual item (a person in an HR system, a printer cartridge in a stock control system, etc.) has a 'name' that has meaning, and that can be retrieved. If I am employee 123, then something like http://my-company.com/hr-system/employees/123 might pull up my records for anyone with permission to see them.
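
To make the four principles concrete, here is a minimal Python sketch of the employee example. It is an illustration only, not anything Berners-Lee showed: the in-memory `RECORDS` table, the `look_up` and `follow` helpers, the department URI, and the boolean permission flag are all assumptions standing in for a real HR system served over HTTP.

```python
# Hypothetical in-memory stand-in for an HR system; every name is illustrative.
BASE = "http://my-company.com/hr-system"

# Principles 1 and 2: each thing is named by an HTTP URI.
RECORDS = {
    f"{BASE}/employees/123": {
        "name": "Employee 123",
        # Principle 4: link to other URIs so a client can discover more things.
        "links": {"department": f"{BASE}/departments/7"},
    },
    f"{BASE}/departments/7": {
        "name": "Department 7",
        "links": {},
    },
}


def look_up(uri, has_permission):
    """Principle 3: return useful information for a URI, or None if unknown.

    Raises PermissionError when the caller is not allowed to see the record.
    """
    if not has_permission:
        raise PermissionError(f"not permitted to view {uri}")
    return RECORDS.get(uri)


def follow(uri, link_name, has_permission=True):
    """Look up a URI, then look up one of the URIs it links to."""
    record = look_up(uri, has_permission)
    if record is None:
        return None
    target = record["links"].get(link_name)
    return look_up(target, has_permission) if target else None
```

Calling `follow(f"{BASE}/employees/123", "department")` starts from the employee's URI and arrives at the department record purely by following a link, which is the discovery step the fourth principle describes.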

    As Berners-Lee has remarked on more than one occasion recently, Linked Data is 'the web done right.' He said it again last night, and clearly believes passionately that we need to move beyond the current paradigm in which off-web databases are only grudgingly exposed to third parties via limited Web interfaces. The data in those databases could be part of the Web too - for those with permission to see it, of course - rather than remaining locked up inside a separate silo.

    Linked Open Data goes a step further, inheriting all the attributes of Linked Data and adding requirements that the data be 'open' and therefore available for full and free use and reuse by third parties. Recent work (disclosure: commissioned and funded by my employer, Talis) on the Open Data Commons offers one licensing regime under which this requirement to be 'open' can be met.

    We then turned to an image of the label from a jar of cranberry sauce. In an analogy that worked well with the audience, Berners-Lee illustrated the way in which different elements of the label (brand, product name, dietary information, allergy warnings, printers' mark, etc) applied to various geographies in different ways, had different legislative implications, and required very different regulatory processes to check assertions and reach consensus on the metrics involved. The 'brand' may be global, for example, and entirely within the purview of the brand owner. The product name may vary from country to country, but is essentially controlled by the brand owner too... albeit with the inevitable input from consumer focus groups. Whilst the 'facts' of the dietary information may be universal, local regulations from bodies such as the US Food and Drug Administration will require different attributes of those facts to be expressed, and may require the same information to be expressed in quite different ways depending upon local practice. Finally the (usually hidden) printing information at the edge of the label may well be globally applicable, but defined by a very different set of processes governed by equipment manufacturers, the international standards bodies, etc.

    That single label is a container for several different pieces of information, all governed by various processes and regulations that operate independently of one another. There is no need for the producer of this cranberry sauce to get regulators, printers, focus groups, marketing professionals, ink makers, and healthcare workers in a room to agree on the whole label. Equally, there is no need to involve any of the other parties when regulations, policy, or the brand manager require changes to one piece of the whole.

    Berners-Lee argued that the RDF specification lying behind the Semantic Web works in a very similar way, enabling data from different systems to be drawn together and expressed within the wrapper of a single RDF document in a way very different to the top-down and robustly specified syntaxes used elsewhere in the IT world. The complexity and inflexibility of some enterprise XML schemas, he argued, was like a world in which every contributor to that cranberry sauce label would have to agree to the content and form of every other element on the label. The flexibility of RDF certainly seems compelling when painted in that light!
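
The label analogy can be sketched in a few lines of Python. This models only the shape of RDF data, statements of the form (subject, predicate, object), using plain sets rather than any RDF library or serialisation. The product URI, the vocabulary URIs, and every value below are invented for illustration.

```python
# Each party describes the same product URI independently, as
# (subject, predicate, object) triples. All URIs and values are made up.
PRODUCT = "http://example.org/products/cranberry-sauce"
VOCAB = "http://example.org/vocab/"

brand_owner = {
    (PRODUCT, VOCAB + "brand", "Example Brand"),
    (PRODUCT, VOCAB + "productName", "Cranberry Sauce"),
}
regulator = {
    (PRODUCT, VOCAB + "allergyWarning", "none declared"),
}
printer = {
    (PRODUCT, VOCAB + "printersMark", "PM-0001"),
}


def merge(*graphs):
    """Merge graphs by set union: no source needs to know about the others."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect predicate/object pairs for one subject.

    Simplification: keeps a single object per predicate, which is enough here.
    """
    return {p: o for s, p, o in graph if s == subject}


# The 'label' is simply the union of what each party asserts.
label = merge(brand_owner, regulator, printer)
```

If the regulator changes its rules, only the `regulator` set needs to change before merging again. The brand owner's and printer's statements are untouched, which is the property the cranberry sauce label was meant to illustrate.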

    Moving on, Berners-Lee challenged the notion that 'the market for data' is still what matters most. He argued that there is far more opportunity in 'markets enabled by data': markets that almost inevitably require the underlying raw data itself to be easily available and commoditised. This argument is not new, and it is one that at least some traditional data holders are beginning to grasp. Thomson Reuters, for example, is making huge strides forward with its Open Calais initiative, which has been discussed here before. Barak Pridor from Thomson Reuters also spoke at the conference, and I'll cover his presentation shortly. The argument may not be new, but there's a long way to go in persuading everyone involved in the data business. Tim's growing interest in this aspect of the story will hopefully be a powerful weapon moving forward.

    "When there's lots of linkable data out there, there will be huge competitive advantage in a good user interface."

    By adopting the same underlying specifications (RDF, etc.) and accepting Tim's exhortations to name things with URIs, data owners stand to unlock a lot of the value that is currently tied up in their expensive back-end silos. How far, though, are our media companies, hardware manufacturers, and financial institutions prepared to go? Berners-Lee presents a bold yet achievable vision. Was New York listening, and how will it respond? How will such a response differ from the one we might expect in the more risk-prone Valley, or the Semantic Web research powerhouses of Europe?
