Open data is one of those refreshing trends that flows in the opposite direction of the culture of fear that has developed around data security. Instead of putting data under lock and key, surrounded by firewalls and sandboxes, some organizations see value in making data available to all comers -- especially developers.
The GovLab.org, a nonprofit advocacy group, published an overview of the benefits governments and organizations are realizing from open data, as well as some of the challenges. The group defines open data as "publicly available data that can be universally and readily accessed, used and redistributed free of charge. It is structured for usability and computability."
Dr. Kirk Borne published a list of some of the leading open data repositories on the scene today. An exhaustive list of public data providers has also been published, which includes a gazillion sources of government data for use by developers, researchers, and all other interested parties.
For enterprises and developers, one of the most exciting aspects of the open data movement is the growing number of APIs that provide connections to various, and often long-obscured, data resources. As Borne explains it, open data repositories provide a lot of advantages -- such as providing transparency, and serving as the fuel for innovation and transformation. Open data sets also "allow many more eyes to look at the data and thereby to see things that might have been missed by the creators and original users of the data."
Consider the three different APIs offered by the World Bank, for example, which provide details on World Bank projects and finances, or the US government's APIs for accessing everything from weather data to population distribution.
For enterprises, an open-data stance may be the fuel to build a vibrant ecosystem of developers and business partners. Scott Feinberg, API architect for The New York Times, is one of the people helping to lead the charge to open-data ecosystems. In a recent CXOTalk interview with ZDNet colleague Michael Krigsman, he explains how, through the NYT APIs program, developers can sign up for access to 165 years' worth of content.
But it requires a lot more than simply throwing some APIs out into the market. Establishing such a comprehensive effort across APIs requires a change in mindset that many organizations may not be ready for, Feinberg cautions. "You can't be stingy," he says. "You have to just give it out. When we launched our developer portal there's a lot of questions like, are people going to be stealing our data, questions like that. Just give it away. You don't have to give it all but don't be stingy, and you will find that first off not that many people are going to use it at first. You're going to find that out, but the people who do, you're going to find those passionate people who are really interested in using your data in new ways."
Feinberg clarifies that the NYT's APIs are not giving out articles for free. Rather, he explains, "what we give is everything but article content. You can search for articles. You can find out what's trending. You can almost do anything you want with our data through our APIs with the exception of actually reading all of the content. It's really about giving people the opportunity to really interact with your content in ways that you've never thought of, and empowering your community to figure out what they want. You know while we don't give our actual article text away, we give pretty much everything else and people build a lot of really cool stuff on top of that."
Open data sets, of course, have to be worthy of the APIs that offer them. In his post, Borne outlines the seven qualities open data needs to have to be of value to developers and consumers. (Yes, they're also "Vs" like big data.)
- Validity: It's "critical to pay attention to these data validity concerns when your organization's data are exposed to scrutiny and inspection by others," Borne states.
- Value: The data needs to be the font of new ideas, new businesses, and innovations.
- Variety: Exposing the wide variety of data available can be "a scary proposition for any data scientist," Borne observes, but nonetheless is essential.
- Voice: Remember that "your open data becomes the voice of your organization to your stakeholders."
- Vocabulary: "The semantics and schema (data models) that describe your data are more critical than ever when you provide the data for others to use," says Borne. "Search, discovery, and proper reuse of data all require good metadata, descriptions, and data modeling."
- Vulnerability: Accept that open data, because it is so open, will be subjected to "misuse, abuse, manipulation, or alteration."
- proVenance: This is the governance requirement behind open data offerings. "Provenance includes ownership, origin, chain of custody, transformations that have been made to it, processing that has been applied to it (including which versions of processing software were used), the data's uses and their context, and more," says Borne.
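
For developers who want to act on the provenance point, the elements Borne lists map fairly naturally onto a small metadata record published alongside a data set. The following is only a minimal Python sketch under stated assumptions: the class names (`ProvenanceRecord`, `ProcessingStep`), the field names, and the `missing_fields` check are all hypothetical illustrations, not part of Borne's post or of any established provenance standard.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json


@dataclass
class ProcessingStep:
    """One processing step applied to the data (hypothetical structure)."""
    description: str
    software: str
    software_version: str  # which version of the processing software was used


@dataclass
class ProvenanceRecord:
    """Hypothetical record mirroring the provenance elements Borne lists."""
    owner: str
    origin: str
    chain_of_custody: List[str] = field(default_factory=list)
    transformations: List[str] = field(default_factory=list)
    processing: List[ProcessingStep] = field(default_factory=list)
    uses: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of provenance fields that are still empty."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self) -> str:
        """Serialize the record so it can be published next to the data."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    # Illustrative values only; none of these describe a real data set.
    record = ProvenanceRecord(owner="Example Agency", origin="example survey")
    record.processing.append(
        ProcessingStep("removed duplicate rows", "example-cleaner", "1.2.0")
    )
    print("Still undocumented:", record.missing_fields())
    print(record.to_json())
```

A check like `missing_fields` could be run before a data set is released, so that gaps in ownership, custody, or processing history are caught by the publisher rather than by outside users.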