Microsoft looks to 'do for data sharing what open source did for code'

Microsoft is working to standardize data-sharing terms via pre-designed licensing agreements, the first of which now are available for preview and comment.

As Microsoft seeks to make data-sharing across companies easier and more pervasive, company officials have identified several roadblocks. Chief among these is the lack of consistent, standardized data-sharing terms and licensing agreements. On July 23, the company took a potential first step toward closing this gap.

Microsoft is making the first drafts of three proposed data-sharing agreements publicly available today. It is seeking community feedback and input on them over the next few months. Each of the three is designed for a particular data-sharing scenario between companies -- not individuals -- and is covered by a Creative Commons license. Some of these agreements will be published on Microsoft's GitHub code-sharing site.

Microsoft officials said they believe these kinds of agreements could spare companies from spending months or years negotiating and creating data-sharing governance agreements.

Microsoft Corporate Vice President and Chief IP Counsel Erich Andersen said that Microsoft is trying to bring an open-source-license-like structure to these kinds of data-sharing agreements. The Open Source Initiative (OSI) maintains a number of pre-approved licenses, such as the Apache License, BSD License and MIT License, which companies can use to license their source code.

"We're looking to do for data what open source did for code," Andersen said.

Microsoft expects that these kinds of agreements could help Open Data Initiative (ODI) participants in their quest to provide a single, unified view of customer data. ODI -- founded by Microsoft, Adobe and SAP last fall -- was conceived as a way for companies to "re-imagine customer experience management" (aka CRM) by integrating CRM, ERP, commerce, sales, product usage and other data into a single data view that works across devices.

Microsoft's initial three data-sharing agreement proposals are:

  • Open Use of Data Agreement (O-UDA): Designed for use with open datasets that don't include personal data or data owned by a data provider. It is the most open and least restrictive of the three initial proposals.
  • Computational Use of Data Agreement (C-UDA): Designed to govern the use, for AI training purposes, of datasets that contain third-party materials. This is a contract for use with a database that includes open data but also some elements that are copyright-protectable (such as photos or snippets of text). It allows the data to be used for training an AI model but prohibits republishing or redistributing the protectable elements.
  • Data Use Agreement for Open AI Model Development (DUA-OAI): Designed for cases where the underlying data includes elements that may raise privacy concerns or where the data may be proprietary to its controller.

Microsoft recently announced another piece of the data-sharing puzzle with the Azure Data Share service, which is now in preview. Azure Data Share is designed to let companies share big datasets with one another more securely than they could via something like FTP or web APIs. Azure Data Share is meant for use with Azure Blob Storage and Azure Data Lake Storage.