On a related issue, two groups of writers, including George R.R. Martin, Michael Chabon, and John Grisham, are suing Microsoft and OpenAI for using their work to train their LLMs. Copyright, the legal foundation of open source, is at the heart of this issue.
But this isn't simply a Microsoft problem.
As Sean O'Brien, Yale Law School lecturer in cybersecurity and founder of the Yale Privacy Lab, told my ZDNET colleague David Gewirtz: "I believe there will soon be an entire sub-industry of trolling that mirrors patent trolls, but this time surrounding AI-generated works. A feedback loop is created as more authors use AI-powered tools to ship code under proprietary licenses. Software ecosystems will be polluted with proprietary code that will be the subject of cease-and-desist claims by enterprising firms."
US attorney Richard Santalesa, a founding member of the SmartEdgeLaw Group, told Gewirtz that there are both contract and copyright law issues at play -- and they're not the same thing. Santalesa believes companies producing AI-generated code will, "as with all of their other IP, deem their provided materials -- including AI-generated code -- as their property." And remember: public domain code is not the same thing as open-source code.
So, what's to be done? Simply claiming your AI is open source is a nonstarter. Meta, for example, claims Llama 2 is open source. It's not.
As Erica Brescia, a managing director at RedPoint, the open source-friendly venture capital firm, asked on Twitter: "Can someone please explain to me how Meta and Microsoft can justify calling Llama 2 open source if it doesn't actually use an OSI [Open Source Initiative]-approved license or comply with the OSD [Open Source Definition]? Are they intentionally challenging the definition of OSS [Open Source Software]?"
Here's the short explanation: Meta is using open source as a marketing term, not a legal one. That usage won't fly once the lawsuits mount up.
At the same time, as OpenUK CEO Amanda Brock observed, "I don't think we're going to see going forward any LLM or any significant AI being able to be licensed as open source, because the key to open source is the Open Source Definition."
And the road to that Definition was a long and bumpy one.
The first free software licenses emerged in the early 1980s, after MIT AI Lab programmer Richard M. Stallman couldn't get an early laser printer, the Xerox 9700, to produce error messages. The problem? Stallman couldn't read or change its source code. At the time, this was a new development. Although we now think of proprietary software as the default, it wasn't then.
By 1985, Free Software was becoming popular, but it had also become clear that the word "free" was too ambiguous. In 1998, after Netscape released the Mozilla source code -- which became the basis of the Firefox web browser -- several leading Free Software luminaries, including Eric S. Raymond, Bruce Perens, Michael Tiemann, Jon "Maddog" Hall, and Christine Peterson, coined the phrase "open source" to describe this kind of license. Perens and Raymond then went on to found the OSI, which drafted the Open Source Definition (OSD) and uses it as the guide for defining all open-source licenses.
All open-source licenses must comply with the OSD. For AI and LLMs, that's much easier said than done.
We've seen this problem coming for a while. At Open Source Summit Europe in Bilbao, Spain, last month, I spoke with Stefano Maffulli, executive director of the OSI, the organization that defines and stewards open-source licenses. "The process started two years ago when GitHub Copilot came out," Maffulli told me. "It was a watershed moment. All of a sudden, code you wrote as a human for humans, everything we have produced and put on the Internet, was being harvested for machine learning."
So, what can we do? Maffulli and other open-source and AI leaders are working on combining AI with open-source licenses in sensible ways.
Maffulli observed that combining AI with open-source licenses is as hard as, if not harder than, when software copyright was first applied to source code in the 1980s (when Free Software and open source were first defined). True, open-source AI programs -- such as TensorFlow, PyTorch, and Hugging Face -- work well with old-style licenses. But old-style software isn't the problem. It's where software and data mix that the existing open-source licenses begin to break down. Specifically, it's where all that data and code merge together in AI/ML artifacts -- such as datasets, models, and weights -- that trouble emerges. "Therefore," said Maffulli, "we need to make a new definition for open-source AI."
This must be a definition that all stakeholders can agree upon and work with. Free software and open source are no longer just matters for developers. The goals of open-source savvy programmers and lawyers aren't the same as those of AI companies. To address this, Maffulli, together with Google, Microsoft, GitHub, Open Forum Europe, Creative Commons, the Wikimedia Foundation, Hugging Face, the Linux Foundation, the ACLU, Mozilla, and the Internet Archive, is working on a draft for defining a common understanding of open-source AI. In other words, all the major AI players are working on the definition.
If all goes well, we can expect to see the fruits of their labor as early as this month. And while this will only be the first draft of the AI Open Source Definition, I expect that it will be finalized as quickly as possible. Everyone involved knows that AI is advancing rapidly and the sooner we get an open-source framework around it, the better.