This Copyright Lawsuit Could Shape the Future of Generative AI
The tech industry might be reeling from a wave of layoffs, a dramatic crypto-crash, and ongoing turmoil at Twitter, but despite those clouds some investors and entrepreneurs are already eyeing a new boom—built on artificial intelligence that can generate coherent text, captivating images, and functional computer code. But that new frontier has a looming cloud of its own.
A class-action lawsuit filed in a federal court in California this month takes aim at GitHub Copilot, a powerful tool that automatically writes working code when a programmer starts typing. The coders behind the suit argue that GitHub is infringing copyright because it does not provide attribution when Copilot reproduces open-source code covered by a license requiring it.
The lawsuit is at an early stage, and its prospects are unclear because the underlying technology is novel and has not faced much legal scrutiny. But legal experts say it may have a bearing on the broader trend of generative AI tools. AI programs that generate paintings, photographs, and illustrations from a prompt, as well as text for marketing copy, are all built with algorithms trained on previous work produced by humans.
Visual artists have been the first to question the legality and ethics of AI that incorporates existing work. Some people who make a living from their visual creativity are upset that AI art tools trained on their work can then produce new images in the same style. The Recording Industry Association of America, a music industry group, has signaled that AI-powered music generation and remixing could be a new area of copyright concern.
“This whole arc that we’re seeing right now—this generative AI space—what does it mean for these new products to be sucking up the work of these creators?” says Matthew Butterick, a designer, programmer, and lawyer who brought the lawsuit against GitHub.
Copilot is a powerful example of the creative and commercial potential of generative AI technology. The tool was created by GitHub, a subsidiary of Microsoft that hosts the code for hundreds of millions of software projects. GitHub made it by taking a code-generating algorithm from AI startup OpenAI and training it on the vast collection of code it stores, producing a system that can preemptively complete large pieces of code after a programmer makes a few keystrokes. A recent study by GitHub suggests that coders using Copilot as an aid can complete some tasks in less than half the time normally required.
But as some coders quickly noticed, Copilot will occasionally reproduce recognizable snippets of code cribbed from the millions of lines in public code repositories. The lawsuit filed by Butterick and others accuses Microsoft, GitHub, and OpenAI of infringing on copyright because this code does not include the attribution required by the open-source licenses covering that code.
Programmers have, of course, always studied, learned from, and copied each other’s code. But not everyone is sure it is fair for AI to do the same, especially if AI can then churn out tons of valuable code itself, without respecting the source material’s license requirements. “As a technologist, I’m a huge fan of AI,” Butterick says. “I’m looking forward to all the possibilities of these tools. But they have to be fair to everybody.”