Generative AI, Copyright and Derivative Works = Conundrum

There is strong evidence that ChatGPT has been trained on copyrighted material, but does that mean derivative works infringe copyright?

By James Marinero • Published 12 months ago • Updated 12 months ago • 8 min read

Image from DeepAI.org, author © overlay

Artificial Intelligence (AI) is revolutionizing our world at a frightening pace, transforming industries and sectors including healthcare, finance, transportation, and education. And the law.

Yes, there's a lawyer-fest looming over several aspects of copyright law.

As AI continues to advance and evolve, it raises complex legal and ethical questions related to copyright and derivative works. The rapid growth and dissemination of AI-generated content has created a conundrum.

I'm not a legal expert and this is just my amateur take on the issue, although as an author I do need to have some basic understanding of the law.

AI training

When AI Large Language Models (LLMs) are trained, they are exposed to huge quantities of real world data. The companies that build these systems generally do not disclose what data is used for the training - or how accurate it might be. Accuracy is a can of worms we'll not go into here. What matters for now is the data that is used.

Conundrum

Who owns the copyright of AI generated works? Can AI create derivative works based on existing copyrighted content? Let's take a look at the issues and whether the conundrum can be resolved.

This issue was recently raised in a US Congressional hearing about AI at which Sam Altman (CEO of OpenAI) was being questioned.

Front and center for Sen. Marsha Blackburn was the issue of who should own all the AI-generated material produced by large language models trained on copyrighted works.
While Altman did not have much to offer in terms of solutions to the growing community of creators angry about their work being used to train large language models, he did say during the hearing that people should be able to opt out of having their data be used to train those models. - politico.com

Copyright

Copyright is a legal right that protects the creators of original works, including literary, artistic, musical, and dramatic works. The copyright owner has the exclusive right to use and distribute the work, and others must seek permission before using it. Such rights have time limits.

The Berne Convention

The Convention was first adopted in 1886 and has been revised several times. It provides a framework for copyright protection that is recognized by over 170 countries. It establishes the minimum standards for copyright protection, such as the rights of authors and limitations on copyright, and facilitates international cooperation and harmonization of copyright laws. The Convention aims to encourage creativity and protect the rights of creators while also promoting the free flow of information and ideas.

The term of copyright protection varies depending on the country and the type of work. In most countries that are signatories to the Berne Convention, the minimum term is the life of the individual author plus 50 years after their death. In the US it is life plus 70 years. Different terms apply to works owned by corporations.
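To make the arithmetic concrete, here is a minimal sketch of how those terms translate into a calendar year. It assumes the common rule that the term is counted from the 1st of January following the author's death, so protection lasts to the end of a calendar year. The function name and the author's death year are hypothetical, and real cases (corporate works, older publications, national quirks) need a lawyer, not a script.

```python
def public_domain_year(death_year: int, term_years: int) -> int:
    """Return the first calendar year in which the work is out of copyright.

    Assumption: the term is counted from 1 January of the year after the
    author's death, so protection runs to the end of a calendar year.
    """
    if term_years < 0:
        raise ValueError("term_years must not be negative")
    return death_year + term_years + 1


# Hypothetical author who died in 2000.
print(public_domain_year(2000, 50))  # life plus 50 -> 2051
print(public_domain_year(2000, 70))  # life plus 70 -> 2071
```

The twenty-year gap between the two results is why the same book can be free to use in one country and still protected in another.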

There are exceptions to copyright for 'fair use' and 'fair dealing'.

Generative AI brings new challenges

However, the rise of AI-generated content has raised questions about who owns the copyright of such works.

In the case of AI-generated works, it is unclear whether the creator is the human who developed the AI algorithm or the AI itself. Some argue that the human creator should own the copyright, while others believe that the copyright should belong to the AI.

One of the challenges in determining copyright ownership is that copyright law was designed for human creators and not for AI-generated content. The law does not clearly define what constitutes a "work" created by an AI system. As a result, there is a need for new legal frameworks that can address these issues.

Derivative Works and AI

A derivative work is a new work that is based on an existing work, such as a movie based on a book or a remix of a song. Copyright law gives the copyright owner the exclusive right to create derivative works based on their original work.

However, with AI-generated content, it is unclear whether AI can create derivative works based on existing copyrighted content. Copyrighted content is widely used to 'train' AI using machine learning technology.

One of the challenges with AI-generated derivative works is that they can be similar to the original but not identical. For example, an AI-generated painting based on a photograph may be similar to the original, but it will not be an exact copy. As a result, it is unclear whether AI-generated derivative works infringe on the original copyright.

This aspect of copyright law is currently being fought over between Getty Images and Stability AI.

Prove it!

How do you prove that an AI-generated image is a derivative of a copyrighted work?

Just use a simple black box test.

A group at UC Berkeley in the US have done just that, looking at output from OpenAI software.

When Large Language Models (LLMs) are fed a prompt, they decide what to output depending on the rules and distributions learned from training data. By analysing what the LLM actually returns in response, it's possible to make inferences about the training data used.

The researchers devised a test they called "name cloze", in which the model is asked to predict a single masked name in a passage of 40–60 tokens (one token is equivalent to about four text characters) that contains no other named entities. Their reasoning was that 'passing the test' indicated that the model had memorized the associated text.
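The idea of such a black box test can be sketched in a few lines of Python. This is my own illustration, not the researchers' code: the prompt wording, the function names, the made-up passages and the stand-in `fake_model` are all assumptions. In a real test `ask_model` would send the prompt to an LLM and return its reply.

```python
from typing import Callable, Iterable, Tuple

MASK = "[MASK]"


def approx_tokens(text: str) -> int:
    """Rough token count using the 'about four characters per token' rule."""
    return round(len(text) / 4)


def build_prompt(passage: str) -> str:
    """Wrap a passage containing exactly one [MASK] in a fill-in instruction."""
    if passage.count(MASK) != 1:
        raise ValueError("passage must contain exactly one [MASK]")
    return (
        "Fill in the [MASK] in the passage below with the proper name "
        "that belongs there. Reply with the name only.\n\n" + passage
    )


def name_cloze_accuracy(
    items: Iterable[Tuple[str, str]],
    ask_model: Callable[[str], str],
) -> float:
    """Fraction of passages where the model's guess matches the masked name."""
    items = list(items)
    if not items:
        return 0.0
    hits = 0
    for passage, name in items:
        guess = ask_model(build_prompt(passage)).strip()
        if guess.lower() == name.lower():
            hits += 1
    return hits / len(items)


# Made-up passages (not from any real book) and a stand-in model that
# always answers "Marta", so exactly one of the two guesses is correct.
samples = [
    ("The door creaked and [MASK] stepped into the hall, shaking off the rain.", "Marta"),
    ("Nobody spoke until [MASK] put down the lantern and began to laugh.", "Tobias"),
]


def fake_model(prompt: str) -> str:
    return "Marta"


print(name_cloze_accuracy(samples, fake_model))  # 0.5
```

A name cannot be guessed from context alone in a passage like these, so a high score across many passages from one book suggests the model has seen that text before. It is still only an inference, not proof of what is in the training data.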

"The data behind ChatGPT and GPT-4 is fundamentally unknowable outside of OpenAI," the authors wrote. "At no point do we access, or attempt to access, the true training data behind these models, or any underlying components of the systems. Our work carries out probabilistic inference to measure the familiarity of these models with a set of books, but the question of whether they truly exist within the training data of these models is not answerable."

In one cloze test they found signs that GPT-4 had been fed a diet of copyrighted books, including classics such as Nineteen Eighty-Four and The Lord of the Rings trilogy. They published the full list of 100 books on Google Docs.

The authors noted that the GPT-4 'reading list' was biased towards books in the sci-fi and fantasy genres.

There was no suggestion that the OpenAI LLMs stored the text verbatim.

Possible Solutions

To address the conundrum posed by AI, copyright, and derivative works, there could be several possible solutions.

One approach is to revise copyright law to include provisions that specifically address AI-generated works. This would involve defining what constitutes a "work" created by an AI system and clarifying copyright ownership. It would also involve updating the law to address derivative works created by AI.

Another approach is to use Creative Commons licenses, which are a set of standardized licenses that allow creators to share their works under certain conditions.

Creative Commons licenses can be used to give permission for others to use and modify works created by AI. This would provide a framework for AI-generated works to be used and shared while respecting copyright laws.

As a writer I was intrigued to read Cory Doctorow's views on this and how he uses CC licenses for some of his work.

Another possible solution is to use blockchain technology to manage copyright ownership. By using blockchain technology, it would be possible to track ownership of AI-generated works and ensure that copyright owners are compensated for their work. This approach is gaining traction for new works with some artists. But there are other views...

Conclusion

Where do users of the AI generative software stand if they use an image in a story and that image is proven to have been derived from a copyright work? An image of war, perhaps, shown to derive from a picture taken by Don McCullin?

And what precisely is 'fair use' and 'fair dealing' in the context of AI-generated works?

Resolving the conundrum around who owns the copyright of AI-generated works and whether AI can create derivative works based on existing copyrighted content will require a comprehensive and subtle legal framework. It will be essential to find a balance between protecting the rights of creators and fostering innovation and creativity.

By developing new legal frameworks and using Creative Commons licenses, maybe even utilizing technology such as blockchain, we can ensure that AI-generated content is used and shared ethically and legally. Hopefully we can do that while fostering innovation and creativity as AI technology inevitably progresses.

In the meantime, as TheRegister pointed out, hard cases make bad law, and these copyright cases are sure to be very hard as case law is built.

And the future?

How will copyright work if Tom Hanks decides that his digital persona can be used to make films 'forever'? Would his copyright extend beyond 50/70 years after his death under the control of his descendants?

Whichever way the conundrum is resolved, there's money in it for the lawyers.

---

My novels are available at my Gumroad bookstore. Also at Amazon and Apple


About the Creator

James Marinero

I live on a boat and write as I sail slowly around the world. Follow me for a varied story diet: true stories, humor, tech, AI, travel, geopolitics and more. I also write techno thrillers, with six to my name. More of my stories on Medium
