Various content creators, including authors, songwriters, and media outlets such as The New York Times, are taking legal action, alleging that generative AI models trained on copyrighted material reproduce that content without permission.
Before ChatGPT was introduced, Copyleaks, an artificial-intelligence text-analysis company, was already offering plagiarism-detection services to businesses and educational institutions.
When ChatGPT first launched, it ran on the GPT-3.5 model; OpenAI has since upgraded the service to the more advanced and powerful GPT-4.
Plagiarism can manifest in various ways beyond just directly copying and pasting entire sentences and paragraphs.
Copyleaks requested approximately a thousand outputs from GPT-3.5, each consisting of about 400 words, covering 26 different subjects.
Among the GPT-3.5 outputs analyzed, computer science had the highest similarity score (100%), followed closely by physics (92%) and psychology (88%).
The subjects with the lowest similarity scores were theater (0.9%), humanities (2.8%), and English language (5.4%).
OpenAI spokesperson Lindsey Held said in a statement to Axios, “Our models were created and trained to understand concepts to aid in problem-solving. We have implemented safeguards to prevent unintentional memorization, and our terms of service forbid the deliberate use of our models to reproduce content.”
In the legal case filed by The New York Times against Microsoft and OpenAI, it is alleged that the AI systems’ extensive replication of content amounts to copyright infringement.
In response to the lawsuit, OpenAI contended that such “regurgitation” is a rare issue and accused The New York Times of manipulating its prompts to produce the cited examples.