Posted by z3d in ArtInt

ChatGPT developer OpenAI recently acknowledged the necessity of using copyrighted material in the development of AI tools like ChatGPT, The Telegraph reports, saying they would be "impossible" without it. The statement came as part of a submission to the UK's House of Lords communications and digital select committee inquiry into large language models.

AI models like ChatGPT and the image generator DALL-E gain their abilities from training sessions fed, in part, by large quantities of content scraped from the public Internet without the permission of rights holders (though in OpenAI's case, some of the training content is licensed). This sort of free-for-all scraping is part of a longstanding tradition in academic machine learning research, but because deep learning AI models have recently gone commercial, the practice has come under intense scrutiny.

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

Further, OpenAI writes that limiting training data to public domain books and drawings "created more than a century ago" would not provide AI systems that "meet the needs of today's citizens."

Comments

Titlacahuan wrote

Good, as in bad for the AI hype. I'm on the fence about whether copyright is more despicable, but once AI-generated images and videos become copyrighted and are in turn fed into other training models, you get a giant Ouroboros. That always ends well.
