Thousands of authors sign letter urging AI makers to stop stealing books | TechCrunch

By Webdesk

If you ask GPT-4 to write a passage in the style of Carmen Machado or Margaret Atwood or Alexander Chee, it will do it, and do it well: it has probably ingested all of their work during training and now uses their ingenuity as its own. But these authors, and thousands of others, are not happy about that fact.

In an open letter signed by more than 8,500 authors of fiction, non-fiction and poetry, the tech companies behind large language models like ChatGPT, Bard, LLaMa and more are taken to task for using their writing without permission or compensation.

“These technologies mimic and regurgitate our language, stories, style, and ideas. Millions of copyrighted books, articles, essays, and poetry provide the “food” for AI systems, endless meals for which there has been no bill,” the letter reads.

Despite their systems proving capable of quoting and imitating the authors in question, AI developers have not substantively addressed the provenance of these works. Were they trained on samples scraped from bookstores and reviews? Did they borrow every book from the library? Or maybe they just downloaded one of the many illegal archives, like Libgen?

One thing is for sure: they didn’t go to publishers and license the works, which is undoubtedly the preferred method, and perhaps the only legal and ethical one. As the authors write:

Not only does the recent Supreme Court ruling in Warhol v. Goldsmith make it clear that the high commercialism of your use argues against fair use, but no court would excuse copying illegally obtained works as fair use. As a result of embedding our writings in your systems, generative AI threatens to harm our profession by flooding the market with mediocre, machine-written books, stories, and journalism based on our work.

Indeed, we have already seen this happen. Recently, AI-generated works of very low quality began climbing the YA bestseller lists at Amazon; publishers are inundated with generated works; and every day this website (and soon, this post) is scraped so its content can be reused at scale for SEO.

These malicious actors use the tools, APIs and agents developed by the likes of OpenAI and Meta, which in this context can be said to be malicious actors themselves. After all, who else would knowingly steal millions of works to power a new commercial product? (Well, Google, of course, but search indexing is fundamentally different from AI ingestion, and at least Google Books had the excuse that it was meant to be a dedicated index.)

With few authors able to make a living given the complexity and narrow margins of large-scale publishing, the open letter warns that this is an unsustainable situation for them, especially newer authors, “particularly young writers and voices from underrepresented communities.”

The letter asks companies to do the following:

1. Obtain permission to use our copyrighted material in your generative AI programs.

2. Compensate writers fairly for the past and continued use of our works in your generative AI programs.

3. Compensate writers fairly for using our works in AI output, whether or not the output violates current laws.

No legal threat is made. As Authors Guild CEO (and signatory) Mary Rasenberger told NPR, “Lawsuits are a huge amount of money. They take a really long time.” And AI is hurting authors now.

Which company will be the first to say “yes, we built our AI on stolen works and we’re sorry, and we’re going to pay for it”? It’s anyone’s guess, but there seems to be little incentive to do so. Most people are unaware of, or unconcerned by, the fact that LLMs are built by unauthorized means and may, in fact, contain and regurgitate copyrighted works. It’s easier to see the (very similar) problem when a generated image reproduces an artist’s signature style, and there has been some backlash there.

But the more subtle harm of using all of George Saunders’s or Diana Gabaldon’s books as “food” for one’s AI might not spur so many people into action — though many authors are ready to fight.
