Fourteen publishers have sued Canadian artificial intelligence firm Cohere for widespread unauthorized use of their content in developing and running its generative AI systems, alleging massive, systematic copyright and trademark infringement. It’s the latest legal salvo in the battle between content providers and generative AI models that digest their text and spit it back to users, often word for word, including articles behind a paywall.
The complaint, filed in the Southern District of New York, says Cohere has infringed on thousands of articles and seeks a permanent injunction, a jury trial and damages of up to $150,000 per work infringed.
“This is a lawsuit to protect journalism from systematic copyright and trademark infringement,” says the suit by Advance Local Media, Condé Nast, The Atlantic, Forbes Media, The Guardian, Business Insider, LA Times, McClatchy Media Company, Newsday, Plain Dealer Publishing Company, Politico, The Republican Company, Toronto Star Newspapers and Vox Media, all members of trade association News/Media Alliance.
“Rather than create its own content, Cohere takes the creative output of Publishers, some of the largest, most enduring, and most important news, magazine, and digital publishers in the United States and around the world. Without permission or compensation, Cohere uses scraped copies of our articles … to power its artificial intelligence (“AI”) service, which in turn competes with Publisher offerings and the emerging market for AI licensing.”
The burgeoning field of generative AI requires huge amounts of content to train its models, resulting in increasingly frequent litigation. The New York Times is suing ChatGPT parent OpenAI in a similar action. News Corp.’s Dow Jones, which owns The Wall Street Journal and New York Post, has sued Jeff Bezos-backed Perplexity AI. A handful of lawsuits have been filed over the past several years by plaintiffs ranging from novelist Michael Chabon to comedian Sarah Silverman, playwrights and others whose material has been used to train so-called large language models without permission or compensation.
In one victory earlier this week, Thomson Reuters won the first big AI copyright case, which stemmed from a 2020 lawsuit against startup Ross Intelligence. A judge ruled the AI firm had infringed copyright by reproducing material from the media giant’s legal database Westlaw.
Cohere, today’s suit reads, “freely admits that ‘AI is only as useful as the data it can access’ … [but] fails to license the content it uses. Cohere takes Publishers’ valuable articles, without authorization and without providing compensation. Cohere copies, uses, and disseminates Publishers’ news and magazine articles to build and deliver a commercial service that mimics, undercuts, and competes with lawful sources for their articles and that displaces existing and emerging licensing markets.”
“Command is incapable of performing its own original research. It invests no resources into news gathering in the field and has no writers, fact-checkers, or editors on staff.” On the strength of the content it steals, the suit says, Cohere charges for its product suite and actively courts customers.
The suit includes numerous screenshots of ripped-off articles, including an example of output that states, “This story is available exclusively to Business Insider subscribers. Become an Insider and start reading now,” all the while providing the full article to any user who asks for it, whether they have a Business Insider subscription or not.
Equally alarming, the suit says, are examples of “hallucinations,” or references to articles that do not exist.
“Not content with just stealing our works, Cohere also blatantly manufactures fake pieces and attributes them to us, misleading the public and tarnishing our brands,” the suit says.
It cites an article in The Guardian published on October 7, 2024 titled “The pain will never leave: Nova massacre survivors return to site one year on.” When prompted for this piece, Cohere “delivered a wildly inaccurate article that it represented was ‘published on June 29, 2022 in The Guardian by Luke Harding.’ Among other flaws, the Cohere article confused the October 7, 2023 massacre at The Nova Music Festival with a mass shooting that took place in Nova Scotia, Canada in 2020. Cohere also manufactured details about the Nova Scotia tragedy, attributing several quotes—including those gathered in The Guardian’s reporting — to Tom Bagley, a man who was murdered in the 2020 shootings and thus could neither “return to the scene of the killings” nor offer quotes to a news outlet. Needless to say, this fictional article never appeared in The Guardian.”