Merriam-Webster Sues OpenAI Over ChatGPT's Alleged Content Theft

Merriam-Webster Files Lawsuit Against OpenAI Over ChatGPT Training Data

Merriam-Webster, the American dictionary publisher and subsidiary of Encyclopedia Britannica, has sued OpenAI, alleging that the artificial intelligence company unlawfully used its proprietary material to train the ChatGPT chatbot. The lawsuit was filed in Manhattan federal court on Friday, escalating the ongoing disputes between content creators and AI developers over intellectual property rights.

Claims of Content Cannibalization and Unauthorized Use

In the complaint, Britannica and Merriam-Webster assert that OpenAI copied their online articles, encyclopedia entries, and dictionary definitions without authorization to train ChatGPT to respond to human prompts. The companies contend that this practice has resulted in the "cannibalization" of their web traffic, as AI-generated summaries of their content divert users who would otherwise visit their websites.

The lawsuit states, "Defendants' ChatGPT-based AI products free ride on Plaintiffs' trusted, high-quality content — made possible through the diligent work of human researchers, writers, editors, and creators — by cannibalizing traffic to Plaintiffs' websites with AI-generated summaries of Plaintiffs' own content." Britannica further alleges that OpenAI copied nearly 100,000 of its articles without permission, and that ChatGPT outputs sometimes reproduce text verbatim from those articles.

Broader Allegations and Legal Demands

Beyond the copyright infringement claims, the complaint accuses OpenAI of trademark violations, alleging that it implied it had authorization to reproduce the material and wrongfully referenced Britannica in instances of AI "hallucinations." Merriam-Webster emphasizes that the full scope of the alleged theft remains "uniquely" within OpenAI's knowledge, which complicates the assessment of damages.

Britannica has requested an unspecified amount of monetary compensation and a court injunction to halt the purported infringement. In response, OpenAI has disputed the allegations, with a spokesperson stating on Monday, "Our models empower innovation, and are trained on publicly available data and grounded in fair use." The Independent has reached out to Britannica for additional commentary on the matter.

Context and Industry Implications

This lawsuit is the latest in a series of copyright challenges faced by AI companies over the use of third-party content for training. Last year, a group of authors settled a similar case with AI firm Anthropic after suing for copyright infringement. The Britannica case underscores the growing tension between technological advancement and intellectual property protection, and it could set a precedent for future legal battles in the rapidly evolving AI landscape.

As AI systems like ChatGPT continue to gain prominence, questions about data sourcing, fair use, and the economic impact on original content providers are becoming increasingly urgent. The outcome of this lawsuit could influence how AI developers access and use copyrighted materials, and it may shape the regulatory framework for artificial intelligence development.