As generative artificial intelligence (GenAI) gains traction in the technology landscape, it has sparked a series of legal battles over copyright. A wave of lawsuits from authors, artists, and music publishing companies is challenging how developers of large language models (LLMs) and other GenAI systems use copyrighted materials during training. Notable cases include The New York Times v. OpenAI, which questions the use of journalistic content; Disney v. Midjourney, focused on visual works; and Encyclopedia Britannica v. Perplexity AI, involving reference content. Together, these disputes raise a central legal question: whether training AI models on copyrighted content without explicit authorization constitutes copyright infringement.
At the heart of these disputes is the fair use doctrine, a cornerstone of U.S. copyright law codified in Section 107 of the 1976 Copyright Act. The doctrine lays out four non-exclusive factors that courts must consider when determining fair use: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use on the potential market for or value of the copyrighted work. The stakes are high, as decisions in these cases could redefine the legal landscape governing AI for years to come.
The U.S. Copyright Office has weighed in on the matter, releasing a Generative AI Training report as part of its broader Copyright and AI initiative. This report suggests that compiling datasets for AI training implicates copyright holders’ reproduction rights and warns against blanket applications of the fair use defense for training AI models. It specifically emphasizes the fourth fair use factor, indicating that AI training involving copyrighted material could cause substantial economic harm through lost sales and missed licensing opportunities.
Recent judicial decisions have started to shape the fair use landscape. In Thomson Reuters v. Ross Intelligence Inc., the court ruled in favor of Thomson Reuters, emphasizing that Ross’s use of copyrighted material was not transformative and amounted to a direct market substitute for Westlaw’s offerings. This ruling marked a significant precedent, as it highlighted the importance of the potential market for licensing copyrighted material used in AI training.
Further developments arose from Bartz v. Anthropic PBC and Kadrey v. Meta Platforms, Inc., both decided in June 2025. In Bartz, a group of authors alleged that Anthropic had unlawfully copied their books to train its Claude AI model. Judge William Alsup ruled that while Anthropic’s use of legitimately acquired books constituted fair use, its use of pirated copies did not. On the training question, he concluded that the fair use factors leaned in Anthropic’s favor, particularly because of the transformative nature of the use.
In Kadrey v. Meta, Judge Vince Chhabria likewise ruled in favor of Meta, finding that the use of the authors’ works for training was highly transformative. He rejected arguments suggesting harm to the licensing market, stating that evidence was lacking. The outcomes of these cases illustrate the nuanced balance that courts must strike when interpreting fair use in the context of AI.
As the legal battles evolve, the significance of empirical evidence cannot be overstated. Courts are increasingly looking for sophisticated analyses to substantiate claims about the transformative nature of AI outputs and the economic impact on original works. For instance, statistical analysis may help clarify how user prompts interact with GenAI models and assess the overlap between training inputs and model outputs.
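One simple way to picture the overlap analysis mentioned above is an n-gram containment score: the share of word sequences in a model output that also appear verbatim in a source text. The sketch below is only an illustration under stated assumptions. The function names `word_ngrams` and `containment`, the whitespace tokenization, and the choice of three-word sequences are all hypothetical and are not drawn from any of the cases or reports discussed here.

```python
def word_ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(output, source, n=3):
    """Fraction of the output's n-grams that also occur in the source.

    Illustrative assumption: verbatim n-gram matches stand in for
    'overlap'; paraphrase and stylistic similarity are not captured.
    """
    output_ngrams = word_ngrams(output, n)
    if not output_ngrams:
        return 0.0  # output shorter than n words: nothing to compare
    source_ngrams = word_ngrams(source, n)
    return len(output_ngrams & source_ngrams) / len(output_ngrams)


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog"
    output = "the quick brown fox sleeps"
    print(f"containment: {containment(output, source):.2f}")
```

A score near 1 would suggest that an output largely reproduces the source, while a score near 0 would suggest little verbatim reuse. An analysis offered in litigation would likely need more careful tokenization, a corpus-wide baseline, and tests of statistical significance.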
Moreover, the feasibility of removing specific data from training datasets poses another layer of complexity. As GenAI models typically operate on vast amounts of data, the cost and practicality of “untraining” these models is a technical concern that courts will need to address. The Bartz case highlighted these considerations, as both parties agreed on mechanisms to prevent the models from reproducing any copyrighted content.
Licensing frameworks also warrant examination. The fragmented nature of copyright ownership complicates the establishment of a viable market for licensing data used in AI training. With many works held by multiple rights holders, transaction costs can escalate, making it challenging for companies to negotiate the necessary permissions. Despite these hurdles, emerging collective licensing agreements may provide some pathways for navigating these complexities.
As legal interpretations of fair use continue to unfold, the outcomes of these cases will likely have profound implications for the future of AI development and copyright law. The intersection of technology, economics, and law at this juncture is critical for ensuring a balanced approach that fosters innovation while protecting creators’ rights. How courts resolve these questions will shape that balance for years to come.