In a recent investigation by The Atlantic, staff writer Alex Reisner found that **major technology companies** have used at least **15 million YouTube videos** as training data for their **AI video generation** products. This large-scale use of content has raised significant concerns about intellectual property rights and ethical practices in the rapidly evolving AI landscape.
The investigation highlights more than a dozen prominent training datasets compiled and used by companies such as **Microsoft**, **Meta**, **Snap**, **Tencent**, **Runway**, and **ByteDance**. These datasets have been instrumental in improving the quality of AI-generated video, showing how unauthorized use of **YouTube** content has fueled advances in the sector. Reisner draws an analogy: “Much as **ChatGPT** couldn’t write like **Shakespeare** without first ‘reading’ Shakespeare, a video generator couldn’t construct a fake newscast without ‘watching’ tons of recorded broadcasts.”
## Scope of Unauthorized Data Usage
The Atlantic’s reporting briefly notes that the training data includes more than **30,000** videos from the **BBC**, alongside hundreds of thousands from well-known news publishers and creators such as **The New York Times**, **The Washington Post**, **The Guardian**, **Al Jazeera**, and **The Wall Street Journal**. More than **88,000** videos were sourced from **Fox News**, roughly **70,000** from **ABC News**, and over **55,000** from **Bloomberg** channels.
Thousands more videos come from brands owned by **Vox Media**, including **Vox**, **Eater**, and **The Dodo**, which collectively account for over **30,000** videos. The **New York Times** alone accounts for at least **11,604** videos across the various datasets, with a significant portion appearing in the dataset behind **Runway’s Gen-3** model, which launched in June **2024** and was acclaimed for its capabilities.
Despite the extensive use of these videos, **YouTube** CEO **Neal Mohan** has reiterated that it is against the platform’s terms of service for third parties to download content for training purposes. **Lauren Starke**, a spokesperson for Vox Media, emphasized, “In order to survive, AI platforms know they need (and their consumers want) quality, credible content like ours that gives their products relevance and purpose.” Starke also noted that these companies have spent heavily on AI infrastructure but comparatively little on the content that fuels their models.
## The Legal Landscape and Implications
The investigation raises profound questions about copyright and licensing, especially as many news organizations have not authorized the use of their videos for AI training. The **New York Times** has stated that it has not sanctioned the use of its **YouTube** content for AI purposes, reinforcing its legal rights to determine how and where its content is used.
Additionally, partnerships between news outlets and AI companies are becoming more common, as seen with **Vox Media’s** deal with **OpenAI** that allows the latter to use its content for products like **ChatGPT**. Starke indicated that Vox Media is considering further partnerships while also preparing to protect its intellectual property through legal channels when necessary.
Furthermore, internal documents from **Runway**, published by **404 Media**, reveal that the company strategically targeted videos from high-quality channels for its datasets. A spreadsheet among those documents shows videos tagged for specific features, pointing to an organized, systematic approach to selecting content that would enhance AI training.
As the AI industry continues to advance, firms like **Runway** have already integrated their products into traditional media workflows, with companies such as **Netflix** and **Walt Disney Co.** using Runway’s tools for content production. However, the absence of any reported licensing agreements between **Runway** and the news publishers whose content was included remains a troubling aspect of this story.
As AI technology evolves rapidly, the need for clear guidelines and ethical frameworks around content sourcing becomes increasingly pressing. The implications of using copyrighted material without consent could have long-lasting effects on the landscape of journalism and the integrity of AI-generated content.