Connect with us

Hi, what are you looking for?

Top Stories

Major Tech Firms Use 15 Million YouTube Videos, Including 88,000 from Fox News, for AI Training

Major tech firms, including Microsoft and Meta, have leveraged 15 million YouTube videos for AI training, raising serious copyright concerns.

In a recent investigation by The Atlantic, staff writer Alex Reisner uncovered that **major technology companies** have utilized at least **15 million YouTube videos** as training data for their **AI video generation** products. This extensive use of content has raised significant concerns regarding intellectual property rights and ethical practices in the rapidly evolving AI landscape.

The investigation highlights over a dozen prominent training datasets compiled and employed by companies such as **Microsoft**, **Meta**, **Snap**, **Tencent**, **Runway**, and **ByteDance**. These datasets have been instrumental in enhancing the quality of AI-generated videos, showcasing how unauthorized usage of **YouTube** content has fueled advancements in this sector. Reisner draws an analogy, stating, “Much as **ChatGPT** couldn’t write like **Shakespeare** without first ‘reading’ Shakespeare, a video generator couldn’t construct a fake newscast without ‘watching’ tons of recorded broadcasts.”

Scope of Unauthorized Data Usage

The Atlantic’s reporting briefly mentions that among the training data, over **30,000** videos from the **BBC** were included, alongside hundreds of thousands from renowned news publishers and creators such as **The New York Times**, **The Washington Post**, **The Guardian**, **Al Jazeera**, and **The Wall Street Journal**. Specifically, more than **88,000** videos were sourced from **Fox News**, roughly **70,000** from **ABC News**, and over **55,000** from **Bloomberg** channels.

Much of this content originates from platforms owned by **Vox Media**, including **Vox**, **Eater**, and **The Dodo**, which collectively account for over **30,000** videos. The **New York Times** alone contributed over **11,604** videos across different datasets, with a significant portion coming from the **Runway Gen-3** dataset, which was launched in June **2024** and received acclaim for its capabilities.

Despite the extensive use of these videos, **YouTube** CEO **Neal Mohan** has reiterated that it is against the platform’s terms of service for third parties to download content for training purposes. **Lauren Starke**, a spokesperson for Vox Media, emphasized, “In order to survive, AI platforms know they need (and their consumers want) quality, credible content like ours that gives their products relevance and purpose.” Starke also noted that these companies have spent heavily on AI infrastructure but comparatively little on the content that fuels their models.

The Legal Landscape and Implications

The investigation raises profound questions about copyright and licensing, especially as many news organizations have not authorized the use of their videos for AI training. The **New York Times** has stated that it has not sanctioned the use of its **YouTube** content for AI purposes, reinforcing its legal rights to determine how and where its content is used.

Additionally, partnerships between news outlets and AI companies are becoming more common, as seen with **Vox Media’s** deal with **OpenAI** that allows the latter to use its content for products like **ChatGPT**. Starke indicated that Vox Media is considering further partnerships while also preparing to protect its intellectual property through legal channels when necessary.

Furthermore, internal documents from **Runway**, published by **404 Media**, reveal that the company strategically targeted videos from high-quality channels for its datasets. The spreadsheet indicated that videos were tagged for their specific features, revealing an organized method of selecting content that would enhance AI training.

As the AI industry continues to advance, firms like **Runway** have already integrated their products into traditional media workflows, with companies such as **Netflix** and **Walt Disney Co.** utilizing Runway’s tools for content production. However, the absence of reported licensing agreements between **Runway** and the news publishers whose content was included remains a troubling aspect of this story.

As AI technology evolves rapidly, the need for clear guidelines and ethical frameworks around content sourcing becomes increasingly pressing. The implications of using copyrighted material without consent could have long-lasting effects on the landscape of journalism and the integrity of AI-generated content.

See also
Staff
Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

You May Also Like

AI Cybersecurity

Anthropic's Mythos exposes thousands of critical vulnerabilities in major systems, prompting $100M in defensive action from tech giants and U.S. banks.

AI Government

US Department of Defense partners with tech giants including SpaceX and OpenAI to launch an "AI-first" initiative aimed at enhancing military decision-making efficiency.

AI Marketing

BusySeed unveils Rankxa, a tool tracking brand visibility across AI-generated responses, revealing 90% of brands lack meaningful presence in this new landscape.

AI Marketing

AI Content Repurposer launches in 2026, enabling users to transform existing content into multiple formats in minutes for just $9.95, boosting productivity tenfold.

AI Technology

A1 Public Relations helps entertainment brands enhance AI visibility in 2026 by integrating structured content and fresh, authoritative media, ensuring they are recognized by...

AI Generative

OpenAI unveils GPT Image 2, achieving a record 242-point lead over competitors, transforming the AI image generation landscape with native reasoning capabilities.

AI Finance

More than 55% of Americans now turn to AI tools for financial advice, risking personal data exposure despite rising privacy concerns.

AI Business

Iren's new 1.6GW site in Oklahoma enhances its AI data center capacity, while Nebius secures $27B in deals, raising stakes in the competitive neocloud...

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.