AI Generative

Meta’s Galactica Processes 106B Tokens But Faces Backlash Over Fabricated Citations

Meta’s Galactica was trained on 106 billion tokens drawn from 48 million papers but faced backlash over fabricated research and citation accuracy that ranged from 36.6% to 69.1%.

Staff

Published

12 December, 2025

Meta’s experimental AI model, Galactica, had a brief public demonstration in November 2022. The model, trained on 106 billion tokens drawn from approximately 48 million scientific papers and designed to advance scientific reasoning, was withdrawn just three days later after widespread criticism for generating misleading scientific content and fabricated citations. Even so, Galactica’s development has prompted significant discussion about the intersection of artificial intelligence and scientific integrity.

The model was built on a dataset that Meta AI described as “curated humanity’s scientific knowledge,” setting it apart from models trained on typical web-scraped data. The dataset spanned research papers, textbooks, and lecture notes and included over 360 million in-context citations. Galactica was trained for 4.25 epochs using a decoder-only Transformer architecture; the flagship variant comprised 120 billion parameters and was trained on 128 NVIDIA A100 nodes.

Galactica’s performance on scientific benchmarks was notable. It achieved an accuracy of 68.2% for LaTeX equation generation, substantially higher than GPT-3’s 49%. The model also outperformed larger counterparts on specific tasks; for instance, the 30-billion-parameter variant surpassed the 540-billion-parameter PaLM model in mathematical reasoning, scoring 20.4% on the MATH benchmark compared to PaLM’s 8.8%. Other achievements included state-of-the-art results on PubMedQA and MedMCQA.

However, the accuracy of Galactica’s citation predictions varied significantly, ranging from 36.6% to 69.1%, and was often biased toward highly cited works. Critics highlighted instances where the model generated citations for non-existent papers attributed to real researchers, including fabricated works attributed to researchers at Meta’s Reality Labs and Google AI. Michael Black, Director of the Max Planck Institute for Intelligent Systems, voiced concerns that such outputs could lead to what he termed “deep scientific fakes.”

Galactica’s public demo, which commenced on November 15, 2022, was one of the shortest in the AI industry, ending on November 18, just two weeks prior to the launch of OpenAI’s ChatGPT. The rapid withdrawal reflected not only the internal challenges faced by Meta but also the heightened scrutiny surrounding AI-generated scientific outputs. This backlash provided critical lessons that influenced Meta’s subsequent AI initiatives.

In the aftermath of Galactica, Meta pivoted its strategy, launching the Llama model family in February 2023 with a focus on form-based access for researchers rather than public demos. Joelle Pineau, VP of Research at Meta AI, acknowledged that insights gained from Galactica were integral to the development of future models. Additionally, Yann LeCun noted the Galactica incident when discussing the access restrictions for Llama, underscoring the need for responsible deployment of AI technologies.

The ramifications of Galactica extend beyond Meta, as the global market for large language models (LLMs) reached $2.08 billion in 2024 and is projected to grow to $15.64 billion by 2029, reflecting a compound annual growth rate of 49.6%. Notably, a study conducted in 2024 found that 80.9% of researchers reported using LLMs in at least one research area, highlighting a significant trend towards AI adoption in academia.
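The growth rate quoted above can be sanity-checked from the two market-size figures. The following is a minimal sketch, assuming the standard compound annual growth rate formula and a five-year span from 2024 to 2029; the function name `cagr` and the variable names are illustrative and do not come from the cited market report.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate as a fraction (0.5 means 50% per year)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars.
market_2024 = 2.08
market_2029 = 15.64

# Assumption: growth compounds over the five years between 2024 and 2029.
rate = cagr(market_2024, market_2029, years=2029 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

Under these assumptions the implied rate lands within a fraction of a percentage point of the reported 49.6%; the small gap is consistent with rounding in the published market figures.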

While Galactica faced challenges during its brief debut, the model’s development has left a lasting imprint on the landscape of scientific AI. The lessons learned from its rollout are likely to shape how future AI models are developed, accessed, and utilized. As the demand for reliable AI in research continues to grow, the industry must address the delicate balance between innovation and accuracy, ensuring that technology serves to enhance, rather than undermine, the pursuit of knowledge.

AIPRESSA.COM
