
Meta’s Galactica Trained on 106B Tokens But Faces Backlash Over Fabricated Citations

Meta’s Galactica was trained on 106 billion tokens drawn from 48 million papers, textbooks, and lecture notes, but it faced backlash over citation accuracy as low as 36.6% and over generating fabricated research.

Meta’s experimental AI model, Galactica, was released as a public demo in November 2022. Trained on a corpus of 106 billion tokens drawn from roughly 48 million scientific papers, textbooks, and lecture notes, the model was designed to advance scientific reasoning. The demo was withdrawn just three days later amid widespread criticism that it generated misleading scientific content and fabricated citations. Despite these problems, Galactica’s development has sparked significant discussion about the intersection of artificial intelligence and scientific integrity.

The model was built on a dataset that Meta AI described as a “curated corpus of humanity’s scientific knowledge,” setting it apart from the web-scraped data typically used to train language models. The dataset spanned research papers, textbooks, and lecture notes and contained more than 360 million in-context citations. Galactica uses a decoder-only Transformer architecture and was trained for 4.25 epochs over this corpus; the flagship variant has 120 billion parameters and was trained on 128 NVIDIA A100 nodes.
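To put the training figures in perspective, an epoch is one full pass over the corpus, so 4.25 epochs over a 106 billion token corpus implies roughly 450 billion tokens seen during training. The sketch below is simple arithmetic using only the numbers reported in this article; the function name is hypothetical, and Meta’s own token count may be rounded differently.

```python
# Rough arithmetic: total training tokens = corpus size x number of epochs.
# Figures come from the article; the published count may be rounded.
CORPUS_TOKENS = 106e9   # tokens in Galactica's curated corpus
EPOCHS = 4.25           # number of passes over that corpus

def total_training_tokens(corpus_tokens: float, epochs: float) -> float:
    """Return the approximate number of tokens processed during training."""
    return corpus_tokens * epochs

total = total_training_tokens(CORPUS_TOKENS, EPOCHS)
print(f"~{total / 1e9:.1f} billion tokens")  # ~450.5 billion tokens
```

Training for several epochs means the model saw each document in its curated corpus more than four times, a choice the Galactica team made instead of scaling up to a much larger, noisier web corpus.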

Galactica performed well on scientific benchmarks, scoring 68.2% on LaTeX equation knowledge probes, substantially higher than GPT-3’s 49%. The model also outperformed much larger models on specific tasks: in mathematical reasoning, it scored 20.4% on the MATH benchmark, compared with 8.8% for the 540 billion parameter PaLM. It also set state-of-the-art results on PubMedQA and MedMCQA.

However, the accuracy of Galactica’s citation predictions varied widely across evaluations, from 36.6% to 69.1%, and its predictions were often biased toward highly cited works. Critics highlighted instances where the model generated citations for non-existent papers attributed to real researchers, including fabricated works from Meta’s Reality Labs and Google AI researchers. Michael Black, a director at the Max Planck Institute for Intelligent Systems, warned that such outputs could lead to what he termed “deep scientific fakes.”
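The citation figures above are accuracy scores: the share of cases in which the model’s predicted citation matched the correct one. A minimal sketch of that kind of metric follows, assuming exact title matching after light normalization; Meta’s actual evaluation procedure may differ. The helper `unverifiable_citations` is a hypothetical illustration of how a fabricated reference might be flagged by checking predictions against a known bibliography, and the sample titles are illustrative only.

```python
import re

def normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so minor
    formatting differences are not counted as misses."""
    title = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(title.split())

def citation_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of predicted citations whose normalized title matches
    the ground-truth citation at the same position."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be the same length")
    if not reference:
        return 0.0
    hits = sum(normalize_title(p) == normalize_title(r)
               for p, r in zip(predicted, reference))
    return hits / len(reference)

def unverifiable_citations(predicted: list[str], known_titles: set[str]) -> list[str]:
    """Hypothetical check: return predictions that do not match any title
    in a known bibliography (candidate fabricated references)."""
    return [p for p in predicted if normalize_title(p) not in known_titles]

# Illustrative data: one correct prediction, one invented title.
refs = ["Attention Is All You Need",
        "Deep Residual Learning for Image Recognition"]
preds = ["attention is all you need.",
         "A Hypothetical Paper That Does Not Exist"]
known = {normalize_title(t) for t in refs}

print(citation_accuracy(preds, refs))       # 0.5
print(unverifiable_citations(preds, known)) # ['A Hypothetical Paper That Does Not Exist']
```

A metric like this only measures whether the model picked the right reference; a separate existence check is what catches invented papers, which is why fabricated citations attributed to real researchers drew so much criticism.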

Galactica’s public demo, which launched on November 15, 2022, was one of the shortest-lived in the AI industry, ending on November 18, about two weeks before OpenAI launched ChatGPT. The rapid withdrawal reflected not only the internal challenges Meta faced but also the heightened scrutiny of AI-generated scientific output. The backlash provided lessons that shaped Meta’s subsequent AI initiatives.

In the aftermath of Galactica, Meta changed its approach, launching the Llama model family in February 2023 with access granted to researchers through an application form rather than through a public demo. Joelle Pineau, VP of Research at Meta AI, acknowledged that insights gained from Galactica were integral to the development of later models. Yann LeCun also cited the Galactica episode when discussing the access restrictions for Llama, underscoring the need for responsible deployment of AI technologies.

The ramifications of Galactica extend beyond Meta. The global market for large language models (LLMs) reached $2.08 billion in 2024 and is projected to grow to $15.64 billion by 2029, a compound annual growth rate (CAGR) of 49.6%. A 2024 study also found that 80.9% of researchers reported using LLMs in at least one area of their research, highlighting a significant trend toward AI adoption in academia.
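The growth rate quoted above can be checked with the standard CAGR formula, (end / start)^(1 / years) - 1. This sketch uses only the market figures reported in the article and treats 2024 to 2029 as five compounding years, which is an assumption about how the projection was framed.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1

# Market size in billions of US dollars, 2024 -> 2029 (five years).
rate = cagr(2.08, 15.64, 2029 - 2024)
print(f"{rate:.1%}")  # 49.7%
```

The result, about 49.7%, is in line with the reported 49.6%; the small gap is plausibly due to rounding in the published dollar figures.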

While Galactica faced challenges during its brief debut, the model’s development has left a lasting imprint on the landscape of scientific AI. The lessons learned from its rollout are likely to shape how future AI models are developed, accessed, and utilized. As the demand for reliable AI in research continues to grow, the industry must address the delicate balance between innovation and accuracy, ensuring that technology serves to enhance, rather than undermine, the pursuit of knowledge.

Written by AiPressa Staff



© 2025 AIPressa · Part of Buzzora Media · All rights reserved.