Meta’s Galactica Processes 106B Tokens But Faces Backlash Over Fabricated Citations

Meta’s Galactica was trained on 106 billion tokens from 48 million papers but drew backlash over fabricated research and citation accuracy that ranged from 36.6% to 69.1%.

Meta’s experimental AI model, Galactica, was trained on 106 billion tokens drawn from approximately 48 million scientific papers and opened to the public in a brief demonstration in November 2022. The model, designed to advance scientific reasoning capabilities, was withdrawn just three days later, following widespread criticism for generating misleading scientific content and fabricated citations. Despite the short run, Galactica’s release sparked significant discussion about the intersection of artificial intelligence and scientific integrity.

The model was built on a dataset that Meta AI described as “curated humanity’s scientific knowledge,” in contrast to the web-scraped corpora typical of other large language models. The dataset spanned research papers, textbooks, and lecture notes, and included over 360 million in-context citations. Galactica used a decoder-only Transformer architecture and was trained for 4.25 epochs; the flagship variant had 120 billion parameters and was trained on 128 NVIDIA A100 nodes.
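The corpus size and epoch count quoted above imply how many tokens the model saw over the whole training run. The short Python sketch below works through that arithmetic using only the figures in this article; the function and variable names are illustrative and do not come from Meta.

```python
# Figures as quoted in the article; names are illustrative assumptions.
CORPUS_TOKENS = 106e9   # tokens in the curated corpus
EPOCHS = 4.25           # number of passes over the corpus


def tokens_seen(corpus_tokens: float, epochs: float) -> float:
    """Total tokens processed in training: corpus size times passes."""
    return corpus_tokens * epochs


total = tokens_seen(CORPUS_TOKENS, EPOCHS)
print(f"{total / 1e9:.1f} billion tokens seen during training")
```

By this estimate the model processed roughly 450 billion tokens in training, more than four times the size of the corpus itself, because each document was seen multiple times.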

Galactica posted strong results on scientific benchmarks, reaching 68.2% accuracy on LaTeX equation generation, well above GPT-3’s 49%. The model also outperformed larger counterparts on specific tasks; for instance, the 30 billion parameter variant surpassed the PaLM 540 billion model in mathematical reasoning, scoring 20.4% on the MATH benchmark compared to PaLM’s 8.8%. Other achievements included state-of-the-art results on PubMedQA and MedMCQA.

However, the accuracy of Galactica’s citation predictions varied widely, ranging from 36.6% to 69.1%, and its predictions skewed toward highly cited works. Critics highlighted instances where the model generated citations for non-existent papers and attributed them to real researchers, including researchers at Meta’s Reality Labs and Google AI. Michael Black, a director at the Max Planck Institute for Intelligent Systems, voiced concerns that such outputs could lead to what he termed “deep scientific fakes.”

Galactica’s public demo, which opened on November 15, 2022, was one of the shortest in the AI industry: it ended on November 18, less than two weeks before OpenAI launched ChatGPT on November 30. The rapid withdrawal reflected the heightened scrutiny surrounding AI-generated scientific outputs as well as internal challenges at Meta. The backlash provided lessons that influenced Meta’s subsequent AI initiatives.

In the aftermath of Galactica, Meta changed its release strategy. When it launched the Llama model family in February 2023, it granted access to researchers through an application form rather than offering a public demo. Joelle Pineau, VP of Research at Meta AI, acknowledged that insights gained from Galactica were integral to the development of future models. Yann LeCun, Meta’s chief AI scientist, also cited the Galactica incident when discussing the access restrictions for Llama.

The questions Galactica raised extend beyond Meta, because both the market for large language models (LLMs) and their use in research are growing quickly. The global LLM market reached $2.08 billion in 2024 and is projected to grow to $15.64 billion by 2029, a compound annual growth rate of 49.6%. A study conducted in 2024 found that 80.9% of researchers reported using LLMs in at least one research area, a sign of how widely academia has already adopted the technology.
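The growth rate quoted above can be checked against the two market-size figures with the standard compound annual growth rate formula, (end / start)^(1 / years) - 1. The Python sketch below is a minimal check that assumes the projection spans the five years from 2024 to 2029; the function name is illustrative.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` years."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes in billions of dollars, as quoted in the article.
rate = cagr(start_value=2.08, end_value=15.64, years=2029 - 2024)
print(f"Implied CAGR: {rate:.1%}")
```

The result comes out within a fraction of a percentage point of the quoted 49.6%. The small gap is consistent with rounding in the published market-size figures.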

While Galactica faced challenges during its brief debut, the model’s development has left a lasting imprint on the landscape of scientific AI. The lessons learned from its rollout are likely to shape how future AI models are developed, accessed, and utilized. As the demand for reliable AI in research continues to grow, the industry must address the delicate balance between innovation and accuracy, ensuring that technology serves to enhance, rather than undermine, the pursuit of knowledge.

Written by AiPressa Staff

© 2025 AIPressa · Part of Buzzora Media · All rights reserved.