Galactica Statistics And User Trends 2026

Meta’s Galactica processed 106 billion tokens from 48 million scientific papers before its public demo was shut down after just three days in November 2022. The large language model achieved 68.2% accuracy on LaTeX equations compared to GPT-3’s 49%, yet generated fabricated citations attributed to real researchers. This analysis covers Galactica’s training data, model architecture, benchmark results, and its lasting influence on scientific AI development.

Galactica Statistics: Key Facts

  • Galactica trained on 106 billion tokens across 4.25 epochs, processing approximately 450 billion total tokens.
  • The largest model contained 120 billion parameters and required 128 NVIDIA A100 nodes for training.
  • Meta withdrew the public demo on November 18, 2022, just 72 hours after launch.
  • Citation prediction accuracy ranged from 36.6% to 69.1% depending on the evaluation dataset.
  • As of 2024, 80.9% of published researchers report using LLMs in at least one research area.

Galactica Training Data Statistics

Meta AI described Galactica’s corpus as a curated body of “humanity’s scientific knowledge,” distinguishing it from web-scraped training approaches. The dataset included scientific papers, textbooks, lecture notes, protein sequences, and chemical compounds.

Metric                  Value
Total Training Tokens   106 billion
Scientific Papers       48 million
In-Context Citations    360 million+
Training Epochs         4.25
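
The “approximately 450 billion total tokens” figure in the key facts follows directly from two table rows above; a quick check of the arithmetic (values from the table, rounding mine):

```python
corpus_tokens = 106e9  # unique tokens in the training corpus (table above)
epochs = 4.25          # passes over the corpus (table above)

total_processed = corpus_tokens * epochs
print(f"{total_processed / 1e9:.1f} billion tokens")  # -> 450.5, i.e. ~450 billion
```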

Galactica Model Architecture

Meta released five Galactica variants ranging from 125 million to 120 billion parameters. All models used a decoder-only Transformer architecture with a 2,048-token context window and 50,000-token vocabulary.

The flagship 120B model was intentionally sized to fit on a single NVIDIA A100 node (80GB GPUs). This design prioritized accessibility for academic researchers with limited computational resources.
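
To see how those figures hang together, here is a back-of-the-envelope sketch. The vocabulary size comes from the paragraph above; the depth and width (96 layers, hidden size 10,240) are assumptions taken from the publicly reported 120B configuration, and the 8-GPU node layout is the standard A100 setup rather than something stated in this article:

```python
# Rough decoder-only Transformer parameter count for the 120B variant.
n_layers = 96     # assumption: publicly reported depth of the 120B model
d_model = 10_240  # assumption: publicly reported hidden size
vocab = 50_000    # vocabulary size stated above

# ~12 * L * d^2 approximates attention (4d^2) plus MLP (8d^2) weights per layer.
block_params = 12 * n_layers * d_model ** 2
embed_params = vocab * d_model
total = block_params + embed_params
print(f"~{total / 1e9:.0f}B parameters")  # -> ~121B, matching the "120B" label

# Weights alone in fp16 are 2 bytes per parameter: ~243 GB. That exceeds a
# single 80 GB A100, but fits comfortably on a typical 8-GPU A100 node
# (8 x 80 GB = 640 GB), hence sizing for a node rather than a single GPU.
print(f"~{total * 2 / 1e9:.0f} GB of fp16 weights")
```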

Galactica Benchmark Performance

Galactica outperformed several larger models on scientific reasoning tasks. The 30B variant surpassed PaLM 540B on mathematical reasoning despite being 18 times smaller.

Performance Against Larger Models

On the MATH benchmark, Galactica scored 20.4% compared to PaLM 540B’s 8.8%. The model also achieved state-of-the-art results on PubMedQA at 77.6% and MedMCQA at 52.9%.

Galactica Demo Timeline

Galactica’s public demo lasted 72 hours, making it one of the shortest-lived major AI product launches. The demo launched on November 15, 2022, and was withdrawn on November 18.

Event                 Date
Public Demo Launch    November 15, 2022
Demo Withdrawal       November 18, 2022
ChatGPT Launch        November 30, 2022

Galactica’s demo was withdrawn just 12 days before OpenAI released ChatGPT.
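
A quick check of the intervals (dates from the table above):

```python
from datetime import date

launch = date(2022, 11, 15)      # public demo launch
withdrawal = date(2022, 11, 18)  # demo withdrawal
chatgpt = date(2022, 11, 30)     # ChatGPT launch

print((withdrawal - launch).days)   # -> 3 (the 72-hour demo window)
print((chatgpt - withdrawal).days)  # -> 12 days from withdrawal to ChatGPT
```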

Critics demonstrated that Galactica generated plausible but incorrect scientific content. Michael Black, Director of the Max Planck Institute for Intelligent Systems, described the outputs as potentially ushering in an era of “deep scientific fakes.”

Galactica Citation Accuracy Statistics

Citation prediction represented a core Galactica capability. Accuracy varied between 36.6% and 69.1% across evaluation datasets, with documented bias toward highly-cited papers.

Researchers found instances where Galactica generated citations to non-existent papers attributed to real scientists, including fabricated publications from Meta’s Reality Labs and Google AI researchers.
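
For context on how citation prediction was invoked in practice: the released checkpoints used special reference tokens, and the model completed the citation after an opening marker. Below is a minimal sketch using the Hugging Face transformers library, assuming the released facebook/galactica-125m checkpoint is still hosted and using the [START_REF] prompt convention from the model’s documentation:

```python
# pip install transformers torch
from transformers import AutoTokenizer, OPTForCausalLM  # Galactica is OPT-based

model_name = "facebook/galactica-125m"  # smallest of the five released variants
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = OPTForCausalLM.from_pretrained(model_name)

# The [START_REF] token asks the model to predict a citation for the preceding text.
inputs = tokenizer("The Transformer architecture [START_REF]", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_new_tokens=30)
print(tokenizer.decode(outputs[0]))
```

Completions produced this way are formatted like genuine bibliography entries, which is precisely why the fabricated citations described above were difficult to spot.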

LLM Market Growth Projections

The global LLM market reached $2.08 billion in 2024 and is projected to grow to $15.64 billion by 2029. This represents a compound annual growth rate of 49.6%.
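
Those endpoints imply the stated growth rate almost exactly; a compound annual growth rate check (market figures from above; a rounding gap of ~0.1 point is expected):

```python
start, end = 2.08, 15.64  # global LLM market in $B, 2024 and 2029 (from above)
years = 5                 # 2024 -> 2029

cagr = (end / start) ** (1 / years) - 1
print(f"{cagr:.1%}")  # -> 49.7%, consistent with the ~49.6% cited
```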

Enterprise AI adoption reached 78% in 2024. A study of over 800 verified published authors found that 80.9% reported using LLMs in at least one research area, reflecting the kind of mainstream research use Galactica aimed to pioneer.

Galactica’s Influence on Meta AI

Despite its troubled launch, Galactica informed Meta’s subsequent AI strategy. The Llama model family launched in February 2023 with form-based researcher access rather than open public demos.

Meta AI VP of Research Joelle Pineau confirmed that lessons from Galactica were “folded into” subsequent model generations. Yann LeCun later cited the Galactica backlash when explaining Llama’s initial access restrictions.

FAQ

What happened to Meta’s Galactica?

Meta withdrew Galactica’s public demo on November 18, 2022, just three days after launch, following criticism that the model generated factually incorrect scientific content.

How many parameters did Galactica have?

Galactica was released in five variants: 125 million, 1.3 billion, 6.7 billion, 30 billion, and 120 billion parameters.

What data was Galactica trained on?

Galactica trained on 106 billion tokens from 48 million scientific papers, textbooks, lecture notes, protein sequences, and chemical compounds.

How accurate was Galactica at predicting citations?

Citation accuracy ranged from 36.6% to 69.1% depending on the dataset, with documented instances of fabricated paper citations.

Did Galactica influence other Meta AI models?

Yes. Meta confirmed lessons from Galactica were incorporated into the Llama model family, which launched with restricted access in February 2023.