As of October 2025, the OpenFold Consortium had 24 members and OpenProteinSet contained 16 million multiple sequence alignments, making OpenFold a leading open-source platform for protein structure prediction. The PyTorch-based implementation achieves 3-5x faster inference than AlphaFold2 while maintaining comparable accuracy on CASP15 targets. OpenFold3-preview launched in October 2025, introducing protein-ligand complex prediction under the Apache 2.0 license, which permits unrestricted commercial use in drug discovery.
This analysis examines verified data on OpenFold’s training datasets, consortium growth, computational performance, and pharmaceutical applications through 2025.
OpenFold Key Statistics 2025
- OpenFold has 24 consortium members including six global pharmaceutical companies as of 2025.
- OpenProteinSet contains 16 million multiple sequence alignments available for public research.
- OpenFold achieves 3-5x faster inference speeds compared to AlphaFold2 baseline performance.
- OpenFold3 training utilized 256 GPUs with 85% cost reduction through AWS optimization strategies.
- OpenFold’s mean GDT-TS across 90 evaluated CASP15 domains has a 95% confidence interval of 68.6-78.8.
OpenFold Training Dataset Statistics
OpenProteinSet represents one of the largest publicly available datasets for protein structure prediction training, containing pre-computed multiple sequence alignments that eliminate millions of CPU-hours typically required for MSA generation.
The dataset includes 16 million total MSAs with an average depth of 940 sequences per alignment. The median depth is 262 sequences. A mean this far above the median suggests a right-skewed distribution, in which a minority of very deep alignments pulls the average up while the typical alignment is shallower.
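The gap between the mean and median depth can be illustrated with a minimal sketch. It assumes depth is counted as the number of `>` header lines in an A3M/FASTA-style alignment (conventions differ on whether the query counts). The function name `msa_depth`, the toy alignment, and the five depth values are all hypothetical, chosen only so that the mean and median land on the figures quoted above.

```python
import io
import statistics


def msa_depth(a3m_text: str) -> int:
    """Count sequences in an A3M/FASTA-style alignment (one '>' header each)."""
    return sum(1 for line in io.StringIO(a3m_text) if line.startswith(">"))


# Toy alignment (made up): a query plus two homologs.
example_a3m = """>query
MKTAYIAKQR
>hit_1
MKTAYIAKQR
>hit_2
MKT-YLAKQR
"""

# Hypothetical depths for five alignments; a single very deep MSA
# pulls the mean far above the median.
depths = [12, 85, 262, 410, 3931]

depth = msa_depth(example_a3m)
mean_depth = statistics.mean(depths)
median_depth = statistics.median(depths)

print(f"toy alignment depth: {depth}")
print(f"mean depth: {mean_depth}, median depth: {median_depth}")
```

When depths are skewed like this, the median is usually the better description of a typical alignment.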
| Dataset Metric | Value |
|---|---|
| Total MSAs in OpenProteinSet | 16+ million |
| Filtered Uniclust30 MSAs | 270,000 |
| Protein Sequences (AWS RODA) | 4.5 million |
| Average MSA Depth | 940 sequences |
| Median MSA Depth | 262 sequences |
| OpenFold3 Experimental Structures | 300,000+ |
| OpenFold3 Synthetic Structures | 13+ million |
OpenFold3 expanded training data to include 300,000 experimentally determined structures and 13 million synthetic structures for enhanced protein-ligand complex prediction capabilities.
OpenFold Consortium Membership Growth
The OpenFold Consortium expanded from its February 2022 founding to 24 member organizations by 2025, attracting major pharmaceutical companies and technology corporations committed to open-source AI development.
Six global pharmaceutical firms joined the consortium alongside three technology companies providing infrastructure support. Academic partnerships include Columbia University and Seoul National University leading OpenFold3 development efforts.
| Organization Type | Count (2025) |
|---|---|
| Total Member Companies | 24+ |
| Global Pharmaceutical Firms | 6 |
| Technology Companies | 3 (AWS, Microsoft, NVIDIA) |
| Academic Partnerships | 2 |
| OpenFold3 Development Institutions | 40+ |
Pharmaceutical members include Bristol Myers Squibb, Novo Nordisk, Bayer, Biogen, UCB, and Astex Pharmaceuticals. Biotechnology companies such as Arzeda, Cyrus Biotechnology, Outpace Bio, and Psivant Therapeutics contribute to development efforts.
OpenFold Performance Benchmark Results
OpenFold demonstrated prediction accuracy on par with AlphaFold2 across 90 CASP15 domains evaluated using GDT-TS. The 95% confidence interval for OpenFold’s mean GDT-TS was 68.6-78.8.
The corresponding interval for AlphaFold2 was 69.7-79.2 under identical testing conditions. The two intervals overlap almost entirely, and OpenFold matched or exceeded AlphaFold2 on 50% of evaluated targets, which indicates parity between the two implementations.
| Benchmark Metric | Measurement |
|---|---|
| OpenFold Mean GDT-TS (95% CI) | 68.6 – 78.8 |
| AlphaFold2 Mean GDT-TS (95% CI) | 69.7 – 79.2 |
| CASP15 Domains Evaluated | 90 |
| Targets Matching/Exceeding AlphaFold2 | 50% |
| Bootstrap Samples for CI | 10,000 |
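For readers unfamiliar with the metric, GDT-TS is commonly defined as the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of Cα atoms in the prediction that lie within that cutoff of the reference structure. The sketch below is a simplification: it assumes the structures are already superimposed and takes per-residue distances as input, whereas standard evaluation tools search for the best superposition at each cutoff. The function name `gdt_ts` and the example distances are illustrative only.

```python
from typing import Sequence

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms


def gdt_ts(ca_distances: Sequence[float]) -> float:
    """Simplified GDT-TS from per-residue C-alpha distances (angstroms).

    Assumes a single fixed superposition; real evaluations optimize the
    superposition separately for each cutoff.
    """
    if not ca_distances:
        raise ValueError("need at least one residue")
    n = len(ca_distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= cutoff for d in ca_distances) / n
                 for cutoff in GDT_TS_CUTOFFS]
    # Average the four fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(GDT_TS_CUTOFFS)


# Made-up distances for an 8-residue toy domain.
example_distances = [0.5, 0.9, 1.5, 3.0, 3.9, 6.0, 7.5, 12.0]
score = gdt_ts(example_distances)
print(f"GDT-TS = {score:.3f}")
```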
Confidence intervals derived from 10,000 bootstrap samples provide statistically robust validation of OpenFold’s accuracy relative to DeepMind’s original implementation.
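One common way to obtain such an interval is the percentile bootstrap, sketched below. This is an assumption about the general technique, not a reproduction of the published analysis: the per-domain scores are synthetic, and the function name `bootstrap_mean_ci` and its parameters are hypothetical.

```python
import random
import statistics
from typing import Sequence, Tuple


def bootstrap_mean_ci(scores: Sequence[float], n_boot: int = 10_000,
                      alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile-bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample with replacement n_boot times and record each resample's mean.
    means = sorted(statistics.fmean(rng.choices(scores, k=n))
                   for _ in range(n_boot))
    lower_idx = int(n_boot * alpha / 2)
    upper_idx = n_boot - lower_idx - 1
    return means[lower_idx], means[upper_idx]


# Synthetic stand-in for 90 per-domain GDT-TS values (not real CASP15 data).
data_rng = random.Random(42)
scores = [min(100.0, max(0.0, data_rng.gauss(74.0, 25.0))) for _ in range(90)]

ci_low, ci_high = bootstrap_mean_ci(scores)
print(f"mean = {statistics.fmean(scores):.1f}, "
      f"95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

With only 90 domains and widely varying per-domain scores, an interval roughly ten points wide, like those in the table, is unsurprising.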
OpenFold Computational Efficiency Metrics
OpenFold delivers 3-5x faster inference than AlphaFold2 and supports sequences of up to 4,600 residues on A100 40GB GPUs. During training, the model reaches roughly 90% of its final accuracy in about 3% of the full training time.
Novo Nordisk’s Research Collaboration Platform recorded 85% cost reduction using AWS EC2 Capacity Blocks and Spot Instances across 256 GPUs for OpenFold3 training infrastructure.
| Efficiency Metric | Value |
|---|---|
| Speed vs AlphaFold2 | 3-5x faster |
| Max Sequence Length (A100 40GB) | 4,600 residues |
| Training Time to 90% Accuracy | ~3% of full training |
| AWS Cost Reduction | 85% |
| GPUs for OpenFold3 Training | 256 |
| DeepSpeed Peak Memory Reduction | 13x |
| Inference Speedup (Custom Kernels) | 4x |
DeepSpeed DS4Sci integration achieved 13x peak memory reduction, enabling distributed training across larger GPU clusters. Custom kernel implementations provide 4x inference speedup for production deployments.
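To make the ratios above concrete, the following sketch converts a speedup factor and a percentage cost reduction into absolute numbers. The baseline runtime and budget are hypothetical placeholders, not measured values, and the helper names are illustrative. The sketch treats each factor separately and does not assume the reported speedups compose.

```python
def time_after_speedup(baseline_seconds: float, speedup: float) -> float:
    """Runtime after an N-fold speedup (speedup=3 means three times faster)."""
    return baseline_seconds / speedup


def cost_after_reduction(baseline_cost: float, reduction_fraction: float) -> float:
    """Cost remaining after a fractional reduction (0.85 means 85% cheaper)."""
    return baseline_cost * (1.0 - reduction_fraction)


# Hypothetical baseline: 600 s per prediction and a 100,000-unit training budget.
baseline_seconds = 600.0
baseline_cost = 100_000.0

slow_case = time_after_speedup(baseline_seconds, 3.0)  # lower end of 3-5x
fast_case = time_after_speedup(baseline_seconds, 5.0)  # upper end of 3-5x
remaining_cost = cost_after_reduction(baseline_cost, 0.85)

print(f"per-prediction time: {fast_case:.0f}-{slow_case:.0f} s "
      f"(baseline {baseline_seconds:.0f} s)")
print(f"cost after 85% reduction: {remaining_cost:,.0f} "
      f"(baseline {baseline_cost:,.0f})")
```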
OpenFold vs AlphaFold Technical Comparison
OpenFold differs from AlphaFold in releasing its full training code and in being implemented in PyTorch. The platform supports custom model training, which is not available with AlphaFold’s JAX-based release.
OpenFold3 operates under Apache 2.0 licensing permitting unrestricted commercial applications, while AlphaFold3 restricts usage to academic research. This licensing difference positions OpenFold as the preferred platform for pharmaceutical development workflows.
| Feature | OpenFold | AlphaFold |
|---|---|---|
| Training Code | Fully available | Not released |
| Training Data | OpenProteinSet (public) | Not released |
| Framework | PyTorch | JAX |
| Commercial License (v3) | Apache 2.0 (unrestricted) | Limited academic use |
| Custom Model Training | Supported | Not available |
| Inference Speed | 3-5x faster | Baseline |
PyTorch implementation enables compatibility with DeepSpeed for distributed training and integration with existing machine learning pipelines. Pharmaceutical companies leverage custom training capabilities to fine-tune predictions for proprietary chemical spaces.
OpenFold Pharmaceutical Applications
Major pharmaceutical companies have integrated OpenFold into drug discovery workflows for therapeutic development and molecular optimization. Novo Nordisk fine-tunes the model on internal data for its therapeutic discovery pipelines.
Outpace Bio utilizes OpenFold for cell therapy development and molecular circuit engineering. Bayer Crop Science employs the platform for agricultural biotechnology and plant protein modeling applications.
| Organization | Application Area |
|---|---|
| Novo Nordisk | Therapeutic Discovery Pipelines |
| Outpace Bio | Cell Therapy Development |
| Bayer Crop Science | Agricultural Biotechnology |
| Cyrus Biotechnology | Enzyme-Based Drug Design |
| Astex Pharmaceuticals | Structure-Based Drug Design |
| SandboxAQ | Quantum-Aided Drug Discovery |
Cyrus Biotechnology focuses on enzyme-based drug design for autoimmune disease treatments. Astex Pharmaceuticals employs OpenFold for small molecule therapeutics and structure-based drug design.
SandboxAQ integrates OpenFold with quantum simulation technologies for advanced drug discovery workflows. OpenFold3 co-folding capabilities enable prediction of protein structures bound to drug molecules across pharmaceutical research pipelines.
OpenFold Development Timeline
The OpenFold Consortium launched in February 2022, establishing the foundation for open-source protein structure prediction development. OpenProteinSet was released in August 2023, providing public access to pre-computed multiple sequence alignments.
The consortium published research in Nature Methods in May 2024, validating OpenFold’s accuracy against CASP15 benchmarks. Six new members joined in August 2024, followed by eight additional members in April 2025.
| Milestone | Date |
|---|---|
| OpenFold Consortium Founded | February 2022 |
| OpenProteinSet Released | August 2023 |
| Nature Methods Publication | May 2024 |
| Six New Members Added | August 2024 |
| Eight New Members Added | April 2025 |
| OpenFold3-Preview Released | October 2025 |
OpenFold3-preview launched in October 2025, introducing protein-ligand and protein-nucleic acid complex prediction capabilities. The release marked the platform’s expansion into co-folding applications critical for pharmaceutical drug discovery.
FAQ
How many members are in the OpenFold Consortium?
The OpenFold Consortium has 24 member organizations as of 2025, including six global pharmaceutical firms, three technology companies (AWS, Microsoft, NVIDIA), and two academic partnerships. Over 40 institutions contributed to OpenFold3 development.
How does OpenFold compare to AlphaFold in speed?
OpenFold achieves 3-5x faster inference speeds compared to AlphaFold2 baseline performance. Custom kernel implementations provide an additional 4x inference speedup for production deployments while maintaining comparable accuracy across CASP15 benchmarks.
What data is available in OpenProteinSet?
OpenProteinSet contains 16 million multiple sequence alignments with an average depth of 940 sequences per alignment. The dataset includes 4.5 million protein sequences shared via AWS RODA and 270,000 filtered Uniclust30 MSAs for self-distillation training.
Is OpenFold available for commercial use?
OpenFold3 operates under Apache 2.0 licensing, permitting unrestricted academic and commercial applications. This contrasts with AlphaFold3, which limits usage to academic research. The license enables pharmaceutical companies to integrate OpenFold into proprietary drug discovery workflows.
What accuracy does OpenFold achieve?
Across 90 evaluated CASP15 domains, the 95% confidence interval for OpenFold’s mean GDT-TS was 68.6-78.8, overlapping AlphaFold2’s interval of 69.7-79.2. OpenFold matched or exceeded AlphaFold2 on 50% of targets, indicating performance parity between the implementations.

