ESM-2
Meta AI (FAIR)
Transformer protein language model family (8M to 15B parameters) trained with a masked-language-modeling objective on UniRef protein sequences (about 65 million unique sequences seen during training). Provides per-residue and per-sequence embeddings used across structure prediction, function annotation, and variant effect scoring.
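As a rough illustration of how per-residue embeddings relate to a per-sequence embedding, the sketch below averages residue vectors into one fixed-length vector. Mean pooling is an assumption here (a common choice, not something this page specifies), and the toy vectors stand in for real model output; `mean_pool` is a hypothetical helper name.

```python
from typing import List


def mean_pool(per_residue: List[List[float]]) -> List[float]:
    """Average per-residue vectors (L x D) into one per-sequence vector (D)."""
    if not per_residue:
        raise ValueError("need at least one residue embedding")
    dim = len(per_residue[0])
    if any(len(vec) != dim for vec in per_residue):
        raise ValueError("all residue embeddings must have the same dimension")
    n = len(per_residue)
    # Column-wise mean over the sequence length.
    return [sum(vec[j] for vec in per_residue) / n for j in range(dim)]


# Toy stand-in for model output: 3 residues, 2 dimensions each (assumed values).
toy_residue_embeddings = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
sequence_embedding = mean_pool(toy_residue_embeddings)
```

The pooled vector has the same length regardless of sequence length, which is what makes it usable as input to a downstream classifier or regressor.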
Best For
Protein embeddings for downstream ML; variant effect prediction; fast structure prediction via ESMFold
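One commonly used way to turn a masked language model into a variant effect score is a log-odds comparison between the mutant and wild-type residue at the mutated position. The sketch below assumes that scoring rule (it is not stated on this page) and uses made-up probabilities in place of real model output; `variant_log_odds` is a hypothetical helper.

```python
import math
from typing import Dict


def variant_log_odds(position_probs: Dict[str, float], wild_type: str, mutant: str) -> float:
    """Return log p(mutant) - log p(wild_type) at one position.

    Negative values mean the model finds the mutant less likely than the
    wild-type residue; positive values mean the opposite.
    """
    return math.log(position_probs[mutant]) - math.log(position_probs[wild_type])


# Assumed toy distribution over three residues at a single masked position.
toy_probs = {"A": 0.5, "G": 0.25, "V": 0.25}
score = variant_log_odds(toy_probs, wild_type="A", mutant="G")
```

In practice the probabilities would come from the model's softmax output at the masked position; the toy dictionary only shows the arithmetic.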
License
Open Source (MIT)
Strengths
- MIT license
- Multiple model sizes (8M to 15B)
- Rich embeddings
- Basis for ESMFold
Limitations
- Sequence-only (no structural input)
- Large models require significant GPU memory
- Pre-trained weights only (no hosted fine-tuning API)
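To give a feel for the GPU memory limitation, a back-of-the-envelope estimate of weight storage is parameter count times bytes per parameter. The sketch below covers weights only; activations, optimizer state, and framework overhead are ignored, so real usage is higher. The helper name and the precision choices are assumptions for illustration.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    bytes_per_param: 2 for fp16/bf16, 4 for fp32 (assumed storage formats).
    """
    return num_params * bytes_per_param / 1e9


# Largest and smallest sizes listed above.
largest_fp16_gb = weight_memory_gb(15_000_000_000)       # 15B parameters at 2 bytes each
smallest_fp32_gb = weight_memory_gb(8_000_000, 4)        # 8M parameters at 4 bytes each
```

Under these assumptions the 15B model needs about 30 GB for weights in half precision, while the 8M model fits in a few tens of megabytes even in full precision.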
Related Tools
ESMFold
Meta AI (FAIR)
Single-sequence protein structure prediction built on the ESM-2 protein language model (3B-parameter variant). No MSA required, so inference runs directly from sequence.
ProGen2
Salesforce Research
Autoregressive protein language model (up to 6.4B parameters) for controllable protein sequence generation. Generates functional proteins conditioned on protein family or function tags.
ESM3
EvolutionaryScale
Multimodal protein language model that simultaneously reasons over sequence, structure, and function. Can generate novel proteins by prompting with partial information.
More in Protein LMs
EvoDiff
Microsoft Research
Discrete diffusion framework for controllable protein generation in sequence space. Combines evolutionary-scale data with diffusion model conditioning for generating diverse, structurally plausible proteins.
OpenCRISPR-1
Profluent Bio
First open-source AI-generated CRISPR-Cas9 gene editor. Protein language model-designed Cas9 variant with comparable editing efficiency to SpCas9 but novel sequence. Demonstrates LLM-driven protein engineering at scale.