Generative Design Comparison
ESM-2 vs ProGen2 vs EvoDiff: Protein Language Models & Generation (2026)
Last updated: 2026-04-16
Protein language models have become foundational tools in computational protein science. ESM-2 (Meta) provides rich learned representations used for everything from structure prediction to function annotation. ProGen2 (Salesforce) generates novel protein sequences autoregressively. EvoDiff (Microsoft) takes a different approach: discrete diffusion, which enables order-agnostic and motif-conditioned generation. These tools serve fundamentally different purposes.
Head-to-Head
Structured comparison across key dimensions.
| Dimension | ESM-2 | ProGen2 | EvoDiff |
|---|---|---|---|
| Primary purpose | Protein representation learning (embeddings for downstream tasks) | Autoregressive protein sequence generation | Diffusion-based protein sequence generation with conditioning |
| Architecture | Masked language model (BERT-style transformer) | Autoregressive language model (GPT-style transformer) | Discrete diffusion model (order-agnostic autoregressive diffusion, OADM) |
| Model sizes | 8M to 15B parameters (650M most commonly used) | 151M to 6.4B parameters | ~640M parameters (OADM model) |
| Training data | UniRef50/90 (~250M sequences) | UniRef90, BFD, OAS (>1B sequences including metagenomics and immune repertoires) | UniRef50 + evolutionary alignments (OpenFold MSAs) |
| Can generate sequences? | Limited — masked token infilling only (not designed for generation) | Yes — strong autoregressive generation with family/taxonomy conditioning | Yes — diffusion-based with motif scaffolding and inpainting |
| Zero-shot prediction | Excellent — variant effect prediction, contact maps, secondary structure from embeddings alone | Log-likelihood scoring for fitness; less validated than ESM-2 for zero-shot tasks | Not designed for zero-shot prediction tasks |
| Structure prediction | Yes — ESMFold uses ESM-2 embeddings (single-sequence, no MSA needed) | No — sequence only | No — sequence only (validate with ESMFold/AF2 downstream) |
| License | MIT | MIT | MIT |
| On Platform | Partial (via ESMFold) | No | No |
| Key limitation | Not a generative model; masked infilling ≠ coherent full-sequence design | Left-to-right generation only; cannot condition on internal motifs or do inpainting | Smaller model; generated proteins need downstream structural validation; less adopted than ESM-2 |
When to Use Each
ESM-2
You need protein embeddings for downstream ML tasks. You want zero-shot variant effect prediction. You need structure prediction (ESMFold). You're building a classifier, regressor, or search system on top of protein representations.
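ESM-2's zero-shot variant effect prediction is commonly done with a masked-marginal score: mask the mutated position, run the model, and compare the log-probabilities the masked LM assigns to the mutant and wild-type residues. The sketch below shows only that arithmetic; the toy log-probability table is a stand-in for a real ESM-2 forward pass (mask position, log-softmax over the amino-acid vocabulary), which would require the model weights.

```python
import math

# Toy masked-LM output: log-probabilities over amino acids at ONE masked
# position. In real use these come from ESM-2: mask position i, run the
# model, take log_softmax over the amino-acid vocabulary at that position.
toy_log_probs = {"A": math.log(0.50), "V": math.log(0.30), "G": math.log(0.20)}

def masked_marginal_score(log_probs, wt_aa, mut_aa):
    """Zero-shot variant effect score: log p(mutant) - log p(wild type)
    at the masked position. Negative means the model prefers wild type,
    i.e. the variant is predicted to be deleterious."""
    return log_probs[mut_aa] - log_probs[wt_aa]

score = masked_marginal_score(toy_log_probs, wt_aa="A", mut_aa="V")
print(round(score, 3))  # log(0.30 / 0.50) ≈ -0.511
```

The same score can be summed over positions to rank multi-mutant variants, which is how ESM-2 log-likelihoods are typically used for fitness ranking.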
ProGen2
You want to generate novel protein sequences. You need controllable generation conditioned on protein family, taxonomy, or function. You want the largest available autoregressive protein model (6.4B parameters).
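ProGen2 generates left to right: at each step the model produces logits over the amino-acid vocabulary, and the next residue is sampled (typically with temperature scaling). The sketch below shows that sampling loop with a dummy logits function standing in for the model's forward pass; `toy_next_token_logits` and `sample_sequence` are illustrative names, not ProGen2 API.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def toy_next_token_logits(prefix):
    """Stand-in for an autoregressive forward pass (e.g. ProGen2):
    feed the tokenized prefix, read the last position's logits.
    Here we return deterministic dummy scores per candidate residue."""
    return [float(hash((prefix, aa)) % 97) / 10.0 for aa in AMINO_ACIDS]

def sample_sequence(prompt, length, temperature=0.8, seed=0):
    """Left-to-right sampling: temperature-scaled softmax over the
    vocabulary at each step, then draw the next residue."""
    rng = random.Random(seed)
    seq = prompt
    for _ in range(length):
        scaled = [l / temperature for l in toy_next_token_logits(seq)]
        m = max(scaled)  # subtract max for numerical stability
        weights = [math.exp(l - m) for l in scaled]
        seq += rng.choices(AMINO_ACIDS, weights=weights, k=1)[0]
    return seq

generated = sample_sequence("M", length=20)
print(generated)
```

With the real model, conditioning on family or taxonomy amounts to prepending control tags to the prompt before this same loop runs.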
EvoDiff
You need motif-conditioned protein generation (scaffold a sequence around a fixed motif). You want order-agnostic generation (not left-to-right). You need sequence inpainting or infilling capabilities.
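The motif-scaffolding workflow EvoDiff enables can be sketched as order-agnostic unmasking: start from a fully masked sequence with the motif pinned in place, then fill the remaining positions one at a time in a random order. The uniform sampler below is a placeholder for the denoising network, and `scaffold_motif` is an illustrative name, not the `evodiff` package API.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "#"

def toy_fill(seq, pos, rng):
    """Stand-in for the denoiser: in EvoDiff's OADM, the model predicts a
    distribution over residues at `pos` conditioned on the partially
    unmasked sequence. Here we sample uniformly for illustration."""
    return rng.choice(AMINO_ACIDS)

def scaffold_motif(length, motif, motif_start, seed=0):
    """Order-agnostic generation around a fixed motif: motif positions
    stay pinned; all other positions are unmasked in random order."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for i, aa in enumerate(motif):
        seq[motif_start + i] = aa
    free = [i for i, c in enumerate(seq) if c == MASK]
    rng.shuffle(free)  # order-agnostic: any unmasking order is valid
    for pos in free:
        seq[pos] = toy_fill(seq, pos, rng)
    return "".join(seq)

designed = scaffold_motif(length=30, motif="HEAAH", motif_start=12)
print(designed)
```

Inpainting is the same loop with a partially known sequence instead of a motif; either way, generated designs should be validated downstream with ESMFold or AlphaFold2, as the table notes.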
Practitioner Verdict
Use ESM-2 when you need protein embeddings for downstream tasks (structure prediction, function classification, variant effect prediction); it is the representation backbone of the field. Use ProGen2 for autoregressive protein sequence generation, especially when you want to control generation with taxonomic or functional conditioning. Use EvoDiff when you need diffusion-based generation with motif scaffolding or inpainting, capabilities that left-to-right autoregressive models cannot provide.