ProGen2
Salesforce Research
Family of autoregressive protein language models (151M to 6.4B parameters) for controllable protein sequence generation. Generates candidate functional proteins, steered toward a protein family by fine-tuning or by prompting with a sequence fragment.
Best For
Generating novel functional protein sequences conditioned on desired properties
License
Open Source (BSD-3-Clause)
Strengths
- +Controllable generation
- +Multiple model sizes
- +Permissive BSD-3-Clause license
- +Experimentally validated
Limitations
- −Autoregressive: samples one residue per forward pass, so generation is slower than with masked models that fill many positions in parallel
- −No structure conditioning
- −Requires expertise to evaluate outputs
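The autoregressive limitation above can be illustrated with a toy sampler. This is a minimal sketch, not ProGen2's API: `ToyNextResidueModel`, `top_p_filter`, and `sample_sequence` are hypothetical names, and the toy probability rule stands in for a real network. It only shows that each new residue costs one model call, and how nucleus (top-p) sampling truncates the distribution before drawing.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class ToyNextResidueModel:
    """Hypothetical stand-in for a protein language model (not ProGen2's API)."""

    def __init__(self):
        self.calls = 0  # counts forward passes

    def next_residue_probs(self, prefix):
        """Return a probability for each residue given the sequence so far."""
        self.calls += 1
        # Toy rule (an assumption for illustration): discourage immediate repeats.
        weights = [0.2 if prefix and aa == prefix[-1] else 1.0 for aa in AMINO_ACIDS]
        total = sum(weights)
        return [w / total for w in weights]


def top_p_filter(probs, top_p):
    """Keep the smallest set of indices whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_sequence(model, prompt, length, top_p=0.9, seed=0):
    """Extend `prompt` one residue at a time until it reaches `length`."""
    rng = random.Random(seed)
    seq = prompt
    while len(seq) < length:
        probs = model.next_residue_probs(seq)  # one model call per new residue
        kept = top_p_filter(probs, top_p)
        idx = rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
        seq += AMINO_ACIDS[idx]
    return seq


model = ToyNextResidueModel()
generated = sample_sequence(model, prompt="M", length=30)
print(generated, model.calls)
```

Extending a 1-residue prompt to 30 residues takes 29 model calls here, which is the cost the limitation refers to. A real model would replace the toy rule with a neural network's output distribution.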
Related Tools
ESM-2
Meta AI (FAIR)
State-of-the-art protein language model (up to 15B parameters) trained on 250M protein sequences. Provides rich per-residue and per-sequence embeddings used across structure prediction, function annotation, and variant effect scoring.
EvoDiff
Microsoft Research
Discrete diffusion framework for controllable protein generation in sequence space. Combines evolutionary-scale data with diffusion model conditioning for generating diverse, structurally plausible proteins.
ESM3
EvolutionaryScale
Multimodal protein language model that simultaneously reasons over sequence, structure, and function. Can generate novel proteins by prompting with partial information.
More in Protein LMs
OpenCRISPR-1
Profluent Bio
First open-source AI-generated CRISPR-Cas9 gene editor. A Cas9 variant designed by a protein language model, with editing efficiency comparable to SpCas9 but a novel sequence. Demonstrates LLM-driven protein engineering at scale.