Intelligence
91 AI tools for drug discovery, curated and rated by practitioners.
Google DeepMind
Predicts single-chain and multimer protein 3D structures from amino acid sequence using MSA-based deep learning. Set the modern benchmark on CASP14.
Steinegger Lab (Seoul National University)
Wraps AlphaFold2 with MMseqs2-based MSA generation, making AF2 runs 40-60x faster. Accessible via Google Colab or local install.
MIT Jameel Clinic
First fully open-source model achieving AlphaFold3-level accuracy for joint structure prediction of proteins, nucleic acids, and small molecules.
MIT + Recursion Pharmaceuticals
Extends Boltz-1 to jointly predict 3D complex structure AND protein-ligand binding affinity. Approaches FEP accuracy at 1,000x lower compute cost.
Chai Discovery
Multi-modal foundation model for joint structure prediction of proteins, small molecules, DNA, RNA, and glycosylations. Performs well in single-sequence mode.
ByteDance Research
Fully open-source PyTorch reproduction of AlphaFold3 architecture. Protenix-v1 (Feb 2026) reported to outperform AF3 across diverse benchmarks.
Meta AI (FAIR)
Single-sequence protein structure prediction using the ESM-2 protein language model (15B parameters). No MSA required — fast inference directly from sequence.
HeliXon Protein
Single-sequence structure prediction using a protein language model plus geometry-inspired transformer. First MSA-free method to approach AF2 accuracy.
Google DeepMind / Isomorphic Labs
Joint structure prediction of proteins, DNA, RNA, small molecules, ions, and covalent modifications in a single diffusion-based model.
OpenFold Consortium
Fully open-source AF3-architecture co-folding model. Full-stack release includes training data, weights, code, and evaluation scripts.
IntelliGen AI
Controllable open-source foundation model for biomolecular structure prediction. Claims to surpass AF3 on FoldBench for antibody-antigen co-folding.
MIT CSAIL (Corso et al.)
Diffusion-based generative model that treats docking as a generative problem over ligand poses. No pre-specified binding pocket needed.
Koes Lab (University of Pittsburgh)
AutoDock Vina-based docking engine augmented with a 3D CNN scoring function. Uses Vina for sampling, CNN for scoring and re-ranking.
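The sample-then-rescore pattern described here (a fast engine proposes poses, a more expensive function re-ranks them) is easy to sketch generically. A toy illustration with stand-in scoring functions, not gnina's actual code:

```python
def rerank_poses(poses, sampler_score, rescore, keep=10):
    """Two-stage protocol: keep the fast sampler's top poses,
    then re-rank them with a more expensive scoring function."""
    kept = sorted(poses, key=sampler_score)[:keep]
    return sorted(kept, key=rescore)

# Toy poses as (x, y) displacements from a "true" pose at the origin
poses = [(0.1, 0.2), (2.0, 0.0), (0.0, 0.05), (1.0, 1.0)]
fast = lambda p: abs(p[0]) + abs(p[1])               # crude sampler score
accurate = lambda p: (p[0] ** 2 + p[1] ** 2) ** 0.5  # rescorer stand-in
print(rerank_poses(poses, fast, accurate)[0])        # best pose after rescoring
```

The two-stage design pays off when the rescoring function is far more expensive than the sampler, as with a 3D CNN evaluated only on a handful of candidate poses.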
Morehead, Cheng Lab (University of Missouri)
Geometric flow matching model that maps apo protein structures to bound complexes for multiple ligands simultaneously. Outputs confidence scores and affinity estimates.
The Scripps Research Institute
Classical rigid receptor, flexible ligand docking using empirical and knowledge-based scoring. The most widely used open-source docking tool.
DP Technology
GPU-accelerated molecular docking achieving >2000x speedup over CPU Vina. Enables ultra-large virtual screening of billions of compounds.
Schrödinger
High-precision grid-based docking with hierarchical filtering. Glide WS (2025) explicitly models water molecules during docking.
Cambridge Crystallographic Data Centre (CCDC)
Genetic algorithm-based docking supporting full ligand flexibility and partial protein side-chain flexibility. Four scoring functions available.
Aspuru-Guzik Group
Active-learning framework using QSAR models to predict docking scores, enabling 50-100x acceleration of large-library virtual screening.
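The active-learning loop behind this kind of accelerated screening alternates between expensive docking on a small batch and cheap surrogate predictions on the remainder, discarding the predicted worst compounds each round. A toy sketch with a stand-in docking function and a 1-nearest-neighbour surrogate (illustrative only, not the framework's actual models):

```python
import random

def dock(mol):
    """Stand-in for an expensive docking call (toy score; lower is better)."""
    return (mol * 0.37) % 1.0  # deterministic pseudo-score

def surrogate_predict(docked, mol):
    """1-nearest-neighbour surrogate: score of the closest docked molecule."""
    nearest = min(docked, key=lambda m: abs(m - mol))
    return docked[nearest]

def active_screen(library, budget_per_round=50, rounds=3, keep_frac=0.2):
    random.seed(0)
    docked = {}
    pool = list(library)
    for _ in range(rounds):
        batch = random.sample(pool, min(budget_per_round, len(pool)))
        for mol in batch:
            docked[mol] = dock(mol)          # expensive step
            pool.remove(mol)
        preds = {m: surrogate_predict(docked, m) for m in pool}
        keep = sorted(pool, key=preds.get)[: int(len(pool) * keep_frac)]
        pool = keep                          # shrink the library each round
    return docked

hits = active_screen(range(2000))
print(len(hits))  # only a small fraction of the library was ever docked
```

The speedup comes from the shrinking pool: most of the library is only ever scored by the cheap surrogate.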
Greenstone Biosciences / Stanford
Predicts 41 ADMET endpoints using a Chemprop-RDKit GNN. Held the highest average rank on the TDC ADMET Leaderboard at time of publication.
SCBDD Group, Central South University
Comprehensive ADMET prediction platform covering 119 endpoints with uncertainty estimates. Trained on >400,000 curated entries.
Swiss Institute of Bioinformatics (SIB)
Predicts pharmacokinetics, drug-likeness, and medicinal chemistry friendliness. Known for the BOILED-Egg visualization and Bioavailability Radar.
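Drug-likeness filters of the kind reported here largely reduce to simple descriptor thresholds. A minimal sketch of Lipinski's rule of five (not SwissADME's actual implementation; descriptor values are assumed precomputed, e.g. by RDKit):

```python
def lipinski_violations(mw, logp, h_donors, h_acceptors):
    """Count rule-of-five violations from precomputed descriptors."""
    rules = [
        mw > 500,          # molecular weight over 500 Da
        logp > 5,          # octanol-water logP over 5
        h_donors > 5,      # more than 5 H-bond donors
        h_acceptors > 10,  # more than 10 H-bond acceptors
    ]
    return sum(rules)

def is_drug_like(mw, logp, h_donors, h_acceptors, max_violations=1):
    """Common convention: at most one violation is tolerated."""
    return lipinski_violations(mw, logp, h_donors, h_acceptors) <= max_violations

# Aspirin-like descriptors: MW 180.2, logP ~1.2, 1 donor, 4 acceptors
print(is_drug_like(180.2, 1.2, 1, 4))  # True
```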
BioSig Lab (University of Queensland)
Predicts 28 PK and toxicity properties using graph-based molecular signatures. Long-standing academic reference tool.
BioSig Lab (University of Queensland)
Deep learning successor to pkCSM. Predicts 73 endpoints using GNNs with molecular optimization and interpretability outputs.
Charité Berlin
Comprehensive toxicity prediction with 61 models across acute toxicity, organ toxicity, mutagenicity, carcinogenicity, and toxicological pathway activity.
MIT (Barzilay, Coley et al.)
Open-source D-MPNN library for molecular property prediction. The architecture underlying ADMET-AI and ADMETlab 3.0. Used in Halicin antibiotic discovery.
Simulations Plus
Commercial platform predicting 175+ ADMET properties with integrated PBPK (GastroPlus) and generative drug design (AIDD module).
Schrödinger
Predicts PK and physicochemical properties based on full 3D molecular structure. Part of the Schrödinger Drug Discovery Platform.
Baker Lab / IPD (University of Washington)
Diffusion-based generative model for de novo protein backbone design. Generates novel protein structures conditioned on binding targets, symmetry, or functional sites.
Baker Lab / IPD (University of Washington)
Successor to RFdiffusion using flow matching. Designs enzymes directly from active site geometry (theozyme) specifications.
Baker Lab / IPD (University of Washington)
Inverse folding model: generates amino acid sequences predicted to fold into a target 3D backbone structure. Standard component of all modern protein design pipelines.
Baker Lab / IPD (University of Washington)
Extension of ProteinMPNN that conditions sequence design on bound ligands, small molecules, metals, and nucleotides.
PocketFlow Team
Flow-based generative model that creates novel small molecule ligands for a target binding pocket. Generates hundreds of candidates in minutes.
AstraZeneca Molecular AI
RL + transformer platform for de novo small molecule design. Supports scaffold decoration, R-group replacement, linker design, and multi-parameter optimization.
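Multi-parameter optimization in platforms like this one collapses many per-objective scores into a single reward for the RL agent; a weighted geometric mean is one common aggregation. A sketch of that aggregation only (not REINVENT's actual scoring code):

```python
import math

def weighted_geometric_mean(scores, weights):
    """Aggregate per-objective scores in [0, 1] into one reward.
    A zero in any component zeroes the whole reward, which
    penalizes molecules that fail any single objective."""
    if any(s == 0 for s in scores):
        return 0.0
    total = sum(weights)
    log_sum = sum(w * math.log(s) for s, w in zip(scores, weights))
    return math.exp(log_sum / total)

# Hypothetical components: drug-likeness, predicted potency, synthesizability
scores = [0.8, 0.6, 0.9]
weights = [1.0, 2.0, 1.0]
print(round(weighted_geometric_mean(scores, weights), 3))
```

Unlike a weighted arithmetic mean, the geometric mean cannot be gamed by maximizing one objective while zeroing another.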
Valence Labs
GPT-style model trained on SAFE (fragment-based) molecular representation. Enables fragment-constrained design including scaffold decoration and linker generation.
Insilico Medicine
Commercial generative chemistry platform using an ensemble of deep learning architectures. Part of the Pharma.AI platform. Has molecules in Phase I/II clinical trials.
Generate:Biomedicines
Programmable generative model for protein and protein complex design using diffusion with conditioning on geometry, symmetry, and functional annotations.
EvolutionaryScale
Multimodal protein language model that simultaneously reasons over sequence, structure, and function. Can generate novel proteins by prompting with partial information.
Luo et al. (NeurIPS 2022)
Diffusion-based generative model that jointly designs antibody CDR sequences and 3D structures conditioned on antigen structure.
Baker Lab / IPD (University of Washington)
RFdiffusion fine-tuned for de novo antibody design. Generates VHHs, scFvs, and full antibodies targeting user-specified epitopes. Experimentally validated with cryo-EM.
OPIG (Oxford)
Antibody-specific inverse folding model fine-tuned from ESM-IF1. Designs sequences predicted to maintain structural fold given an antibody backbone.
OPIG (Oxford)
Suite for predicting 3D structures of antibodies (ABodyBuilder), nanobodies (NanoBodyBuilder2), and TCRs (TCRBuilder2) from sequence.
Johns Hopkins / Profluent Bio
Fast deep learning model for antibody structure prediction from sequence alone. Processes paired heavy/light chain inputs.
OPIG (Oxford)
Antibody-specific language model for sequence restoration, per-residue scoring, and embedding. Reduces germline bias from original AbLang.
Meta AI (FAIR)
Structure-conditioned inverse folding model: given a protein backbone, predicts sequences likely to fold into it. General-purpose (not antibody-specific).
EPFL / MIT (Pacesa, Ovchinnikov, Correia)
One-shot automated pipeline for de novo protein binder design. Backpropagates through AlphaFold2 to hallucinate binders. 10-100% experimental success rates.
OPIG (Oxford)
Sequence annotation tool for numbering antibody and TCR variable domains according to standard schemes (IMGT, Kabat, Chothia, AHo).
CERTH
AI-driven prediction of antibody-antigen binding sites (paratopes and epitopes) from structure.
GROMACS Consortium (KTH, Max Planck, et al.)
High-performance all-atom and coarse-grained MD engine. GROMACS 2026 added native NNP/MM support for hybrid ML-classical simulations.
Stanford / OpenMM Community
Python-first MD framework with native ML potential API (openmm-ml). Wraps MACE, NequIP, AceFF, and other ML force fields directly.
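Under the hood, MD engines like OpenMM and GROMACS advance positions and velocities with symplectic integrators. An illustrative velocity Verlet step on a 1-D harmonic oscillator (a toy system in plain Python, not OpenMM's API):

```python
def velocity_verlet(x, v, force, dt, mass=1.0, steps=1000):
    """Integrate Newton's equations with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(steps):
        x += v * dt + 0.5 * (f / mass) * dt * dt  # position update
        f_new = force(x)                           # force at new position
        v += 0.5 * (f + f_new) / mass * dt         # velocity half-steps
        f = f_new
    return x, v

# Harmonic oscillator F = -k x with k = 1, so the period is 2*pi
k = 1.0
x, v = velocity_verlet(1.0, 0.0, lambda x: -k * x, dt=0.01, steps=628)
print(round(x, 2), round(v, 2))  # near (1.0, 0.0) after ~one period
```

The same update rule, applied per atom with forces from a classical or ML potential, is what an MD engine executes billions of times per run.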
AMBER Consortium (UCSF et al.)
MD suite with best-in-class GPU acceleration (pmemd.cuda) and strong force field ecosystem. Now includes NNP integration via DeePMD-GNN.
D.E. Shaw Research / Schrödinger
GPU-accelerated MD engine integrated with Schrödinger's platform. FEP+ is the industry gold standard for relative binding free energy predictions in lead optimization.
Open Free Energy Consortium (15+ pharma companies)
Open-source RBFE framework. 2025 benchmark across 1,700+ ligands from 15 pharma companies showed out-of-the-box accuracy approaching FEP+.
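Relative binding free energy tools like FEP+ and OpenFE never compute absolute binding energies; they close a thermodynamic cycle by alchemically transforming ligand A into B in both the bound and solvated states. The final bookkeeping is one subtraction (illustrative numbers, not benchmark data):

```python
def ddg_bind(dg_complex_a_to_b, dg_solvent_a_to_b):
    """Relative binding free energy from a thermodynamic cycle:
    ddG(A->B) = dG(A->B in complex) - dG(A->B in solvent)."""
    return dg_complex_a_to_b - dg_solvent_a_to_b

# Hypothetical alchemical transformation results, in kcal/mol
ddg = ddg_bind(-3.2, -1.7)
print(round(ddg, 2))  # -1.5: ligand B predicted to bind ~1.5 kcal/mol tighter
```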
Cambridge (Csányi Lab)
Equivariant ML force field for organic molecules covering H, C, N, O, F, P, S, Cl, Br, I (~90% of drug-like space). Near-DFT accuracy for torsion profiles.
EBI / Genentech / GSK / MSD / Pfizer / Sanofi / Wellcome Sanger
Integrates 23+ public data sources to systematically score and rank target-disease associations. Provides target prioritization based on clinical precedence and tractability.
IMIM / DisGeNET (commercial entity)
Comprehensive gene-disease and variant-disease association database: more than 2M gene-disease, 4M variant-disease, and 20M disease-disease associations. Integrates curated repositories, GWAS, animal models, and NLP-extracted evidence.
EMBL / SIB / CPR
Functional protein-protein association networks across 12,535 organisms. v12.5 added a regulatory network layer capturing directionality via LLM-parsed literature.
EMBL-EBI
Manually curated bioactivity database. 5.4M+ bioactivity measurements for 1M+ compounds against 5,200+ protein targets from peer-reviewed literature.
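ChEMBL normalizes heterogeneous potency measurements (IC50, Ki, EC50) onto a common pChEMBL scale, the negative log10 of the molar activity. The conversion from the nM values the database commonly stores is a one-liner:

```python
import math

def pchembl(value_nm):
    """pChEMBL-style value: -log10 of activity in molar units.
    Activities reported in nM are converted to M first."""
    molar = value_nm * 1e-9
    return -math.log10(molar)

print(pchembl(100.0))  # a 100 nM IC50 corresponds to pChEMBL 7.0
```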
OMx Personal Health Analytics
Gold standard drug knowledge resource. 4,563 FDA-approved drugs, 6,231 investigational drugs, 1.4M drug-drug interactions, comprehensive target annotations.
NCBI / NIH
AI-powered literature resource providing automated NER and relation annotations across ~36M PubMed abstracts and ~6M PMC full-text articles. Updated weekly.
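PubTator distributes its annotations in a simple plain-text layout: pipe-delimited title and abstract lines followed by tab-delimited entity spans. A minimal parser sketch, assuming that layout (field order: pmid, start, end, mention, type, concept ID):

```python
def parse_pubtator(text):
    """Parse one document in PubTator format: pipe-delimited title/abstract
    lines followed by tab-delimited entity annotations."""
    doc = {"title": "", "abstract": "", "annotations": []}
    for line in text.strip().splitlines():
        if "|t|" in line:
            doc["title"] = line.split("|t|", 1)[1]
        elif "|a|" in line:
            doc["abstract"] = line.split("|a|", 1)[1]
        elif "\t" in line:
            pmid, start, end, mention, etype, cid = line.split("\t")[:6]
            doc["annotations"].append(
                {"span": (int(start), int(end)), "mention": mention,
                 "type": etype, "id": cid})
    return doc

sample = (
    "123|t|BRCA1 mutations in breast cancer\n"
    "123|a|We studied BRCA1 variants.\n"
    "123\t0\t5\tBRCA1\tGene\t672\n"
)
doc = parse_pubtator(sample)
print(doc["annotations"][0]["mention"])  # BRCA1
```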
Gyori Lab, Harvard Medical School
Automated knowledge assembly system that draws on NLP reading systems and curated databases, standardizes the extracted causal statements, and assembles them into executable mechanistic models.
Microsoft Research
BERT encoder pre-trained exclusively on PubMed abstracts with domain-specific vocabulary. Baseline for biomedical NER, relation extraction, and classification.
Monarch Consortium (EMBL-EBI et al.)
Integrates and cross-species aligns phenotype-gene-disease data from 33 sources. Enables phenotype-driven gene discovery and cross-species model comparison.
Zitnik Lab, Harvard Medical School
Precision Medicine Knowledge Graph integrating 20 sources across 10 biological scales. 17,080 diseases with 4M+ relationships including drug indication and off-label use edges.
KNU LCBC (Kyungpook National University)
Single-step retrosynthesis prediction using fragment-based tokenization of atomic environments and a Transformer architecture. Mimics chemical reasoning by learning changes in atom environments between products and reactants.
AstraZeneca Molecular AI
Multi-step retrosynthetic planning tool using Monte Carlo tree search guided by neural network policies. Recursively breaks down target molecules into purchasable precursors. Production-used at AstraZeneca.
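AiZynthFinder guides its search with neural policies and MCTS; the underlying recursion (break a target into precursors until everything is purchasable) can be sketched with a toy template table. Hypothetical molecule names, and plain depth-first search instead of MCTS:

```python
# Toy single-step "retrosynthesis templates": product -> candidate precursor sets
TEMPLATES = {
    "target": [("intermediate_A", "reagent_1"), ("intermediate_B",)],
    "intermediate_A": [("buyable_1", "buyable_2")],
    "intermediate_B": [("hard_to_make",)],
}
PURCHASABLE = {"reagent_1", "buyable_1", "buyable_2"}

def find_route(mol, depth=0, max_depth=5):
    """Depth-first search for a route ending in purchasable precursors."""
    if mol in PURCHASABLE:
        return []                       # nothing left to make
    if depth >= max_depth or mol not in TEMPLATES:
        return None                     # dead end
    for precursors in TEMPLATES[mol]:
        sub_routes = [find_route(p, depth + 1, max_depth) for p in precursors]
        if all(r is not None for r in sub_routes):
            steps = [(mol, precursors)]
            for r in sub_routes:
                steps.extend(r)
            return steps
    return None

route = find_route("target")
print(route)  # two steps ending in purchasable building blocks
```

Real planners differ in two key ways: templates are proposed by a trained policy network rather than looked up, and MCTS balances exploring alternative disconnections against deepening promising ones.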
DeepGraphLearning (Mila / Université de Montréal)
PyTorch-based ML platform for drug discovery covering graph neural networks, geometric deep learning, knowledge graphs, generative models, and retrosynthesis. Provides unified API for property prediction, generation, and synthesis planning.
Huang et al. (Harvard / MIT)
Deep learning toolkit for drug-target interaction (DTI) prediction, compound property prediction, protein-protein interaction prediction, and drug-drug interaction prediction. Supports 15+ encoding methods and 5+ model architectures.
Bo Wang Lab (University of Toronto)
Foundation model for single-cell multi-omics built on generative pre-training of ~33M cells. Fine-tunes to SOTA on cell type annotation, multi-batch integration, perturbation prediction, and gene network inference.
Teichmann Lab (Wellcome Sanger Institute)
Automated cell type annotation tool for scRNA-seq data using logistic regression models trained on curated cross-tissue immune cell atlases. Provides a growing encyclopedia of pre-trained cell type models.
Meta AI (FAIR)
State-of-the-art protein language model (up to 15B parameters) trained on 250M protein sequences. Provides rich per-residue and per-sequence embeddings used across structure prediction, function annotation, and variant effect scoring.
Salesforce Research
Autoregressive protein language model (up to 6.4B parameters) for controllable protein sequence generation. Generates functional proteins conditioned on protein family or function tags.
Microsoft Research
Discrete diffusion framework for controllable protein generation in sequence space. Combines evolutionary-scale data with diffusion model conditioning for generating diverse, structurally plausible proteins.
Insilico Medicine
AI-driven target identification and biomarker discovery platform. Processes omics data, text mining, and knowledge graphs to prioritize novel therapeutic targets. Core component of Insilico's Pharma.AI suite alongside Chemistry42 and inClinico.
Zitnik Lab, Harvard Medical School
Coordinated initiative providing AI-ready datasets, curated benchmarks, and leaderboards across therapeutic modalities and discovery stages. Covers 22 tasks across single-instance, multi-instance, and generation problems.
OpenFold Consortium (Columbia, NVIDIA, SandboxAQ et al.)
Trainable, memory-efficient, GPU-friendly PyTorch reproduction of AlphaFold2. Includes full training code and data, enabling retraining and fine-tuning on custom datasets. Demonstrated AF2 reproducibility from scratch.
Baker Lab / IPD (University of Washington)
Extension of RFdiffusion that jointly diffuses over protein backbone AND small molecule ligands, enabling de novo design of proteins that bind specific small molecules like heme or digoxigenin.
Baker Lab / IPD (University of Washington)
Third-generation diffusion model for protein design unifying backbone generation, sequence design, and all-atom refinement. Open-sourced Dec 2025 with full training code.
Qiao et al. (Caltech / NVIDIA)
Multi-scale deep generative model for state-specific protein-ligand complex structure prediction. Predicts both protein conformational change and ligand binding pose simultaneously from sequence.
Bryant et al. (FU Berlin / Noé Lab)
Unified molecular model predicting protein-ligand complex structures directly from sequence information. Combines MSA-based protein features with ligand graph representations.
Schneuing et al. (Cambridge / Microsoft / VantAI)
Equivariant diffusion model for structure-based drug design that generates novel 3D molecules directly inside protein binding pockets. Published in Nature Computational Science 2024.
DP Technology / DeepModeling
Universal 3D molecular pretraining framework for property prediction, conformation generation, and docking. Uni-Mol2 (2024) scales to 1.1B parameters trained on 800M conformations.
Seo & Kim (KAIST)
Deep learning-guided pharmacophore modeling for ultra-large-scale virtual screening. Derives protein-based pharmacophore models automatically and scores compounds at extreme throughput.
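A pharmacophore model is essentially a set of typed feature points with pairwise distance constraints, so matching a conformer reduces to distance checks within tolerance. A minimal geometric sketch (illustrative only, not PharmacoNet's algorithm, which also scores partial matches at high throughput):

```python
import itertools
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def matches_pharmacophore(features, model, tol=1.0):
    """features/model map feature name -> (x, y, z). The conformer matches
    if every pairwise feature distance is within `tol` of the model's."""
    for f1, f2 in itertools.combinations(model, 2):
        if f1 not in features or f2 not in features:
            return False
        if abs(dist(features[f1], features[f2]) - dist(model[f1], model[f2])) > tol:
            return False
    return True

model = {"donor": (0, 0, 0), "acceptor": (3.0, 0, 0), "aromatic": (0, 4.0, 0)}
conformer = {"donor": (1, 1, 0), "acceptor": (4.1, 1, 0), "aromatic": (1, 4.8, 0)}
print(matches_pharmacophore(conformer, model))  # True: all distances close
```

Using pairwise distances rather than absolute coordinates makes the check invariant to rigid rotation and translation of the conformer.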
NVIDIA
Molecular generation model using Mutual Information Machine with a Perceiver encoder. Maps molecules into a smooth latent space enabling controlled interpolation and optimization.
NVIDIA
End-to-end AI platform for drug discovery providing GPU-accelerated NIMs (NVIDIA Inference Microservices) spanning protein structure, molecular generation, docking, and property prediction. Includes ESMFold, DiffDock, MolMIM, and 25+ healthcare NIMs.
Baidu / PaddlePaddle
PaddlePaddle-based reproduction of AlphaFold3 for biomolecular structure prediction covering proteins, nucleic acids, small molecules, and ions. Open-sourced Aug 2024 with web server.
StoneWise AI Drug Design
Pocket-based 3D molecule generation combining language model token prediction with geometric deep learning for 3D coordinate generation. Published in Nature Machine Intelligence 2024.
Peng et al. (Peking University)
Efficient 3D molecular generation conditioned on protein binding pockets using equivariant graph neural networks with autoregressive atom placement.
Theodoris Lab (Harvard / MIT)
Transformer-based foundation model pre-trained on ~30M single-cell transcriptomes. Learns context-dependent gene network dynamics and transfers to diverse downstream tasks including disease modeling and therapeutic target prioritization.
Cambridge (Csányi Lab)
Universal foundation model for atomistic simulations covering 89 elements. Pre-trained on the Materials Project dataset, generalizes across organic molecules, inorganic crystals, and interfaces without fine-tuning.
Profluent Bio
First open-source AI-generated CRISPR-Cas9 gene editor. Protein language model-designed Cas9 variant with comparable editing efficiency to SpCas9 but novel sequence. Demonstrates LLM-driven protein engineering at scale.