Therapeutics Data Commons
Zitnik Lab, Harvard Medical School
Coordinated initiative providing AI-ready datasets, curated benchmarks, and leaderboards across therapeutic modalities and discovery stages. Covers 22 tasks across single-instance, multi-instance, and generation problems.
Best For
Benchmarking ML models for drug discovery; standardized dataset access; ADMET leaderboards
License
Open Source (MIT)
Strengths
- 22 learning tasks
- Standardized benchmarks
- Active leaderboards
- MIT license
- Python API
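The Python API listed above is distributed as the PyTDC package. The following is a minimal sketch of loading one ADMET dataset and checking its split proportions. The dataset name `Caco2_Wang`, the scaffold split settings, and the helper names `load_admet_split` and `split_fractions` are illustrative choices, not taken from this listing. It assumes `pip install PyTDC` and network access on first load.

```python
def load_admet_split(name="Caco2_Wang", seed=42):
    """Download one TDC ADME dataset and return its train/valid/test split.

    Assumes PyTDC is installed. The import is local so the helper below
    can be used without it.
    """
    from tdc.single_pred import ADME

    data = ADME(name=name)
    # A scaffold split groups molecules by core structure, which is
    # usually a harder generalization test than a random split.
    return data.get_split(method="scaffold", seed=seed, frac=[0.7, 0.1, 0.2])


def split_fractions(split):
    """Return the share of rows held by each partition of a split dict."""
    total = sum(len(part) for part in split.values())
    return {key: len(part) / total for key, part in split.items()}
```

Calling `split_fractions(load_admet_split())` should give shares close to the requested 0.7 / 0.1 / 0.2, though scaffold grouping may shift them slightly.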
Limitations
- Benchmark ≠ real-world performance
- Some datasets are small
- Leaderboard overfitting risk
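One common partial guard against the leaderboard overfitting risk noted above is to report results over several random seeds rather than a single best run. The sketch below aggregates per-seed scores into a mean and a sample standard deviation. The function name `summarize_runs` and the example scores are hypothetical, and this is not presented as the leaderboard's own scoring code.

```python
from statistics import mean, stdev


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate spread")
    return mean(scores), stdev(scores)


# Hypothetical mean-absolute-error values from five seeds of one model.
example_scores = [0.30, 0.32, 0.28, 0.31, 0.29]
avg, spread = summarize_runs(example_scores)
print(f"MAE: {avg:.3f} +/- {spread:.3f}")
```

A gap between two models that is smaller than the spread across seeds is weak evidence that one is better than the other.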
Related Tools
ADMET-AI
Greenstone Biosciences / Stanford
Predicts 41 ADMET endpoints using a Chemprop-RDKit GNN. Held the highest average rank on the TDC ADMET Leaderboard at time of publication.
DeepPurpose
Huang et al. (Harvard / MIT)
Deep learning toolkit for drug-target interaction (DTI) prediction, compound property prediction, protein-protein interaction prediction, and drug-drug interaction prediction. Supports 15+ encoding methods and 5+ model architectures.
Chemprop v2
MIT (Barzilay, Coley et al.)
Open-source D-MPNN library for molecular property prediction. The architecture underlying ADMET-AI and ADMETlab 3.0. Used in Halicin antibiotic discovery.
More in Target Discovery
Open Targets Platform
EBI / Genentech / GSK / MSD / Pfizer / Sanofi / Wellcome Sanger
Integrates 23+ public data sources to systematically score and rank target-disease associations. Provides target prioritization based on clinical precedence and tractability.
DisGeNET
IMIM / DisGeNET (commercial entity)
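Comprehensive gene-disease and variant-disease association database. Contains >2M gene-disease associations (GDAs), >4M variant-disease associations (VDAs), and >20M disease-disease associations (DDAs). Integrates curated repositories, GWAS, animal models, and NLP-extracted evidence.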
STRING v12.5
EMBL / SIB / CPR
Functional protein-protein association networks across 12,535 organisms. v12.5 added a regulatory network layer capturing directionality via LLM-parsed literature.