scGPT vs CellTypist vs scBERT: Single-Cell Foundation Models (2026)
Last updated: 2026-04-17
Single-cell RNA sequencing generates massive datasets requiring automated cell type annotation, and three distinct approaches have emerged. scGPT (University of Toronto, Nature Methods 2024) is a generative pre-trained transformer for single-cell biology — pre-trained on 33M cells, it learns gene-gene relationships and cell representations for annotation, perturbation prediction, and gene network inference. CellTypist (Wellcome Sanger Institute) takes the classical ML approach — logistic regression models trained on curated reference atlases, optimized for speed and interpretability. scBERT applies BERT-style masked language modeling to gene expression profiles, treating each cell as a 'sentence' of gene tokens. The question isn't which is 'best' — it's which matches your data, compute, and expertise.
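Both scGPT and scBERT turn a cell's expression profile into a sequence of gene tokens before it reaches the transformer; scGPT additionally discretizes continuous expression values into bins. The sketch below is a toy illustration of equal-frequency value binning in numpy, not the actual scGPT tokenizer (the function name and binning scheme are illustrative assumptions):

```python
import numpy as np

def bin_expression(expr, n_bins=5):
    """Toy value binning for one cell's expression vector.

    Zeros stay token 0 (gene not detected); nonzero values are split
    into n_bins equal-frequency bins, mapped to tokens 1..n_bins.
    Illustrative only -- not the scGPT implementation.
    """
    expr = np.asarray(expr, dtype=float)
    tokens = np.zeros(expr.shape, dtype=int)
    nonzero = expr > 0
    if nonzero.any():
        # Quantile edges computed over this cell's own nonzero values
        edges = np.quantile(expr[nonzero], np.linspace(0, 1, n_bins + 1))
        tokens[nonzero] = np.clip(
            np.searchsorted(edges, expr[nonzero], side="right"), 1, n_bins
        )
    return tokens

cell = [0.0, 1.2, 0.0, 3.4, 0.7, 9.1]
print(bin_expression(cell, n_bins=3))  # → [0 2 0 3 1 3]
```

Per-cell binning like this makes the token vocabulary insensitive to sequencing depth, which is one reason the transformer models can pool cells across datasets.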
scGPT
Bo Wang Lab (University of Toronto)
CellTypist
Teichmann Lab (Wellcome Sanger Institute)
scBERT
Tencent AI Lab
Head-to-Head
Structured comparison across key dimensions.
| Dimension | scGPT | CellTypist | scBERT |
|---|---|---|---|
| Architecture | Generative pre-trained transformer; gene tokens with expression binning; attention masking | Logistic regression (SGD-optimized) on reference atlas features; classical ML | BERT-style masked language model; gene tokens as vocabulary; Performer attention |
| Pre-training data | 33M cells from CellxGene census (blood, brain dominate ~71%) | No pre-training — trains directly on curated reference atlases | ~1M cells from PanglaoDB (broad tissue coverage) |
| Cell type annotation accuracy | Strong after fine-tuning — 90%+ on PBMC benchmarks; drops on rare populations and out-of-distribution tissues | Very strong on tissues with good reference atlases; 85-95% on standard benchmarks; transparent confidence scores | Comparable to scGPT on balanced datasets; drops more sharply on rare cell types |
| Beyond annotation | Perturbation prediction, gene network inference, multi-batch integration, multi-omic integration | Cell type annotation only — focused tool; over-clustering resolution analysis | Cell type annotation; gene embeddings for downstream analysis |
| Compute requirements | GPU required for fine-tuning (1× A100 recommended); inference possible on smaller GPUs | CPU only — runs in seconds on laptop; no GPU needed | GPU required for fine-tuning; moderate size (~10M parameters) |
| Ease of use | Moderate — requires fine-tuning per dataset; hyperparameter tuning matters | Easy — pip install celltypist; pre-built models for major tissues; 3 lines of code | Moderate — requires fine-tuning; less documentation than scGPT |
| Interpretability | Attention maps and gene embeddings provide some interpretability; not directly transparent | Highly interpretable — feature coefficients show which genes drive each prediction | Limited — standard transformer attention; less interpretable than CellTypist |
| Tissue bias | Strong bias toward blood and brain (~71% of pre-training data); weaker on underrepresented tissues | Depends on available reference models — 30+ tissue models available; expandable | Broader pre-training tissue coverage than scGPT; still biased toward well-studied tissues |
| License | Open source (GitHub) | Open source (MIT-like); Wellcome Sanger Institute | Open source (GitHub) |
| Key limitation | Tissue bias in pre-training; GPU-dependent; fine-tuning overhead; recent evaluations question gains over non-pretrained baselines | Annotation only — no perturbation or multi-task capability; needs reference atlas per tissue | Less adopted than scGPT; weaker on rare cell types; requires fine-tuning without clear advantage over scGPT |
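CellTypist's classifier, per the table above, is an SGD-trained multinomial logistic regression over log-normalized gene features (in practice the library wraps training and prediction behind calls like `celltypist.annotate(adata, model=...)`). A self-contained numpy sketch of the underlying idea, on made-up toy data, assuming plain batch gradient descent rather than CellTypist's actual SGD schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reference: 3 cell types, 6 genes; each type over-expresses two
# "marker" genes (hypothetical data, for illustration only).
n_per_type, n_genes, n_types = 50, 6, 3
X = rng.poisson(1.0, size=(n_per_type * n_types, n_genes)).astype(float)
y = np.repeat(np.arange(n_types), n_per_type)
for t in range(n_types):
    X[y == t, 2 * t : 2 * t + 2] += rng.poisson(5.0, size=(n_per_type, 2))
X = np.log1p(X)  # log-normalize, as CellTypist expects of its input

# Multinomial (softmax) logistic regression, batch gradient descent
W = np.zeros((n_genes, n_types))
b = np.zeros(n_types)
Y = np.eye(n_types)[y]  # one-hot labels
for _ in range(300):
    logits = X @ W + b
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)          # class probabilities
    grad = (P - Y) / len(X)                    # cross-entropy gradient
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

pred = (X @ W + b).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

The model is just a weight matrix of shape (genes × cell types), which is why it trains in seconds on a CPU and why its predictions are directly inspectable.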
When to Use Each
scGPT
You need a single model for multiple tasks: cell type annotation, perturbation response prediction, gene regulatory network inference, and multi-batch integration. You have GPU access for fine-tuning. You're working with well-represented tissues (blood, brain, lung).
CellTypist
You need fast, interpretable cell type annotation with minimal setup. You're annotating well-characterized tissues with existing reference models. You want to run on CPU without deep learning infrastructure. You need transparent feature importance for each prediction.
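The "transparent feature importance" point is concrete: each cell type's column of the learned coefficient matrix directly ranks genes by their contribution to that prediction, and CellTypist exposes such rankings from its model objects. A sketch using a hypothetical hand-written coefficient matrix over well-known marker genes (the values are invented; only the gene/cell-type associations are real):

```python
import numpy as np

# Hypothetical fitted coefficients: rows = genes, columns = cell types
genes = np.array(["CD3D", "CD3E", "CD19", "MS4A1", "LYZ", "CD14"])
W = np.array([
    [ 2.1, -0.8, -0.9],   # CD3D  (T cell marker)
    [ 1.8, -0.5, -0.7],   # CD3E  (T cell marker)
    [-0.9,  2.4, -0.6],   # CD19  (B cell marker)
    [-0.7,  2.0, -0.8],   # MS4A1 (B cell marker)
    [-1.0, -0.9,  2.2],   # LYZ   (monocyte marker)
    [-0.6, -0.7,  1.9],   # CD14  (monocyte marker)
])
cell_types = ["T cell", "B cell", "Monocyte"]

def top_markers(W, genes, class_idx, n=2):
    """Genes with the largest positive coefficients for one class."""
    order = np.argsort(W[:, class_idx])[::-1][:n]
    return list(genes[order])

for i, ct in enumerate(cell_types):
    print(ct, "->", top_markers(W, genes, i))
# → T cell -> ['CD3D', 'CD3E'], B cell -> ['CD19', 'MS4A1'], ...
```

This is the interpretability gap with the transformer models: an attention map suggests which genes the model attended to, while a coefficient tells you exactly how much a gene moved the decision.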
Practitioner Verdict
Use CellTypist for fast, reliable cell type annotation when your tissue is well-represented in reference atlases — it's production-ready, interpretable, and needs no GPU. Use scGPT when you need a foundation model for multiple downstream tasks (annotation + perturbation + gene networks) and have GPU access for fine-tuning. Use scBERT as an alternative foundation model with a simpler architecture when you want transformer-based annotation without scGPT's complexity.