scGPT vs CellTypist vs scBERT: Single-Cell Foundation Models (2026)
Last updated: 2026-04-17
Single-cell RNA sequencing generates massive datasets requiring automated cell type annotation, and three distinct approaches have emerged. scGPT (University of Toronto, Nature Methods 2024) is a generative pre-trained transformer for single-cell biology — pre-trained on 33M cells, it learns gene-gene relationships and cell representations for annotation, perturbation prediction, and gene network inference. CellTypist (Wellcome Sanger Institute) takes the classical ML approach — logistic regression models trained on curated reference atlases, optimized for speed and interpretability. scBERT applies BERT-style masked language modeling to gene expression profiles, treating each cell as a 'sentence' of gene tokens. The question isn't which is 'best' — it's which matches your data, compute, and expertise.
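Both scGPT and scBERT turn a cell's expression profile into a sequence of gene tokens before it reaches the transformer; scGPT additionally discretizes continuous expression values into bins. The sketch below is a toy illustration of equal-frequency value binning in numpy, not the actual scGPT tokenizer (the function name and binning scheme are illustrative assumptions):

```python
import numpy as np

def bin_expression(expr, n_bins=5):
    """Toy value binning for one cell's expression vector.

    Zeros stay token 0 (gene not detected); nonzero values are split
    into n_bins equal-frequency bins, mapped to tokens 1..n_bins.
    Illustrative only -- not the scGPT implementation.
    """
    expr = np.asarray(expr, dtype=float)
    tokens = np.zeros(expr.shape, dtype=int)
    nonzero = expr > 0
    if nonzero.any():
        # Quantile edges computed over this cell's own nonzero values
        edges = np.quantile(expr[nonzero], np.linspace(0, 1, n_bins + 1))
        tokens[nonzero] = np.clip(
            np.searchsorted(edges, expr[nonzero], side="right"), 1, n_bins
        )
    return tokens

cell = [0.0, 1.2, 0.0, 3.4, 0.7, 9.1]
print(bin_expression(cell, n_bins=3))  # → [0 2 0 3 1 3]
```

Per-cell binning like this makes the token vocabulary insensitive to sequencing depth, which is one reason the transformer models can pool cells across datasets.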
scGPT
Bo Wang Lab (University of Toronto)
CellTypist
Teichmann Lab (Wellcome Sanger Institute)
scBERT
Tencent AI Lab
Head-to-Head
Structured comparison across key dimensions.
| Dimension | scGPT | CellTypist | scBERT |
|---|---|---|---|
| Architecture | Generative pre-trained transformer; gene tokens with expression binning; attention masking | Logistic regression (SGD-optimized) on reference atlas features; classical ML | BERT-style masked language model; gene tokens as vocabulary; Performer attention |
| Pre-training data | 33M cells from CellxGene census (blood, brain dominate ~71%) | No pre-training — trains directly on curated reference atlases | ~1M cells from PanglaoDB (broad tissue coverage) |
| Cell type annotation accuracy | Strong after fine-tuning — 90%+ on PBMC benchmarks; drops on rare populations and out-of-distribution tissues | Very strong on tissues with good reference atlases; 85-95% on standard benchmarks; transparent confidence scores | Comparable to scGPT on balanced datasets; drops more sharply on rare cell types |
| Beyond annotation | Perturbation prediction, gene network inference, multi-batch integration, multi-omic integration | Cell type annotation only — focused tool; over-clustering resolution analysis | Cell type annotation; gene embeddings for downstream analysis |
| Compute requirements | GPU required for fine-tuning (1× A100 recommended); inference possible on smaller GPUs | CPU only — runs in seconds on laptop; no GPU needed | GPU required for fine-tuning; moderate size (~10M parameters) |
| Ease of use | Moderate — requires fine-tuning per dataset; hyperparameter tuning matters | Easy — pip install celltypist; pre-built models for major tissues; 3 lines of code | Moderate — requires fine-tuning; less documentation than scGPT |
| Interpretability | Attention maps and gene embeddings provide some interpretability; not directly transparent | Highly interpretable — feature coefficients show which genes drive each prediction | Limited — standard transformer attention; less interpretable than CellTypist |
| Tissue bias | Strong bias toward blood and brain (~71% of pre-training data); weaker on underrepresented tissues | Depends on available reference models — 30+ tissue models available; expandable | Broader pre-training tissue coverage than scGPT; still biased toward well-studied tissues |
| License | Open source (GitHub) | Open source (MIT-like); Wellcome Sanger Institute | Open source (GitHub) |
| Key limitation | Tissue bias in pre-training; GPU-dependent; fine-tuning overhead; recent evaluations question gains over non-pretrained baselines | Annotation only — no perturbation or multi-task capability; needs reference atlas per tissue | Less adopted than scGPT; weaker on rare cell types; requires fine-tuning without clear advantage over scGPT |
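CellTypist's classifier, per the table above, is an SGD-trained multinomial logistic regression over log-normalized gene features (in practice the library wraps training and prediction behind calls like `celltypist.annotate(adata, model=...)`). A self-contained numpy sketch of the underlying idea, on made-up toy data, assuming plain batch gradient descent rather than CellTypist's actual SGD schedule:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy reference: 3 cell types, 6 genes; each type over-expresses two
# "marker" genes (hypothetical data, for illustration only).
n_per_type, n_genes, n_types = 50, 6, 3
X = rng.poisson(1.0, size=(n_per_type * n_types, n_genes)).astype(float)
y = np.repeat(np.arange(n_types), n_per_type)
for t in range(n_types):
    X[y == t, 2 * t : 2 * t + 2] += rng.poisson(5.0, size=(n_per_type, 2))
X = np.log1p(X)  # log-normalize, as CellTypist expects of its input

# Multinomial (softmax) logistic regression, batch gradient descent
W = np.zeros((n_genes, n_types))
b = np.zeros(n_types)
Y = np.eye(n_types)[y]  # one-hot labels
for _ in range(300):
    logits = X @ W + b
    P = np.exp(logits - logits.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)          # class probabilities
    grad = (P - Y) / len(X)                    # cross-entropy gradient
    W -= 0.5 * (X.T @ grad)
    b -= 0.5 * grad.sum(axis=0)

pred = (X @ W + b).argmax(axis=1)
print("training accuracy:", (pred == y).mean())
```

The model is just a weight matrix of shape (genes × cell types), which is why it trains in seconds on a CPU and why its predictions are directly inspectable.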
When to Use Each
scGPT
You need a single model for multiple tasks: cell type annotation, perturbation response prediction, gene regulatory network inference, and multi-batch integration. You have GPU access for fine-tuning. You're working with well-represented tissues (blood, brain, lung).
CellTypist
You need fast, interpretable cell type annotation with minimal setup. You're annotating well-characterized tissues with existing reference models. You want to run on CPU without deep learning infrastructure. You need transparent feature importance for each prediction.
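The "transparent feature importance" point is concrete: each cell type's column of the learned coefficient matrix directly ranks genes by their contribution to that prediction, and CellTypist exposes such rankings from its model objects. A sketch using a hypothetical hand-written coefficient matrix over well-known marker genes (the values are invented; only the gene/cell-type associations are real):

```python
import numpy as np

# Hypothetical fitted coefficients: rows = genes, columns = cell types
genes = np.array(["CD3D", "CD3E", "CD19", "MS4A1", "LYZ", "CD14"])
W = np.array([
    [ 2.1, -0.8, -0.9],   # CD3D  (T cell marker)
    [ 1.8, -0.5, -0.7],   # CD3E  (T cell marker)
    [-0.9,  2.4, -0.6],   # CD19  (B cell marker)
    [-0.7,  2.0, -0.8],   # MS4A1 (B cell marker)
    [-1.0, -0.9,  2.2],   # LYZ   (monocyte marker)
    [-0.6, -0.7,  1.9],   # CD14  (monocyte marker)
])
cell_types = ["T cell", "B cell", "Monocyte"]

def top_markers(W, genes, class_idx, n=2):
    """Genes with the largest positive coefficients for one class."""
    order = np.argsort(W[:, class_idx])[::-1][:n]
    return list(genes[order])

for i, ct in enumerate(cell_types):
    print(ct, "->", top_markers(W, genes, i))
# → T cell -> ['CD3D', 'CD3E'], B cell -> ['CD19', 'MS4A1'], ...
```

This is the interpretability gap with the transformer models: an attention map suggests which genes the model attended to, while a coefficient tells you exactly how much a gene moved the decision.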
Practitioner Verdict
Use CellTypist for fast, reliable cell type annotation when your tissue is well-represented in reference atlases — it's production-ready, interpretable, and needs no GPU. Use scGPT when you need a foundation model for multiple downstream tasks (annotation + perturbation + gene networks) and have GPU access for fine-tuning. Use scBERT as an alternative foundation model with a simpler architecture when you want transformer-based annotation without scGPT's complexity.