A multimodal foundation model for cells.
LingoCell aligns gene expression, natural language and spatial context in a single 512-dimensional space — so that every cell can be embedded, retrieved by phrase, and described back in clear scientific English.
One model, three signals.
A 12-layer transformer reads the gene-token sequence; PubMedBERT reads the text; a spatial branch reads the neighbour graph and H&E patches. All three branches project onto the same unit sphere.
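To illustrate why a shared unit sphere makes retrieval by phrase straightforward, here is a minimal sketch, not LingoCell's actual code. It assumes the text and cell embeddings have already been produced by their encoders; the function names (`l2_normalize`, `cosine`, `retrieve`) and the toy 4-dimensional vectors standing in for the 512-dimensional embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(u, v):
    """For unit vectors, the dot product equals the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec, cell_vecs, top_k=2):
    """Return (cell index, similarity) pairs for the cells closest to the query."""
    q = l2_normalize(query_vec)
    scored = [(i, cosine(q, l2_normalize(c))) for i, c in enumerate(cell_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up toy embeddings; real ones would come from the model's encoders.
cells = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.2, 0.0],
    [0.8, 0.0, 0.1, 0.1],
]
text_query = [1.0, 0.0, 0.0, 0.0]  # hypothetical text-encoder output for a phrase

hits = retrieve(text_query, cells)
for index, similarity in hits:
    print(f"cell {index}: cosine similarity {similarity:.3f}")
```

Because every modality lands on the same sphere, one similarity function can compare a phrase with a cell, a cell with a cell, or a cell with its spatial context.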
Trained at scale, openly.
Pretraining used 11.5 M scRNA-seq cells from CellxGene and 1.57 M Visium HD cells across eleven tissues. All checkpoints, code and data manifests are public.
Useful out of the box.
The same checkpoint annotates cell types, recovers niche structure, retrieves cells by text, and writes a faithful one-paragraph description per cell.
Single-cell foundation models tend to be black boxes — they produce vectors, not explanations.
LingoCell pairs every embedding with a natural-language description grounded by cross-attention on the actual expressed genes — so a reviewer can verify the model's claim against the data it saw.
It is designed for biologists who want to interrogate the model, not just consume its predictions.
That is why every workflow on the server returns intermediate artefacts: gene attention, cosine-similarity histograms, niche-purity scores, neighbour composition.
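As a rough illustration of what such artefacts can look like, the sketch below computes a neighbour composition, a niche-purity score and a cosine-similarity histogram from plain Python lists. The definitions are assumptions made for this example: purity is taken to be the share of the most common cell type among a cell's spatial neighbours, and the histogram uses equal-width bins over [-1, 1]. The server's own definitions and output formats may differ, and all function names are hypothetical.

```python
from collections import Counter


def neighbour_composition(neighbour_labels):
    """Fraction of each cell type among a cell's spatial neighbours."""
    if not neighbour_labels:
        raise ValueError("need at least one neighbour")
    counts = Counter(neighbour_labels)
    total = len(neighbour_labels)
    return {label: n / total for label, n in counts.items()}


def niche_purity(neighbour_labels):
    """Assumed definition: share of the most common label among the neighbours."""
    return max(neighbour_composition(neighbour_labels).values())


def similarity_histogram(similarities, n_bins=4):
    """Count cosine similarities in equal-width bins covering [-1, 1]."""
    counts = [0] * n_bins
    width = 2.0 / n_bins
    for s in similarities:
        # Clamp so that a similarity of exactly 1.0 falls into the last bin.
        idx = min(int((s + 1.0) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Made-up labels and similarities, for illustration only.
neighbours = ["T cell", "T cell", "B cell", "T cell"]
composition = neighbour_composition(neighbours)
purity = niche_purity(neighbours)
histogram = similarity_histogram([-0.9, 0.1, 0.2, 0.95, 1.0])

print(composition)
print(purity)
print(histogram)
```

Returning these numbers next to each prediction lets a reader check, for example, whether a high-confidence annotation sits in a homogeneous niche or a mixed one.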
FAIS · HGC
Developer Lab
LingoCell is developed at the Functional Analysis of Information Systems group, Human Genome Center, Institute of Medical Science, The University of Tokyo.
- group: FAIS · HGC
- institute: Institute of Medical Science
- university: The University of Tokyo
- cluster: Miyabi · 4× H100
- license: MIT · weights CC-BY-NC 4.0