PULSAR: a Foundation Model for Multi-scale and Multicellular Biology

PULSAR (Patient Understanding Leveraging Single-cell universAl Representation) is a multi-scale, multicellular foundation model that integrates information from genes to cells to multicellular systems. PULSAR bridges massive scRNA-seq datasets with clinical phenotypes for human peripheral immunity, trained via self-supervision on 36.2 million cells from 6,807 donors.

| Preprint |

Installation

We use uv to manage virtual environments and dependencies. Refer to the uv documentation to install uv.
Then use uv to create a virtual environment and install dependencies:

uv sync              # create the virtual environment and install the locked dependencies
uv pip install -e .  # install the package in editable mode

Usage

Refer to the Examples section below for notebooks demonstrating how to use PULSAR for various downstream tasks. In brief, you can load a pre-trained PULSAR model as follows:

from pulsar.model import PULSAR
model = PULSAR.from_pretrained("KuanP/PULSAR-pbmc")

We also provide utilities to extract donor embeddings from single-cell data in H5AD format, as follows:

from pulsar.utils import extract_donor_embeddings_from_h5ad
donor_embeddings = extract_donor_embeddings_from_h5ad(
    h5ad_path="path_to_your_h5ad_file.h5ad",
    model=model,
    donor_id_key="donor_id_column_in_obs",
)

This function returns a dictionary mapping donor IDs to their corresponding PULSAR embeddings. Use donor_id_key to specify the name of the column in .obs that contains the donor IDs.

Note that this function requires cell-level embeddings to already be stored in the .obsm of the H5AD file. A pipeline for extracting UCE embeddings can be found here.
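Because the function returns a dictionary, downstream tools that expect a matrix need the embeddings stacked in a fixed donor order. The following is a minimal sketch, assuming each dictionary value is a one-dimensional array-like of the same length that NumPy can convert (for example, a list, a NumPy array, or a CPU tensor). The helper name stack_donor_embeddings and the toy values are hypothetical and not part of the pulsar package:

```python
import numpy as np


def stack_donor_embeddings(donor_embeddings):
    """Turn {donor_id: embedding} into (sorted donor IDs, 2-D matrix).

    Hypothetical helper, not part of the pulsar package. Assumes every
    value is a 1-D array-like of the same length.
    """
    # Sort the IDs so that row order is reproducible across runs.
    donor_ids = sorted(donor_embeddings)
    matrix = np.stack(
        [np.asarray(donor_embeddings[d], dtype=np.float32) for d in donor_ids]
    )
    return donor_ids, matrix


# Toy stand-in for the dictionary returned by extract_donor_embeddings_from_h5ad.
toy = {"donor_b": [0.0, 1.0, 2.0], "donor_a": [3.0, 4.0, 5.0]}
donor_ids, matrix = stack_donor_embeddings(toy)
print(donor_ids, matrix.shape)
```

Row i of the matrix corresponds to donor_ids[i], so the two can be passed together to a regression or classification routine along with per-donor labels.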

Examples

Notebook	Description
Zero-shot age regression	Demonstrates age regression using zero-shot PULSAR embeddings on a subsampled OneK1K dataset.
Zero-shot disease classification	Demonstrates lupus disease classification using zero-shot PULSAR embeddings on a subsampled Lupus dataset.
Searching donor embeddings	Demonstrates searching donors using PULSAR embeddings against `DONORxEMBED`.

Data used for the examples can be downloaded from here.
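To give a rough idea of what searching donor embeddings involves, the sketch below ranks reference donors by cosine similarity to a query embedding. This is an illustration under stated assumptions, not the notebook's implementation: the similarity metric, the helper name top_k_similar, and the toy vectors are all hypothetical, and embeddings are assumed to be non-zero vectors of equal dimension.

```python
import numpy as np


def top_k_similar(query, reference, k=3):
    """Return indices and cosine similarities of the k nearest reference rows.

    Hypothetical helper; cosine similarity is an assumption here and may
    differ from the metric used in the example notebook.
    """
    query = np.asarray(query, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Normalise to unit length so the dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    r = reference / np.linalg.norm(reference, axis=1, keepdims=True)
    sims = r @ q
    # Sort by descending similarity and keep the first k entries.
    order = np.argsort(-sims)[:k]
    return order, sims[order]


# Toy reference set of four "donors" with 3-dimensional embeddings.
reference = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
indices, scores = top_k_similar([1.0, 0.1, 0.0], reference, k=2)
print(indices, scores)
```

In practice the reference matrix would hold the released donor embeddings, and the returned indices would be mapped back to donor IDs and their metadata.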

Model weights

Model	Description	Parameters	Context Length	Download
`PULSAR-pbmc`	Continually pre-trained on 8.8M PBMCs from 2,588 donors; best for PBMC-related tasks	87.4M	1024	🤗 HuggingFace
`PULSAR-aligned`	Aligned version of PULSAR-pbmc using disease labels	87.4M	1024	🤗 HuggingFace

Model weights are directly loadable via the transformers library, for example:

from pulsar.model import PULSAR
model = PULSAR.from_pretrained("KuanP/PULSAR-pbmc")

DONORxEMBED Datasets

We release the DONORxEMBED datasets for both zero-shot and aligned PULSAR. An example of loading the datasets can be found here.

Dataset	Download
PULSAR_DONORxEMBED_zero_shot	🤗 HuggingFace
PULSAR_DONORxEMBED_aligned	🤗 HuggingFace

Acknowledgements

We sincerely thank the authors of the following open-source projects:

Cite Us

@article {pang2025pulsar,
	author = {Pang, Kuan and Rosen, Yanay and Kedzierska, Kasia and He, Ziyuan and Rajagopal, Abhe and Gustafson, Claire E and Huynh, Grace and Leskovec, Jure},
	title = {PULSAR: a Foundation Model for Multi-scale and Multicellular Biology},
	elocation-id = {2025.11.24.685470},
	year = {2025},
	doi = {10.1101/2025.11.24.685470},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/11/26/2025.11.24.685470},
	eprint = {https://www.biorxiv.org/content/early/2025/11/26/2025.11.24.685470.full.pdf},
	journal = {bioRxiv}
}
