PULSAR (Patient Understanding Leveraging Single-cell universAl Representation) is a multi-scale, multicellular foundation model that integrates information from genes to cells to multicellular systems. PULSAR bridges massive scRNA-seq datasets with clinical phenotypes for human peripheral immunity, trained via self-supervision on 36.2 million cells from 6,807 donors.
| Preprint |
- We use `uv` to manage virtual environments and dependencies. Refer to the uv documentation to install `uv`.
- Then use `uv` to create a virtual environment and install dependencies:

```bash
uv sync # create venv
uv pip install -e . # installs the package in editable mode
```

Refer to the Examples section below for example notebooks demonstrating how to use PULSAR for various downstream tasks. In brief, you can load a pre-trained PULSAR model as follows:
```python
from pulsar.model import PULSAR

model = PULSAR.from_pretrained("KuanP/PULSAR-pbmc")
```

We also provide utilities to extract donor embeddings from single-cell data in H5AD format, as follows:
```python
from pulsar.utils import extract_donor_embeddings_from_h5ad

donor_embeddings = extract_donor_embeddings_from_h5ad(
    h5ad_path="path_to_your_h5ad_file.h5ad",
    model=model,
    donor_id_key="donor_id_column_in_obs",
)
```

This function returns a dictionary mapping donor IDs to their corresponding PULSAR embeddings. The name of the column in `.obs` that contains donor IDs is specified via `donor_id_key`.

Note that this function requires cell-level embeddings to already be stored in `.obsm` of the H5AD file. A pipeline for extracting UCE embeddings can be found here.
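Because the result is a dictionary keyed by donor ID, downstream analyses usually need the embeddings in a fixed row order so they can be aligned with donor metadata. The following is a minimal sketch in plain Python, assuming each embedding is a one-dimensional sequence of numbers. The helper `stack_donor_embeddings` and the toy values are hypothetical and not part of the `pulsar` package.

```python
from typing import Dict, List, Sequence, Tuple


def stack_donor_embeddings(
    donor_embeddings: Dict[str, Sequence[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Return sorted donor IDs and a row-aligned list of embedding vectors.

    Hypothetical helper for illustration; not part of the pulsar package.
    """
    donor_ids = sorted(donor_embeddings)
    matrix = [[float(x) for x in donor_embeddings[d]] for d in donor_ids]
    # All donors are expected to share one embedding dimension.
    dims = {len(row) for row in matrix}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")
    return donor_ids, matrix


# Toy stand-in for the dictionary returned by extract_donor_embeddings_from_h5ad.
toy_embeddings = {
    "donor_B": [0.0, 1.0, 0.5],
    "donor_A": [1.0, 0.0, 0.5],
}
donor_ids, matrix = stack_donor_embeddings(toy_embeddings)
print(donor_ids)
print(len(matrix), len(matrix[0]))
```

Sorting the IDs makes the row order reproducible across runs, which helps when joining the matrix with a separate table of donor-level labels.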
| Notebook | Description |
|---|---|
| Zero-shot age regression | Demonstrates age regression using zero-shot PULSAR embeddings on a subsampled OneK1K dataset. |
| Zero-shot disease classification | Demonstrates lupus disease classification using zero-shot PULSAR embeddings on a subsampled Lupus dataset. |
| Searching donor embeddings | Demonstrates searching donors using PULSAR embeddings against DONORxEMBED. |
Data used for the examples can be downloaded from here.
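To give a sense of what searching donor embeddings involves, here is a minimal sketch of nearest-neighbor retrieval over a dictionary of donor embeddings. It assumes cosine similarity as the ranking metric, which may differ from what the notebook uses. The function names and toy vectors are hypothetical and not part of the `pulsar` package or the DONORxEMBED datasets.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_donors(
    query: Sequence[float],
    reference: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank reference donors by cosine similarity to the query embedding."""
    scored = [(donor, cosine_similarity(query, emb)) for donor, emb in reference.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy reference set standing in for a table of donor embeddings.
reference = {
    "ref_1": [1.0, 0.0],
    "ref_2": [0.0, 1.0],
    "ref_3": [1.0, 1.0],
}
hits = search_donors([2.0, 0.1], reference, top_k=2)
for donor, score in hits:
    print(donor, round(score, 3))
```

For a reference set with many donors, a vectorized or approximate nearest-neighbor implementation would be a more practical choice than this pure-Python loop.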
| Model | Description | Parameters | Context Length | Download |
|---|---|---|---|---|
| `PULSAR-pbmc` | Continually pre-trained on 8.8M PBMC cells from 2,588 donors; best for PBMC-related tasks | 87.4M | 1024 | 🤗 HuggingFace |
| `PULSAR-aligned` | Aligned version of `PULSAR-pbmc` using disease labels | 87.4M | 1024 | 🤗 HuggingFace |
Model weights are directly loadable via the transformers library, for example:
```python
from pulsar.model import PULSAR

model = PULSAR.from_pretrained("KuanP/PULSAR-pbmc")
```

We release the DONORxEMBED datasets for both zero-shot and aligned PULSAR. An example of loading the datasets can be found here.
| Dataset | Download |
|---|---|
| PULSAR_DONORxEMBED_zero_shot | 🤗 HuggingFace |
| PULSAR_DONORxEMBED_aligned | 🤗 HuggingFace |
We sincerely thank the authors of the following open-source projects:
```bibtex
@article{pang2025pulsar,
  author = {Pang, Kuan and Rosen, Yanay and Kedzierska, Kasia and He, Ziyuan and Rajagopal, Abhe and Gustafson, Claire E and Huynh, Grace and Leskovec, Jure},
  title = {PULSAR: a Foundation Model for Multi-scale and Multicellular Biology},
  elocation-id = {2025.11.24.685470},
  year = {2025},
  doi = {10.1101/2025.11.24.685470},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://www.biorxiv.org/content/early/2025/11/26/2025.11.24.685470},
  eprint = {https://www.biorxiv.org/content/early/2025/11/26/2025.11.24.685470.full.pdf},
  journal = {bioRxiv}
}
```
