
snap-stanford/PULSAR


PULSAR: a Foundation Model for Multi-scale and Multicellular Biology

PULSAR (Patient Understanding Leveraging Single-cell universAl Representation) is a multi-scale, multicellular foundation model that integrates information from genes to cells to multicellular systems. PULSAR bridges massive scRNA-seq datasets with clinical phenotypes for human peripheral immunity, trained via self-supervision on 36.2 million cells from 6,807 donors.

| Preprint |

Installation

  • We use uv to manage virtual environments and dependencies. Refer to the uv documentation to install uv.
  • Then use uv to create a virtual environment and install dependencies:
uv sync # create the venv and install the locked dependencies
uv pip install -e . # installs the package in editable mode

Usage

Refer to the Examples section below for notebooks that demonstrate PULSAR on various downstream tasks. In brief, you can load a pretrained PULSAR model as follows:

from pulsar.model import PULSAR
model = PULSAR.from_pretrained("KuanP/PULSAR-pbmc")

We also provide utilities to extract donor embeddings from single-cell data in H5AD format, as follows:

from pulsar.utils import extract_donor_embeddings_from_h5ad

donor_embeddings = extract_donor_embeddings_from_h5ad(
    h5ad_path="path_to_your_h5ad_file.h5ad",  # H5AD with cell-level embeddings in .obsm
    model=model,                              # PULSAR model loaded above
    donor_id_key="donor_id_column_in_obs",    # .obs column holding donor IDs
)

The function returns a dictionary mapping each donor ID to its PULSAR embedding. Use donor_id_key to specify the .obs column that holds donor IDs.
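Because the result is keyed by donor ID, fitting a downstream model (as in the age-regression and disease-classification notebooks) first requires lining embeddings up with per-donor labels. The following is a minimal sketch that assumes each embedding is a 1-D array-like (NumPy array, tensor, or list) and that labels come from your own donor metadata. `align_donor_embeddings` and the `labels` dict are hypothetical and not part of the pulsar package.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def align_donor_embeddings(
    donor_embeddings: Dict[Hashable, Sequence[float]],
    labels: Dict[Hashable, float],
) -> Tuple[List[Hashable], List[List[float]], List[float]]:
    """Hypothetical helper: pair each donor's embedding with its label.

    Only donors present in both dicts are kept; rows are sorted by donor ID
    (as strings) so the output order is deterministic.
    """
    shared = sorted(set(donor_embeddings) & set(labels), key=str)
    # Convert each embedding to a plain list of floats (works for lists,
    # 1-D NumPy arrays and 1-D tensors alike).
    X = [[float(v) for v in donor_embeddings[d]] for d in shared]
    y = [float(labels[d]) for d in shared]
    return shared, X, y
```

`X` and `y` can then be handed to any regressor or classifier of your choice, for example a scikit-learn linear model. Donors missing a label (or an embedding) are dropped silently, so it is worth comparing `len(shared)` with the size of your cohort.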

Note that this function requires cell-level embeddings to be computed beforehand and stored in the H5AD file's .obsm; a pipeline for extracting UCE embeddings can be found here.
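Since the function turns per-cell embeddings into per-donor embeddings, each donor's cells presumably have to be collected together at some point. The hypothetical helper below (not part of the pulsar package) illustrates that bookkeeping and the consistency checks worth running on your own data before calling `extract_donor_embeddings_from_h5ad`: one donor ID per cell and one fixed-width embedding row per cell. How PULSAR then encodes each donor's set of cells is handled by the model and is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def group_cells_by_donor(
    donor_ids: Sequence[str],
    cell_embeddings: Sequence[Sequence[float]],
) -> Dict[str, List[List[float]]]:
    """Hypothetical helper: bucket per-cell embedding rows by donor ID.

    donor_ids[i] is the donor of cell i (one entry per row of .obs);
    cell_embeddings[i] is that cell's embedding (one row of .obsm[...]).
    """
    if len(donor_ids) != len(cell_embeddings):
        raise ValueError(
            f"{len(donor_ids)} donor IDs but {len(cell_embeddings)} embedding rows"
        )
    # Every cell embedding should have the same dimensionality.
    dims = {len(row) for row in cell_embeddings}
    if len(dims) > 1:
        raise ValueError(f"inconsistent embedding dimensions: {sorted(dims)}")

    groups: Dict[str, List[List[float]]] = defaultdict(list)
    for donor, row in zip(donor_ids, cell_embeddings):
        groups[donor].append([float(x) for x in row])
    return dict(groups)
```

With AnnData you would pass something like `list(adata.obs["donor_id_column_in_obs"])` and `adata.obsm["<your_embedding_key>"]`; both names are placeholders for your own column and .obsm key.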

Examples

| Notebook | Description |
| --- | --- |
| Zero-shot age regression | Demonstrates age regression using zero-shot PULSAR embeddings on a subsampled OneK1K dataset. |
| Zero-shot disease classification | Demonstrates lupus classification using zero-shot PULSAR embeddings on a subsampled Lupus dataset. |
| Searching donor embeddings | Demonstrates searching for donors with PULSAR embeddings against DONORxEMBED. |
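This README does not spell out the retrieval procedure used in the search notebook. For illustration only, the sketch below assumes brute-force cosine-similarity nearest-neighbor search, a common choice for embedding retrieval, over a plain dict of donor ID → embedding (the same shape `extract_donor_embeddings_from_h5ad` returns). `cosine_similarity` and `search_donors` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_donors(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    k: int = 5,
) -> List[Tuple[str, float]]:
    """Return the k donors in `database` most similar to `query`, best first."""
    scored = [(donor_id, cosine_similarity(query, emb)) for donor_id, emb in database.items()]
    # Highest similarity first; ties keep insertion order because sort is stable.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

For example, `search_donors(query_embedding, donor_embeddings, k=10)` ranks every donor in the dictionary against one query donor. This linear scan is fine for thousands of donors; for a large reference set, a vectorized matrix product or an approximate-nearest-neighbor index would be the usual replacement.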

Data used for the examples can be downloaded from here.

Model weights

| Model | Description | Parameters | Context length | Download |
| --- | --- | --- | --- | --- |
| PULSAR-pbmc | Continually pre-trained on 8.8M PBMCs from 2,588 donors; best for PBMC-related tasks | 87.4M | 1024 | 🤗 HuggingFace |
| PULSAR-aligned | Version of PULSAR-pbmc aligned using disease labels | 87.4M | 1024 | 🤗 HuggingFace |

Model weights are directly loadable via the transformers library, for example:

from pulsar.model import PULSAR
model = PULSAR.from_pretrained("KuanP/PULSAR-pbmc")

DONORxEMBED Datasets

We release DONORxEMBED datasets for both the zero-shot and the aligned PULSAR models; an example of loading them can be found here.

| Dataset | Download |
| --- | --- |
| PULSAR_DONORxEMBED_zero_shot | 🤗 HuggingFace |
| PULSAR_DONORxEMBED_aligned | 🤗 HuggingFace |

Acknowledgements

We sincerely thank the authors of the following open-source projects:

Cite Us

@article {pang2025pulsar,
	author = {Pang, Kuan and Rosen, Yanay and Kedzierska, Kasia and He, Ziyuan and Rajagopal, Abhe and Gustafson, Claire E and Huynh, Grace and Leskovec, Jure},
	title = {PULSAR: a Foundation Model for Multi-scale and Multicellular Biology},
	elocation-id = {2025.11.24.685470},
	year = {2025},
	doi = {10.1101/2025.11.24.685470},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2025/11/26/2025.11.24.685470},
	eprint = {https://www.biorxiv.org/content/early/2025/11/26/2025.11.24.685470.full.pdf},
	journal = {bioRxiv}
}
