
Commit b9971d6

Commit message: font; citation

1 parent 3245055

File tree

1 file changed (+32, -31 lines)


README.md

Lines changed: 32 additions & 31 deletions
@@ -21,7 +21,7 @@ See our official [project page](https://katarinayuan.github.io/PerturbDiff-Proje
 [Codex]: https://openai.com/zh-Hans-CN/index/introducing-codex/
 [PerturbDiff]: https://arxiv.org/html/2602.19685v1

-## Overview
+# Overview

 Building **virtual cells** that can accurately simulate perturbation responses is a core challenge in systems biology. In single-cell sequencing, measurements are destructive, so the same cell cannot be observed both before and after perturbation. As a result, perturbation prediction must map between unpaired control and perturbed populations.

@@ -40,7 +40,7 @@ This repository contains the refactored runtime used for large-scale pretraining
 - `src/`: functional code modules and executable entrypoints.
 - `configs/`: Hydra configuration system for training and sampling.

-## Table of Contents
+# Table of Contents

 - [Overview](#overview)
 - [Feature](#feature)
@@ -69,7 +69,7 @@ This repository contains the refactored runtime used for large-scale pretraining
 - [Notes](#notes)
 - [Citation](#citation)

-## Feature
+# Feature

 - Functional refactor with clear module boundaries:
   - `src/apps`: run entrypoints for training/sampling
@@ -82,13 +82,13 @@ This repository contains the refactored runtime used for large-scale pretraining
 - pretraining (multi-source) and then finetuning (PBMC / Tahoe100M / Replogle)
 - conditional sampling from checkpoints

-## Updates
+# Updates

 - 2026-03-06: Release all code.
 - 2026-03-05: Release all data and checkpoints on HuggingFace.
 - 2026-02-23: Preprint released on arXiv.

-## File Structure
+# File Structure

 ```text
 src/
@@ -143,9 +143,9 @@ src/
 └── paths.py  # Path management
 ```

-## Installation
+# Installation

-### Option A: Conda/Mamba environment for `src` (recommended on cluster)
+## Option A: Conda/Mamba environment for `src` (recommended on cluster)

 ```bash
 # from login node
@@ -167,7 +167,7 @@ pip install transformers timm geomloss
 pip install wandb
 ```

-### Option B: venv
+## Option B: venv

 ```bash
 python -m venv .venv
@@ -186,7 +186,7 @@ Sanity check:
 python -c "import hydra,pytorch_lightning,torch,numpy,pandas,anndata,scanpy,h5py,yaml,sklearn,transformers,timm,geomloss; print('env ok')"
 ```

-## General Configuration
+# General Configuration

 Top-level Hydra configs:

@@ -214,15 +214,15 @@ python src/apps/run/rawdata_diffusion_training.py \
 optimization.micro_batch_size=32
 ```

-## Download
+# Download

 Use `huggingface_hub` CLI for both datasets and released checkpoints.

 ```bash
 pip install -U "huggingface_hub[cli]"
 ```

-### Dataset
+## Dataset

 Primary dataset source: [katarinayuan/PerturbDiff_data](https://huggingface.co/datasets/katarinayuan/PerturbDiff_data)

@@ -271,7 +271,7 @@ perturb_data/
 ```


-### Checkpoint
+## Checkpoint

 Released checkpoints can be downloaded from:
 - [katarinayuan/PerturbDiff_release_ckpt](https://huggingface.co/katarinayuan/PerturbDiff_release_ckpt/tree/main).
@@ -293,7 +293,7 @@ hf download katarinayuan/PerturbDiff_release_ckpt \
   --local-dir ./checkpoints/PerturbDiff_release_ckpt
 ```

-## Setup
+# Setup

 1. Clone repo and `cd` to project root.
 2. Download dataset files to cluster storage.
@@ -327,7 +327,7 @@ hf download katarinayuan/PerturbDiff_release_ckpt \
 - Change output naming:
   - update `run_name=...`

-### Quick Start
+## Quick Start

 ```bash
 # From repo root
@@ -336,12 +336,12 @@ cd PerturbDiff-Refactor
 # (Optional) activate env
 ```

-### Entrypoints
+## Entrypoints

 - Training: `python ./src/apps/run/rawdata_diffusion_training.py`
 - Sampling: `python ./src/apps/run/rawdata_diffusion_sampling.py`

-### Shared CLI Blocks
+## Shared CLI Blocks

 Copy this block once per shell session.

@@ -463,7 +463,7 @@ lightning.logger._target_=pytorch_lightning.loggers.logger.DummyLogger
 "
 ```

-### Scenario Index
+## Scenario Index
 1. From scratch training
    - [1.1) From Scratch on PBMC](#11-from-scratch-on-pbmc)
    - [1.2) From Scratch on Tahoe100M](#12-from-scratch-on-tahoe100m)
@@ -480,7 +480,7 @@ lightning.logger._target_=pytorch_lightning.loggers.logger.DummyLogger
    - [4.3) Finetuning on Replogle](#43-finetuning-on-replogle)


-#### 1.1) From Scratch on PBMC
+### 1.1) From Scratch on PBMC

 ```bash
 SCRATCH_PBMC_EXTRA="
@@ -502,7 +502,7 @@ $SCRATCH_PBMC_EXTRA \
 $NO_WANDB
 ```

-#### 1.2) From Scratch on Tahoe100M
+### 1.2) From Scratch on Tahoe100M

 ```bash
 SCRATCH_TAHOE_EXTRA="
@@ -524,7 +524,7 @@ $SCRATCH_TAHOE_EXTRA \
 $NO_WANDB
 ```

-#### 1.3) From Scratch on Replogle
+### 1.3) From Scratch on Replogle

 ```bash
 SCRATCH_REPLOGLE_EXTRA="
@@ -546,7 +546,7 @@ $SCRATCH_REPLOGLE_EXTRA \
 $NO_WANDB
 ```

-#### 2.1) Sampling on PBMC (from checkpoint; PBMC as an example)
+### 2.1) Sampling on PBMC (from checkpoint; PBMC as an example)

 ```bash
 CKPT_PATH=${ROOT_PATH}perturb_ckpt/your_ckpt.ckpt
@@ -568,7 +568,7 @@ $NO_WANDB
 ```


-#### 3) Pretraining
+### 3) Pretraining

 ```bash
 PRETRAIN_EXTRA="
@@ -596,7 +596,7 @@ $PRETRAIN_EXTRA \
 $NO_WANDB
 ```

-#### 4.1) Finetuning on PBMC
+### 4.1) Finetuning on PBMC

 ```bash
 FINETUNE_PBMC_EXTRA="
@@ -616,7 +616,7 @@ $FINETUNE_COMMON \
 $NO_WANDB
 ```

-#### 4.2) Finetuning on Tahoe100M
+### 4.2) Finetuning on Tahoe100M

 ```bash
 FINETUNE_TAHOE_EXTRA="
@@ -638,7 +638,7 @@ $FINETUNE_COMMON \
 $NO_WANDB
 ```

-#### 4.3) Finetuning on Replogle
+### 4.3) Finetuning on Replogle

 ```bash
 FINETUNE_REPLOGLE_EXTRA="
@@ -661,16 +661,17 @@ $FINETUNE_REPLOGLE_EXTRA \
 $FINETUNE_COMMON \
 $NO_WANDB
 ```
-### Notes
+## Notes

 - All commands use Hydra CLI overrides; order matters when repeated keys are present.

-## Citation
+# Citation

 ```bibtex
-@article{perturbdiff,
-title = {PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling},
-author = {Xinyu Yuan, Xixian Liu, Yashi Zhang, Zuobai Zhang, Hongyu Guo, and Jian Tang},
-year = {2026},
+@article{yuan2026perturbdiff,
+title={PerturbDiff: Functional Diffusion for Single-Cell Perturbation Modeling},
+author={Yuan, Xinyu and Liu, Xixian and Zhang, Ya Shi and Zhang, Zuobai and Guo, Hongyu and Tang, Jian},
+journal={arXiv preprint arXiv:2602.19685},
+year={2026}
 }
 ```
