Commit 3147b5a: Update README.md

1 parent 80490aa

1 file changed

Lines changed: 39 additions & 23 deletions

README.md

@@ -28,81 +28,97 @@ A [quiz](https://rna-gan.stanford.edu) is available to get a score on how well f
2828

2929
Checkpoints for the models can be downloaded [here](https://drive.google.com/drive/folders/1aJcH8pDpjhQ1hz39aalrqgYRh4eH4Y_8?usp=sharing).
3030

31-
## Example usage
31+
# Training the betaVAE model on the RNA-Seq data
3232

33-
### betaVAE
33+
Data needs to be downloaded from the [GTEx Portal](https://gtexportal.org/home/index.html). IDs are provided in the [ref_files](https://github.com/gevaertlab/RNA-GAN/tree/master/ref_files) folder. The JSON configuration file is provided in the [config](https://github.com/gevaertlab/RNA-GAN/blob/master/configs/betavae_tissues.json) folder, along with the protein-coding gene identifiers. The gene expression values are expected in a CSV with the following columns:
34+
- **wsi_file_name**: Name of the WSI file
35+
- All protein-coding gene names with the `rna_` prefix, as in the [protein_coding_genes.csv](https://github.com/gevaertlab/RNA-GAN/blob/master/ref_files/protein_coding_genes.csv) file.
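As a rough illustration of that layout, the sketch below checks a CSV header for the `wsi_file_name` column and one `rna_<gene>` column per gene. It assumes the gene list is already loaded as plain names (reading `protein_coding_genes.csv` is not shown), and `check_expression_header` is a hypothetical helper, not part of the repository. The sample row and file name are made up.

```python
import csv
import io

def check_expression_header(csv_text, genes):
    """Return expected columns that are missing from the CSV header.

    Expected layout: a `wsi_file_name` column plus one `rna_<gene>` column
    per protein-coding gene. Hypothetical helper for illustration only.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = ["wsi_file_name"] + ["rna_" + gene for gene in genes]
    present = set(header)
    return [col for col in expected if col not in present]

# Made-up file name and expression values.
sample = "wsi_file_name,rna_TP53,rna_EGFR\nGTEX-XXXX-0001.svs,5.2,1.3\n"
print(check_expression_header(sample, ["TP53", "EGFR", "KRAS"]))  # ['rna_KRAS']
```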
3436

35-
**Training the model**
37+
Once the files are available for all tissues (lung, brain, liver, stomach, and pancreas), the betaVAE can be trained as follows:
3638

3739
```bash
3840
python3 betaVAE_training.py --seed 99 --config configs/betavae_tissues.json --log 1 --parallel 0
3941
```
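For readers new to the model: a betaVAE is trained on a reconstruction term plus a KL-divergence term weighted by β. The sketch below computes that objective for one sample with a diagonal Gaussian encoder. Using mean squared error for reconstruction and β = 4.0 are assumptions for illustration; the actual settings are in `betavae_tissues.json` and `betaVAE_training.py`.

```python
import math

def beta_vae_loss(x, x_recon, mu, logvar, beta=4.0):
    """Illustrative β-VAE loss for a single sample.

    reconstruction: mean squared error between input and reconstruction
    kl: KL(N(mu, exp(logvar)) || N(0, I)), summed over latent dimensions
    beta: weight on the KL term (4.0 is an arbitrary default here)
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon)) / len(x)
    kl = -0.5 * sum(1 + lv - m ** 2 - math.exp(lv) for m, lv in zip(mu, logvar))
    return recon + beta * kl

# A perfect reconstruction whose posterior equals the prior costs nothing.
print(beta_vae_loss([1.0, 2.0], [1.0, 2.0], [0.0, 0.0], [0.0, 0.0]))  # 0.0
```

Raising β pushes the latent codes closer to the prior at the cost of reconstruction quality, which is the trade-off the betaVAE configuration controls.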
40-
**Compute interpolation vectors**
42+
43+
Once the model has been trained, the interpolation experiments can be performed by first computing the interpolation vectors between two classes:
4144

4245
```bash
4346
python3 betaVAE_interpolation.py --seed 99 --config configs/betavae_tissues.json --log 0 --parallel 0
4447
```
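A common way to define an interpolation vector between two classes is the difference between the mean latent codes of each class. The sketch below assumes that definition and represents latent codes as plain lists of floats. It is not taken from `betaVAE_interpolation.py`, and the toy codes are made up.

```python
def class_mean(codes):
    """Element-wise mean of a list of latent vectors."""
    n = len(codes)
    return [sum(values) / n for values in zip(*codes)]

def interpolation_vector(codes_a, codes_b):
    """Vector pointing from the mean latent code of class A to that of class B."""
    mean_a, mean_b = class_mean(codes_a), class_mean(codes_b)
    return [b - a for a, b in zip(mean_a, mean_b)]

# Toy 2-D latent codes for two tissues (made-up values).
lung = [[0.0, 1.0], [2.0, 3.0]]
brain = [[4.0, 1.0], [6.0, 1.0]]
print(interpolation_vector(lung, brain))  # [4.0, -1.0]
```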
45-
**Interpolating**
48+
49+
Then, samples can be interpolated or new ones generated:
4650

4751
```bash
4852
python3 betaVAE_sample.py --seed 99 --config configs/betavae_tissues.json --log 0 --parallel 0
4953
```
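In latent space, interpolating usually means moving a code along an interpolation vector in evenly spaced steps. Generating usually means drawing codes from the standard normal prior. Each code would then be passed through the decoder. The sketch below covers only the latent-space part under those assumptions. `interpolate` and `sample_prior` are hypothetical helpers, and the decoder call is omitted.

```python
import random

def interpolate(z_start, direction, steps=5):
    """Latent codes z_start + t * direction for t evenly spaced in [0, 1]."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ts = [i / (steps - 1) for i in range(steps)]
    return [[z + t * d for z, d in zip(z_start, direction)] for t in ts]

def sample_prior(dim, n, seed=99):
    """Draw n latent codes from N(0, I) with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]

print(interpolate([1.0, 2.0], [4.0, -1.0], steps=3))
# [[1.0, 2.0], [3.0, 1.5], [5.0, 1.0]]
```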
5054

51-
### Normal GAN and RNA-GAN
55+
# Training the GAN model on WSI tiles
5256

53-
**Normal GAN training**
57+
WSI files in SVS format need to be downloaded from the [GTEx Portal](https://gtexportal.org/home/index.html). IDs are provided in the [ref_files](https://github.com/gevaertlab/RNA-GAN/tree/master/ref_files) folder, with one file per tissue. SVS files need to be placed in separate folders and preprocessed with the [patch_gen_grid.py](https://github.com/gevaertlab/RNA-GAN/blob/master/src/preprocess/patch_gen_grid.py) script inside the `src/preprocess` folder. This script creates two folders: one containing the tiles and another containing the tissue masks. Once the tiles have been obtained, we can proceed with GAN training for both lung and brain cortex tissue:
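The preprocessing can be pictured as cutting each slide into a regular grid of tiles and keeping only tiles that are mostly tissue according to the mask. The sketch below is a simplified stand-in for that idea, not the logic of `patch_gen_grid.py`. It works on a tiny binary mask given as nested lists. The tile size and the `min_tissue` threshold are hypothetical parameters.

```python
def grid_tiles(mask, tile, min_tissue=0.5):
    """Top-left (row, col) of grid tiles whose tissue fraction is >= min_tissue.

    mask: 2-D list of 0/1 values (1 = tissue). Tiles that would extend past
    the image border are skipped.
    """
    rows, cols = len(mask), len(mask[0])
    kept = []
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            window = [mask[i][j] for i in range(r, r + tile) for j in range(c, c + tile)]
            if sum(window) / len(window) >= min_tissue:
                kept.append((r, c))
    return kept

mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(grid_tiles(mask, tile=2))  # [(0, 0)]
```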
5458

5559
```bash
5660
python3 histopathology_gan.py --seed 99 --config configs/gan_run_brain.json --image_dir gan_generated_images/images_gan_brain --model_dir ./checkpoints/gan_brain --num_epochs 39 --gan_type dcgan --loss_type wgan --num_patches 600
5761

5862
python3 histopathology_gan.py --seed 99 --config configs/gan_run_lung.json --image_dir gan_generated_images/images_gan_lung --model_dir ./checkpoints/gan_lung --num_epochs 91 --gan_type dcgan --loss_type wgan --num_patches 600
5963
```
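The `--loss_type wgan` option points to a Wasserstein GAN objective. The critic is trained to widen the gap between its mean score on real tiles and its mean score on generated tiles. The generator is trained to raise the critic's score on generated tiles. The sketch below shows only these two loss terms, computed on lists of critic scores. It does not describe how `histopathology_gan.py` implements them, which may add weight clipping or a gradient penalty.

```python
def critic_loss(real_scores, fake_scores):
    """WGAN critic loss to minimize: mean D(fake) - mean D(real)."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)

def generator_loss(fake_scores):
    """WGAN generator loss to minimize: -mean D(fake)."""
    return -sum(fake_scores) / len(fake_scores)

print(critic_loss([2.0, 4.0], [1.0, 3.0]))  # -1.0
print(generator_loss([1.0, 3.0]))  # -2.0
```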
6064

61-
**RNA-GAN training**
65+
The paths to the preprocessed tiles and the CSV files need to be specified in the JSON files inside the [configs](https://github.com/gevaertlab/RNA-GAN/tree/master/configs) folder.
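Before a long training run, it can help to confirm that every path referenced in a JSON configuration exists. The sketch below checks a chosen set of keys. The key names in the usage line (`tiles_dir`, `rna_csv`) are hypothetical placeholders, so check the files in `configs/` for the real ones.

```python
import json
from pathlib import Path

def missing_config_paths(config_file, keys):
    """Return (key, value) pairs for config entries that are absent or point nowhere.

    config_file: path to a JSON configuration file.
    keys: names of entries expected to hold file or folder paths.
    A key missing from the config is reported with value None.
    """
    config = json.loads(Path(config_file).read_text())
    problems = []
    for key in keys:
        value = config.get(key)
        if value is None or not Path(value).exists():
            problems.append((key, value))
    return problems

# Usage (hypothetical key names):
# print(missing_config_paths("configs/gan_run_lung.json", ["tiles_dir", "rna_csv"]))
```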
66+
67+
Once the model has been trained, we can generate new images using the following command:
6268

6369
```bash
64-
python3 histopathology_gan.py --seed 99 --config configs/gan_run_brain.json --image_dir gan_generated_images/images_rna-gan_brain --model_dir ./checkpoints/rna-gan_brain --num_epochs 24 --gan_type dcgan --loss_type wganvae --num_patches 600
65-
python3 histopathology_gan.py --seed 99 --config configs/gan_run_lung.json --image_dir gan_generated_images/images_rna-gan_lung --model_dir ./checkpoints/rna-gan_lung --num_epochs 11 --gan_type dcgan --loss_type wganvae --num_patches 600
70+
python3 generate_tissue_images.py --checkpoint ./checkpoints/gan_lung.model --config configs/gan_run_lung.json --sample_size 600
6671
```
6772

68-
**Compute FID metrics**
73+
# Training the RNA-GAN model on WSI tiles
6974

70-
To compute the FID metric we used the pytorch-fid library that can be installed using pip ```pip3 install pytorch-fid```. It can be executed between real and synthetic images as follows:
75+
Since the tiles have already been preprocessed, we can skip that step and train the RNA-GAN model directly as follows:
7176

7277
```bash
73-
echo "REAL vs GAN 60k"
74-
python3 -m pytorch_fid real_tiles/ gan_tiles/ --device cuda:0
75-
76-
echo "REAL vs RNAGAN 60k"
77-
python3 -m pytorch_fid real_tiles/ rnagan_tiles/ --device cuda:0
78+
python3 histopathology_gan.py --seed 99 --config configs/gan_run_brain.json --image_dir gan_generated_images/images_rna-gan_brain --model_dir ./checkpoints/rna-gan_brain --num_epochs 24 --gan_type dcgan --loss_type wganvae --num_patches 600
79+
python3 histopathology_gan.py --seed 99 --config configs/gan_run_lung.json --image_dir gan_generated_images/images_rna-gan_lung --model_dir ./checkpoints/rna-gan_lung --num_epochs 11 --gan_type dcgan --loss_type wganvae --num_patches 600
7880
```
7981

80-
**Image generation**
82+
Once the model has been trained, we can generate new images with the following command:
8183

8284
```bash
83-
python3 generate_tissue_images.py --checkpoint ./checkpoints/rna-gan_lung.model --checkpoint2 ./checkpoints/gan_lung.model --config configs/gan_run_lung.json --sample_size 600 --vae --vae_checkpoint checkpoints/betavae_tissues.pt --patient1 GTEX-15RJ7-0625.svs
85+
python3 generate_tissue_images.py --checkpoint ./checkpoints/rna-gan_lung.model --config configs/gan_run_lung.json --sample_size 600 --vae --vae_checkpoint checkpoints/betavae_tissues.pt --patient1 GTEX-15RJ7-0625.svs
8486

85-
python3 generate_tissue_images.py --checkpoint ./checkpoints/rna-gan_brain.model --checkpoint2 ./checkpoints/gan_brain.model --config configs/gan_run_brain.json --sample_size 600 --vae --vae_checkpoint checkpoints/betavae_tissues.pt --patient1 GTEX-1C6WA-3025.svs
87+
python3 generate_tissue_images.py --checkpoint ./checkpoints/rna-gan_brain.model --config configs/gan_run_brain.json --sample_size 600 --vae --vae_checkpoint checkpoints/betavae_tissues.pt --patient1 GTEX-1C6WA-3025.svs
8688
```
8789

88-
# From GEO series
90+
# Generalization experiments over the GEO series
91+
92+
Data needs to be downloaded from the [GEO series](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120795) and restricted to the protein-coding genes provided [here](https://github.com/gevaertlab/RNA-GAN/blob/master/ref_files/protein_coding_genes.csv). Genes that are not available in the GEO data are set to zero. Then, we can use the RNA-GAN checkpoints to generate new synthetic tiles:
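The gene-selection step can be sketched as mapping each sample's expression values onto the reference gene list. Genes absent from the GEO data become zero, and genes outside the list are dropped. This assumes each sample is available as a gene-to-value mapping. `align_to_reference` is a hypothetical helper. Adding the `rna_` prefix used by the training CSV is an assumption about what `--rna_file` expects, so it is exposed as a parameter.

```python
def align_to_reference(sample, reference_genes, prefix="rna_"):
    """Map one sample onto the reference gene list.

    sample: dict of gene name -> expression value (e.g. one GEO sample).
    Genes missing from the sample are set to 0.0; genes not in the
    reference list are dropped. Column order follows reference_genes.
    """
    return {prefix + gene: float(sample.get(gene, 0.0)) for gene in reference_genes}

geo_sample = {"TP53": 7.1, "EGFR": 3.4, "XIST": 9.9}  # made-up values
print(align_to_reference(geo_sample, ["TP53", "EGFR", "KRAS"]))
# {'rna_TP53': 7.1, 'rna_EGFR': 3.4, 'rna_KRAS': 0.0}
```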
8993

9094
```bash
9195
python3 generate_tissue_images.py --checkpoint ./checkpoints/rna-gan_lung.model --config configs/gan_run_lung.json --sample_size 600 --vae_checkpoint checkpoints/betavae_tissues.pt --rna_file GSE120795_lung_proteincoding.csv --random_patient
9296

9397
python3 generate_tissue_images.py --checkpoint ./checkpoints/rna-gan_brain.model --config configs/gan_run_brain.json --sample_size 600 --vae_checkpoint checkpoints/betavae_tissues.pt --rna_file GSE120795_brain_proteincoding.csv --random_patient
9498
```
9599

100+
# Compute FID metrics
101+
102+
To compute the FID metric, we used the pytorch-fid library, which can be installed with pip (`pip3 install pytorch-fid`). It can be run on folders of real and synthetic images as follows:
103+
104+
```bash
105+
echo "REAL vs GAN 60k"
106+
python3 -m pytorch_fid real_tiles/ gan_tiles/ --device cuda:0
107+
108+
echo "REAL vs RNAGAN 60k"
109+
python3 -m pytorch_fid real_tiles/ rnagan_tiles/ --device cuda:0
110+
```
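FID fits a Gaussian to the Inception-v3 features of each image set and reports the Fréchet distance between the two Gaussians, $\lVert\mu_1-\mu_2\rVert^2 + \mathrm{Tr}\big(\Sigma_1+\Sigma_2-2(\Sigma_1\Sigma_2)^{1/2}\big)$. The NumPy sketch below computes that distance from feature arrays. It illustrates the formula and is not pytorch-fid's own code. Feature extraction is left to pytorch-fid.

```python
import numpy as np

def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clip tiny negative eigenvalues from round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((sigma1 sigma2)^{1/2}) equals Tr((S sigma2 S)^{1/2}) with S = sigma1^{1/2};
    # S sigma2 S is symmetric PSD, so its eigenvalues can be taken with eigvalsh.
    middle = s1_half @ sigma2 @ s1_half
    eig = np.clip(np.linalg.eigvalsh(middle), 0.0, None)
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * np.sqrt(eig).sum())

def fid_from_features(feats1, feats2):
    """FID from two (n_samples, n_features) arrays; rows are samples."""
    return frechet_distance(feats1.mean(axis=0), np.cov(feats1, rowvar=False),
                            feats2.mean(axis=0), np.cov(feats2, rowvar=False))

# Two unit Gaussians whose means differ by (1, 1):
print(frechet_distance(np.zeros(2), np.eye(2), np.ones(2), np.eye(2)))  # ≈ 2.0
```

Lower values mean the synthetic tiles' feature statistics are closer to those of the real tiles.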
111+
96112
# ML experiment
97113

98-
For running the ml experiment for TCGA-GBM vs TCGA-LUAD classification, firstly you need to download the tiles from the checkpoint folder. Then, modify the csv file found in the ref_file accordingly, and run the following commands:
114+
To run the ML experiment for TCGA-GBM vs. TCGA-LUAD classification, first download the tiles from the [checkpoint](https://drive.google.com/drive/folders/1aJcH8pDpjhQ1hz39aalrqgYRh4eH4Y_8?usp=sharing) folder. Then, modify the CSV file found in [ref_files](https://github.com/gevaertlab/RNA-GAN/blob/master/ref_files/wsi_tiles_real.csv) accordingly and run the following commands:
99115

100116
```bash
101117
python3 ml_experiments.py --csv_path ../ref_files/wsi_tiles_real.csv --save_dir /path/to/save/dir/ --use_pretrain # using pretrained weights
102118
python3 ml_experiments.py --csv_path ../ref_files/wsi_tiles_real.csv --save_dir /path/to/save/dir/ # training from scratch
103119
```
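Modifying the CSV usually means pointing its tile paths at the folder where the downloaded tiles were saved. Assuming that is the change needed, the sketch below rewrites a path prefix in every cell of a CSV file without assuming any particular column layout for `wsi_tiles_real.csv`. Both prefixes in the usage line are placeholders.

```python
import csv

def replace_path_prefix(src, dst, old_prefix, new_prefix):
    """Copy CSV `src` to `dst`, replacing `old_prefix` at the start of any cell."""
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow([
                new_prefix + cell[len(old_prefix):] if cell.startswith(old_prefix) else cell
                for cell in row
            ])

# Usage (placeholder paths):
# replace_path_prefix("../ref_files/wsi_tiles_real.csv", "wsi_tiles_local.csv",
#                     "/old/tiles/root/", "/path/to/downloaded/tiles/")
```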
104120

105-
## Requirements and versions
121+
# Requirements and versions
106122

107123
Requirements can be installed with pip:
108124
