
pathology-tools

A toolkit of utility scripts and wrappers for generating, labeling, and evaluating real and synthetic H&E breast cancer slide patches.

Pipeline for histology model evaluation and data augmentation under varying data distributions:

                      +-------------------+    +-----------------------+    +-----------------------------+
 +---------------+    |  Image Annotation |    | Dataset Constructor   |    | Model Exec                  |
 | Image Dataset +--->+                   +--->+                       +--->+                             |
 +---------------+    |  - HoverNet       |    | - Output diagnostic   |    | - Input model and datasets  |
         ^            |  - ...            |    |   datasets of varying |    | - Output performance        |
         |            |                   |    |   distribution using  |    |   breakdown in terms of     |
         |            +-------------------+    |   annotation          |    |   input dataset composition |
         |                                     +-----------------------+    +-----------------------------+
         |                                                                                 |
         |                                                                                 |
         |                                                                                 |
         |                                                                                 |
         |                                                                                 |
         |                         +-----------------------+                               |
         |                         | Data Augmentation     |                               |
         +-------------------------+                       +<------------------------------+
                                   | - Design augmented    |
                                   |   dataset reflecting  |
                                   |   points of weakness  |
                                   |   for the model       |
                                   +-----------------------+

Pathology-GAN

├── CRImage
├── data_manipulation
├── evaluation_automated_sweep
├── find_nearest_neighbors.py
├── generate_fake_samples.py
├── generate_image_interpolation.py
├── high_d_exemplar.pkl
├── low_d_exemplar.pkl
├── high_d_cluster.pkl
├── low_d_cluster.pkl
├── models
├── quantify_model_pipeline.py
├── real_features.py
# ----------------- pretrained CLAM patch_level=0 instance of PathologyGAN ---------------------
# -- link: https://drive.google.com/file/d/1ziSvMv5baSzKXV7yHUwi66k-Z6G5Ba5u/view?usp=sharing --
├── pretrained_checkpoint
    ├── PathologyGAN.ckt.data-00000-of-00001
    ├── PathologyGAN.ckt.index
    ├── PathologyGAN.ckt.meta
    └── checkpoint
#------ To download and add (to mimic original PathologyGAN results) ------
├── dataset (not necessary for generate_image_interpolation.py but may be for other functions)
    └── vgh_nki (download from https://drive.google.com/open?id=1LpgW85CVA48C8LnpmsDMdHqeCGHKsAxw) 
├── data_model_output
    └── PathologyGAN
        └── h224_w224_n3_zdim_200
            └── checkpoints
                ├── PathologyGAN.ckt.data-00000-of-00001 (download tar file from https://figshare.com/s/0a31)
                ├── PathologyGAN.ckt.index
                └── PathologyGAN.ckt.meta
#------------------------------------------------------------------------
└── run_pathgan.py

virtualenv

After creating and activating a blank Python 3.6.8 virtualenv, install the dependencies with pip install -r requirements.txt.
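For example, assuming a Python 3.6 interpreter is available on your PATH as python3.6:

python3.6 -m venv env
source env/bin/activate
pip install -r requirements.txt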

Usage

Synthetic Image Generation

Generating images from a trained model works the same way as in the original PathologyGAN repo (https://github.com/AdalbertoCq/Pathology-GAN). A call to generate_fake_samples.py takes the following form:

python generate_fake_samples.py --num_samples {number of images to generate} --batch_size {batch size} --z_dim {latent dimension - should match the checkpointed model} --checkpoint {path to checkpoint files}/PathologyGAN.ckt --main_path {output directory}

Model checkpoints are stored across multiple files, so the path provided should point to the .ckt file, which must reside in the same directory as the other checkpoint files.

The image generation method (generate_samples_from_checkpoint() in features.py) is configured, by default, to output files containing dictionaries that store each synthetic sample's corresponding latent vector (each vector stored under the dictionary key matching the synthetic image's filename/number). These latents can be used for image interpolation, described below.
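To inspect the saved latents, a minimal sketch (the pickle filename is a placeholder; use whichever file generate_fake_samples.py wrote to your --main_path directory):

# Load the dictionary mapping each synthetic image's filename/number
# to its latent vector; 'latents.pkl' is a hypothetical path.
import pickle

with open('latents.pkl', 'rb') as f:
    latents = pickle.load(f)

for image_name, z in latents.items():
    print(image_name, z.shape)  # each z is a (z_dim,) numpy vector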

Image Interpolation

Slight changes to the image generation code allow generating example images that interpolate between specified latent vectors. generate_image_interpolation.py can be used to generate a set of images interpolating between two input latents with the following call:

python ./generate_image_interpolation.py --num_samples 100 --z_dim 200 --checkpoint data_model_output/PathologyGAN/h224_w224_n3_zdim_200/checkpoints/PathologyGAN.ckt --exemplar1 low_d_exemplar.pkl --exemplar2 high_d_exemplar.pkl

This calls a modified version of the generate_fake_samples.py script with added logic to interpolate between specified exemplars. The script expects pickled numpy arrays containing vectors of the same dimension as z_dim. The provided exemplars are the 200-dimensional representations corresponding to the following images:

[example images corresponding to high_d_exemplar.pkl and low_d_exemplar.pkl]

The python command given above will generate an evaluation directory whose contents should match those given in evaluation_automated_sweep.
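To supply your own exemplars, a minimal sketch for pickling a latent vector in the format the script expects (the random vector here is purely illustrative; in practice you would save a latent taken from a generated sample's latent dictionary):

# Write a single exemplar latent to a pickle file.
# The vector is random for illustration only.
import pickle
import numpy as np

z = np.random.normal(size=(200,))  # dimension must match --z_dim
with open('my_exemplar.pkl', 'wb') as f:
    pickle.dump(z, f)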

generate_image_interpolation.py can also be called with multiple exemplar latents in both input slots (that is, pickle files containing numpy arrays of shape (n, z_dim), where n is the number of exemplars). In this mode the image generation process draws random combinations of latents from the two groups and creates new latents by taking weighted averages, as in the previous interpolation example. A call under this mode can be made using the example pickle files low_d_cluster.pkl and high_d_cluster.pkl:

python ./generate_image_interpolation.py --num_samples 100 --z_dim 200 --checkpoint data_model_output/PathologyGAN/h224_w224_n3_zdim_200/checkpoints/PathologyGAN.ckt --exemplar1 low_d_cluster.pkl --exemplar2 high_d_cluster.pkl
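A minimal sketch of the blending behavior described above, drawing one latent from each cluster and combining them with a weighted average (variable names and the uniform weight are illustrative assumptions, not the script's exact internals):

# Draw one latent from each cluster and blend them.
import pickle
import numpy as np

with open('low_d_cluster.pkl', 'rb') as f:
    low = pickle.load(f)    # shape (n_low, z_dim)
with open('high_d_cluster.pkl', 'rb') as f:
    high = pickle.load(f)   # shape (n_high, z_dim)

z1 = low[np.random.randint(len(low))]
z2 = high[np.random.randint(len(high))]
alpha = np.random.uniform()             # blending weight in [0, 1)
z_new = (1 - alpha) * z1 + alpha * z2   # weighted average of the two latents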

Utils

The utils/ directory contains utility scripts for generating and viewing patch datasets.

  • patch_dataset.py: Generates .h5 patch image dataset files (in a format that can be given to PathologyGAN for training) from CLAM output (.svs and .h5 files containing the source slides and patch information, respectively). Example call (a sketch for inspecting the generated .h5 follows this list):
python utils/patch_dataset.py --input_patches {path/to/CLAM/patches} --source_svs_dir {path/to/svs/directory} --output_prefix {path/to/output/directory} --max_dataset_size {num_patches} --shuffle
  • view_patches.py: Generates .png patch images from .h5 dataset file. Example call:
python utils/view_patches.py --input_h5 {path/to/patches.h5} --num_patches {int in range (0, dataset_size], or 'all'} --output_dir {path/to/output/directory} 
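To check a generated dataset directly, a minimal h5py sketch (the file path is a placeholder, and a flat file of datasets is assumed):

# List every dataset in the .h5 file with its shape and dtype.
import h5py

with h5py.File('patches.h5', 'r') as f:
    for key, dset in f.items():
        print(key, dset.shape, dset.dtype)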

Evaluation Tools

Scripts for calculating FID and Inception Score are provided in /evaluation_tools/{FID, IS} respectively (both directories contain a script with docstrings describing intended use). The code is adapted from external repositories.
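For reference, FID is the Fréchet distance between Gaussians fitted to the Inception features of the real and generated sets. A minimal numpy/scipy sketch of the standard computation (illustrating the metric, not reproducing the repo's script):

# Standard FID between two sets of Inception activations.
import numpy as np
from scipy import linalg

def fid(feats_real, feats_fake):
    # feats_*: (n_samples, feature_dim) arrays of Inception activations
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):  # drop tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu1 - mu2
    return diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean)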

Slide-processing and Cell Segmentation Software

CLAM is a tool for processing and analyzing .svs slides. Below is an example command that has CLAM process a set of input .svs slide files and generate coordinates for a desired number of patches (extracted in accordance with the various parameters the script exposes):

python {path}/{to}/{CLAM}/create_patches_fp.py --source /local/storage/TCGA_data/test_tcga_brca --save_dir {output directory} --patch_size {desired size} --step_size {desired size} --seg --patch --stitch [--patch_level {desired magnification level; default=0}]

Installation instructions for CLAM are available in the CLAM repository (https://github.com/mahmoodlab/CLAM).
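The patch coordinates CLAM writes can be read back directly; a minimal sketch, assuming the conventional CLAM layout where each output .h5 file stores coordinates under a 'coords' dataset (the file path is a placeholder):

# Read patch coordinates from a CLAM output .h5 file.
import h5py

with h5py.File('patches/slide_name.h5', 'r') as f:
    coords = f['coords'][:]  # (num_patches, 2) top-left (x, y) pairs
    print(coords.shape, coords[:5])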

HoVer-Net (GitHub repo: https://github.com/vqdang/hover_net)

Cell segmentation and classification engine used to generate image annotations from slide/patch images. Usage: populate the run_tile.sh shell script with the run parameters (namely, the input directory containing patch .jpgs, the output directory where the label/annotation files will be stored, and the model checkpoint to use). Example run_tile.sh:

python {path_to}/hover_net/run_infer.py \
--gpu='1' \
--nr_types=6 \
--type_info_path=type_info.json \
--batch_size=64 \
--model_mode=fast \
--model_path=hover-net-pytorch-weights/hovernet_fast_pannuke_type_tf2pytorch.tar \
--nr_inference_workers=8 \
--nr_post_proc_workers=16 \
tile \
--input_dir={path_to}/hover_net/experiment_directory/patches/ \
--output_dir={path_to}/hover_net/experiment_directory/output/ \
--mem_usage=0.1 \
--draw_dot \
--save_qupath
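
HoVer-Net's tile mode writes one annotation file per input patch to the output directory. A minimal sketch for reading the per-nucleus predictions, assuming the upstream repo's JSON format with a top-level 'nuc' mapping (the file path is a placeholder):

# Read a HoVer-Net tile-mode annotation file and print each nucleus.
import json

with open('output/json/patch_0.json') as f:
    annotation = json.load(f)

for inst_id, nucleus in annotation['nuc'].items():
    print(inst_id, nucleus['type'], nucleus['centroid'])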
