Catalyst logo

Accelerated DL & RL


PyTorch framework for Deep Learning research and development. It was developed with a focus on reproducibility, fast experimentation, and code/ideas reuse, so you can research and develop something new rather than write yet another regular train loop.
Break the cycle - use Catalyst!

Project manifest. Part of PyTorch Ecosystem. Part of Catalyst Ecosystem:

  • Alchemy - Experiments logging & visualization
  • Catalyst - Accelerated Deep Learning Research and Development
  • Reaction - Convenient Deep Learning models serving

Catalyst at AI Landscape.


Catalyst.Segmentation

You will learn how to build an image segmentation pipeline with transfer learning using the Catalyst framework.

Goals

  1. Install requirements
  2. Prepare data
  3. Run: raw data → production-ready model
  4. Get results
  5. Customize your own pipeline

1. Install requirements

Using local environment:

pip install -r requirements/requirements.txt
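If you prefer an isolated setup, one common (optional) approach is to install the requirements into a fresh virtual environment first, for example:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements/requirements.txt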

Using docker:

This builds a catalyst-segmentation image with the necessary libraries:

make docker-build
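If the build succeeds, the catalyst-segmentation image should show up in your local image list; you can verify it with:

docker images catalyst-segmentation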

2. Get Dataset

Try on open datasets

You can use one of the open datasets:

export DATASET="isbi"

rm -rf data/
mkdir -p data

if [[ "$DATASET" == "isbi" ]]; then
    # binary segmentation
    # http://brainiac2.mit.edu/isbi_challenge/
    download-gdrive 1uyPb9WI0t2qMKIqOjFKMv1EtfQ5FAVEI isbi_cleared_191107.tar.gz
    tar -xf isbi_cleared_191107.tar.gz &>/dev/null
    mv isbi_cleared_191107 ./data/origin
elif [[ "$DATASET" == "voc2012" ]]; then
    # semantic segmentation
    # http://host.robots.ox.ac.uk/pascal/VOC/voc2012/
    wget http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar
    tar -xf VOCtrainval_11-May-2012.tar &>/dev/null
    mkdir -p ./data/origin/images/; mv VOCdevkit/VOC2012/JPEGImages/* $_
    mkdir -p ./data/origin/raw_masks; mv VOCdevkit/VOC2012/SegmentationClass/* $_
elif [[ "$DATASET" == "dsb2018" ]]; then
    # instance segmentation
    # https://www.kaggle.com/c/data-science-bowl-2018
    download-gdrive 1RCqaQZLziuq1Z4sbMpwD_WHjqR5cdPvh dsb2018_cleared_191109.tar.gz
    tar -xf dsb2018_cleared_191109.tar.gz &>/dev/null
    mv dsb2018_cleared_191109 ./data/origin
fi
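Whichever dataset you choose, it is worth a quick sanity check that the script produced the expected layout before starting the pipeline, e.g.:

ls data/origin                        # expected: images/ and raw_masks/ (see the data-structure section below)
ls data/origin/images | head -n 3     # a few sample image files
ls data/origin/raw_masks | head -n 3  # a few sample mask files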

Use your own dataset

Prepare your dataset

Data structure

Make sure that the final data folder has the required structure:

Data structure for binary segmentation

/path/to/your_dataset/
        images/
            image_1
            image_2
            ...
            image_N
        raw_masks/
            mask_1
            mask_2
            ...
            mask_N

where each mask is a binary image.
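A rough consistency check (a sketch only; the exact image-to-mask matching is defined by the pipeline configs) is to make sure the two folders hold the same number of files:

IMAGES=$(ls /path/to/your_dataset/images | wc -l)
MASKS=$(ls /path/to/your_dataset/raw_masks | wc -l)
# in the binary setup every image should have exactly one mask
[ "$IMAGES" -eq "$MASKS" ] && echo "OK: $IMAGES image/mask pairs" || echo "mismatch: $IMAGES images vs $MASKS masks"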

Data structure for semantic segmentation

/path/to/your_dataset/
        images/
            image_1
            image_2
            ...
            image_N
        raw_masks/
            mask_1
            mask_2
            ...
            mask_N

where each mask is an image with classes encoded by colors, e.g. in the VOC2012 dataset the bicycle class is encoded with green and the bird class with olive.

Data structure for instance segmentation

/path/to/your_dataset/
        images/
            image_1
            image_2
            ...
            image_M
        raw_masks/
            mask_1/
                instance_1
                instance_2
                ...
                instance_N
            mask_2/
                instance_1
                instance_2
                ...
                instance_K
            ...
            mask_M/
                instance_1
                instance_2
                ...
                instance_Z

where each mask is represented as a folder of instance images (one image per instance), and masks may consist of different numbers of instances, e.g. the Data Science Bowl 2018 dataset.
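Since the number of instances differs from mask to mask, a quick way to inspect your data (a sketch, assuming the structure above) is to count the instance images per mask folder:

for d in /path/to/your_dataset/raw_masks/*/; do
    # each subfolder is one mask; each file inside is one instance
    echo "$d: $(ls "$d" | wc -l) instance(s)"
done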

Data location

  • The easiest way is to move your data:

    mv /path/to/your_dataset/* /catalyst.segmentation/data/origin

    In that way you can run the pipeline with default settings.

  • If you prefer to leave the data in /path/to/your_dataset/

    • In local environment:

      • Link directory
        ln -s /path/to/your_dataset $(pwd)/data/origin
      • Or just set the path to your dataset with DATADIR=/path/to/your_dataset when you start the pipeline (see the sketch after this list).
    • Using docker

      You need to set:

         -v /path/to/your_dataset:/data \  # instead of the default $(pwd)/data/origin:/data

      in the script below to start the pipeline.
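Put together, the two local options look like this (a sketch, with /path/to/your_dataset standing in for your real location):

# option 1: link the dataset into the default location
ln -s /path/to/your_dataset $(pwd)/data/origin

# option 2: pass the dataset location directly
bash ./bin/catalyst-binary-segmentation-pipeline.sh \
    --datadir /path/to/your_dataset \
    --workdir ./logs \
    --config-template ./configs/templates/binary.yml
    # remaining arguments as in the run examples below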

3. Segmentation pipeline

Fast&Furious: raw data → production-ready model

The pipeline will automatically guide you from raw data to the production-ready model.

We will initialize a Unet model with a pre-trained ResNet-18 encoder. During the pipeline, the model will be trained sequentially in two stages.

Binary segmentation pipeline

Run in local environment:

CUDA_VISIBLE_DEVICES=0 \
CUDNN_BENCHMARK="True" \
CUDNN_DETERMINISTIC="True" \
bash ./bin/catalyst-binary-segmentation-pipeline.sh \
    --workdir ./logs \
    --datadir ./data/origin \
    --max-image-size 256 \
    --config-template ./configs/templates/binary.yml \
    --num-workers 4 \
    --batch-size 256

Run in docker:

docker run -it --rm --shm-size 8G --runtime=nvidia \
    -v $(pwd):/workspace/ \
    -v $(pwd)/logs:/logdir/ \
    -v $(pwd)/data/origin:/data \
    -e "CUDA_VISIBLE_DEVICES=0" \
    -e "CUDNN_BENCHMARK='True'" \
    -e "CUDNN_DETERMINISTIC='True'" \
    catalyst-segmentation ./bin/catalyst-binary-segmentation-pipeline.sh \
        --workdir /logdir \
        --datadir /data \
        --max-image-size 256 \
        --config-template ./configs/templates/binary.yml \
        --num-workers 4 \
        --batch-size 256

Semantic segmentation pipeline

Run in local environment:

CUDA_VISIBLE_DEVICES=0 \
CUDNN_BENCHMARK="True" \
CUDNN_DETERMINISTIC="True" \
bash ./bin/catalyst-semantic-segmentation-pipeline.sh \
    --workdir ./logs \
    --datadir ./data/origin \
    --max-image-size 256 \
    --config-template ./configs/templates/semantic.yml \
    --num-workers 4 \
    --batch-size 256

Run in docker:

docker run -it --rm --shm-size 8G --runtime=nvidia \
    -v $(pwd):/workspace/ \
    -v $(pwd)/logs:/logdir/ \
    -v $(pwd)/data/origin:/data \
    -e "CUDA_VISIBLE_DEVICES=0" \
    -e "CUDNN_BENCHMARK='True'" \
    -e "CUDNN_DETERMINISTIC='True'" \
    catalyst-segmentation ./bin/catalyst-semantic-segmentation-pipeline.sh \
        --workdir /logdir \
        --datadir /data \
        --max-image-size 256 \
        --config-template ./configs/templates/semantic.yml \
        --num-workers 4 \
        --batch-size 256

Instance segmentation pipeline

Run in local environment:

CUDA_VISIBLE_DEVICES=0 \
CUDNN_BENCHMARK="True" \
CUDNN_DETERMINISTIC="True" \
bash ./bin/catalyst-instance-segmentation-pipeline.sh \
    --workdir ./logs \
    --datadir ./data/origin \
    --max-image-size 256 \
    --config-template ./configs/templates/instance.yml \
    --num-workers 4 \
    --batch-size 256

Run in docker:

docker run -it --rm --shm-size 8G --runtime=nvidia \
    -v $(pwd):/workspace/ \
    -v $(pwd)/logs:/logdir/ \
    -v $(pwd)/data/origin:/data \
    -e "CUDA_VISIBLE_DEVICES=0" \
    -e "CUDNN_BENCHMARK='True'" \
    -e "CUDNN_DETERMINISTIC='True'" \
    catalyst-segmentation ./bin/catalyst-instance-segmentation-pipeline.sh \
        --workdir /logdir \
        --datadir /data \
        --max-image-size 256 \
        --config-template ./configs/templates/instance.yml \
        --num-workers 4 \
        --batch-size 256

Once the pipeline is running you don't have to do anything else; all that remains is to wait for the best model!
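While you wait, you can check that the run is making progress, for example by watching new experiment directories and checkpoints appear in the workdir:

ls -lt ./logs | head                         # the newest experiment directory comes first
ls ./logs/*/checkpoints 2>/dev/null | head   # checkpoints written so far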

Visualizations

TensorBoard can be used for visualization:

tensorboard --logdir=/catalyst.segmentation/logs
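If TensorBoard is not installed on the host, one option (an assumption, not part of this repo's tooling) is to serve it from a container image that ships TensorBoard, mapping the logs directory and the default port 6006:

docker run -it --rm -p 6006:6006 \
    -v $(pwd)/logs:/logdir \
    tensorflow/tensorflow \
    tensorboard --logdir=/logdir --host=0.0.0.0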

4. Results

All results of all experiments can be found locally in WORKDIR, by default catalyst.segmentation/logs. The results of an experiment, for instance catalyst.segmentation/logs/logdir-191107-094627-2f31d790, contain:

checkpoints

  • The directory contains all checkpoints: best, last, and also checkpoints from all stages.
  • best.pth and last.pth can also be found in the corresponding experiment in your W&B account.

configs

  • The directory contains the experiment's configs for reproducibility.

logs

  • The directory contains all logs of the experiment.
  • Metrics and logs can also be viewed in the corresponding experiment in your W&B account.

code

  • The directory contains the code the experiment was run with. This is necessary for complete reproducibility.

5. Customize your own pipeline

For your future experiments, the framework provides powerful configs that allow you to optimize the configuration of the whole segmentation pipeline in a controlled and reproducible way.

Configure your experiments

  • Common settings for the training stages and model parameters can be found in catalyst.segmentation/configs/_common.yml.

    • model_params: detailed configuration of models, including:
      • the model, for instance ResNetUnet
      • a detailed architecture description
      • whether to use a pretrained model
    • stages: you can configure training or inference in several stages with different hyperparameters. In our example:
      • optimizer params
      • first learn the head(s), then train the whole network
  • The CONFIG_TEMPLATE with the other experiment hyperparameters, such as data_params, is here: catalyst.segmentation/configs/templates/binary.yml. The config allows you to define:

    • data_params: path, batch size, number of workers, and so on
    • callbacks_params: callbacks are used to execute code during training, for example to compute metrics or save checkpoints. Catalyst provides a wide variety of helpful callbacks, and you can also use custom ones. A sketch of the customization workflow follows below.
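A typical customization workflow (a sketch; my_binary.yml is a hypothetical file name) is to copy a shipped template, edit the hyperparameters you care about, and point the pipeline at your copy:

cp ./configs/templates/binary.yml ./configs/templates/my_binary.yml
# edit data_params / callbacks_params in my_binary.yml, then:
CUDA_VISIBLE_DEVICES=0 \
bash ./bin/catalyst-binary-segmentation-pipeline.sh \
    --workdir ./logs \
    --datadir ./data/origin \
    --config-template ./configs/templates/my_binary.yml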

You can find many more options for configuring experiments in the Catalyst documentation.