Here we describe the pipeline for segmenting vessel cells, including the data preparation steps and the training and testing of the model.
First, we need to build a dataset from the annotated parts of the whole-slide images. Each provided image comes with a QuPath project containing vessel annotations within a pie-shaped region. We can export tiles (patches) and their labels (masks) using QuPath's TileExporter.
To export these tiles and their masks, see the script and documentation here. The exported images and labels are saved as .tif files under the tiles directory.
Because the annotated regions are pie-shaped, tiles at the region edges may be only partially annotated: they can contain vessels that were never labeled. Such tiles make it hard for the model to distinguish vessels from non-vessels, so we filter them out before feeding the data into the model.
To do so, first export the polygon outlining each annotated pie-shaped region: in QuPath, select the vessel.region.small annotation and export it as a GeoJSON file.
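Depending on the export options, QuPath writes the annotation either as a single GeoJSON Feature or wrapped in a FeatureCollection. The following is a minimal sketch, assuming the annotation is a plain Polygon (not a MultiPolygon), of how the region outline could be read back with only the standard library. The function name load_region_polygon is hypothetical and not part of the project's notebooks.

```python
import json
from pathlib import Path


def load_region_polygon(geojson_path):
    """Return the exterior ring of the first Polygon in a GeoJSON file.

    Handles a bare geometry, a single Feature, or a FeatureCollection.
    Coordinates are returned as (x, y) tuples in the units of the export
    (full-resolution image pixels for QuPath).
    """
    data = json.loads(Path(geojson_path).read_text())

    # Normalise the possible top-level layouts into a list of geometries.
    if data.get("type") == "FeatureCollection":
        geometries = [feature.get("geometry") for feature in data["features"]]
    elif data.get("type") == "Feature":
        geometries = [data.get("geometry")]
    else:
        geometries = [data]

    for geom in geometries:
        if geom and geom.get("type") == "Polygon":
            # coordinates[0] is the exterior ring; any further rings are holes.
            return [tuple(point) for point in geom["coordinates"][0]]
    raise ValueError(f"no Polygon geometry found in {geojson_path}")
```

For a region annotation exported from QuPath, the returned list is closed (its first and last points coincide), as the GeoJSON specification requires.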
Note
Remember to activate the conda environment before running any commands, for example:
conda activate tree
After that, run the get_tiles_within_region.py marimo notebook to perform the filtering step:
marimo run get_tiles_within_region.py
In this notebook, specify the path to the exported tiles directory and the path to the exported GeoJSON file.
The filtered tiles will be saved in the tiles_within_region directory.
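The notebook's exact filtering rule is not described here. As an illustration only, one way to decide whether a tile lies inside the region is to test its four corners with an even-odd ray-casting point-in-polygon check. This sketch assumes that the tile filenames encode the tile position in full-resolution pixels in the QuPath style `x=...,y=...,w=...,h=...` (for example `slide [x=1024,y=2048,w=512,h=512].tif`). It also assumes the region is given as a list of (x, y) points, for instance read from the exported GeoJSON. The names TILE_RE, point_in_polygon and tile_within_region are hypothetical.

```python
import re

# Assumed QuPath-style tile name: "... [x=1024,y=2048,w=512,h=512].tif",
# possibly with a downsample entry such as "d=4," before x=.
TILE_RE = re.compile(r"x=(\d+),y=(\d+),w=(\d+),h=(\d+)")


def point_in_polygon(x, y, ring):
    """Even-odd ray casting: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        # (this also skips horizontal edges, so no division by zero occurs).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # the ray towards +x crosses this edge
                inside = not inside
    return inside


def tile_within_region(filename, ring):
    """True if all four corners of the tile named `filename` lie in `ring`.

    A corner test is only sufficient for convex regions, such as a pie
    slice of at most 180 degrees; concave regions need a full polygon test.
    Files without parsable coordinates are rejected.
    """
    match = TILE_RE.search(filename)
    if match is None:
        return False
    x, y, w, h = map(int, match.groups())
    corners = [(x, y), (x + w, y), (x, y + h), (x + w, y + h)]
    return all(point_in_polygon(cx, cy, ring) for cx, cy in corners)
```

A tile is kept only if it is fully inside the region. This is what removes the partially annotated edge tiles described above.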
Next, we collect the filtered tiles from all species into a single dataset and split it into train and test sets.
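How make_dataset.py splits the data is not described here. Purely as an illustration, the sketch below shows a reproducible train/test split done separately for each species, so that every species with enough tiles appears in both sets. The function name split_by_species, the 20% test fraction and the seed are assumptions, not the notebook's settings.

```python
import random
from collections import defaultdict


def split_by_species(samples, test_fraction=0.2, seed=42):
    """Split (species, tile_name) pairs into train and test lists.

    Each species is shuffled and split on its own, so each one contributes
    roughly `test_fraction` of its tiles to the test set (species with very
    few tiles may end up entirely in train).
    """
    groups = defaultdict(list)
    for species, tile in samples:
        groups[species].append(tile)

    rng = random.Random(seed)  # local RNG; the global random state is untouched
    train, test = [], []
    for species in sorted(groups):       # fixed order keeps the result reproducible
        tiles = sorted(groups[species])
        rng.shuffle(tiles)
        n_test = round(len(tiles) * test_fraction)
        test += [(species, t) for t in tiles[:n_test]]
        train += [(species, t) for t in tiles[n_test:]]
    return train, test
```

Splitting per species, rather than over the pooled tiles, avoids a test set that by chance contains few or no tiles from one species.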
To build the dataset, run:
marimo run make_dataset.py
The model we designed for this segmentation task has two main parts: an encoder and a decoder. For the encoder, we use a Hiera transformer; specifically, the SAM2 image encoder (small version) loaded with its pretrained weights. The encoder is frozen during training.
The decoder consists of several convolutional layers arranged in a U-Net-style architecture.
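To make the encoder/decoder split more concrete, here is a sketch that only does shape bookkeeping for a U-Net-style decoder. It assumes the decoder consumes the four multi-scale feature maps of the Hiera trunk, at 1/4, 1/8, 1/16 and 1/32 of the input size. It then walks from the coarsest map back to full resolution, upsampling by 2 and fusing the matching skip connection at each step. Whether the actual decoder uses all four levels, and how it fuses them, is not stated here, so the strides and steps are assumptions, and decoder_plan is a hypothetical name. Freezing the encoder, in PyTorch terms, typically means setting requires_grad=False on its parameters so that only the decoder is updated.

```python
def decoder_plan(image_size, strides=(4, 8, 16, 32)):
    """List the steps and feature-map sizes of a U-Net-style decoder.

    `strides` are the (assumed) downsampling factors of the encoder outputs,
    finest first. Returns (step, spatial_size) tuples for a square input.
    """
    if any(image_size % s for s in strides):
        raise ValueError("image_size must be divisible by every stride")
    sizes = [image_size // s for s in strides]  # encoder feature sizes, fine -> coarse

    plan = []
    current = sizes[-1]  # start from the coarsest feature map
    for skip in reversed(sizes[:-1]):
        current *= 2  # upsample x2 ...
        if current != skip:
            raise ValueError("strides must double from one level to the next")
        plan.append(("upsample x2 + concat skip", current))  # ... then fuse the skip
    # A final upsampling restores the input resolution for the per-pixel mask.
    plan.append((f"upsample x{strides[0]} to mask", current * strides[0]))
    return plan
```

For example, decoder_plan(1024) yields skip fusions at 64, 128 and 256 pixels and a final x4 upsampling back to a 1024 x 1024 mask.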
Note
Download the pretrained encoder weights from here and save them in the same directory as train.py.
You can train the model using the following command:
python train.py --dataset=./data/vessel_dataset/train
Provide the path to the training dataset directory you created in the previous step (e.g. ./data/vessel_dataset/train).
You can also change the batch size (--bsize), which is 16 by default. The value you can use depends on how much GPU memory is available.
python train.py --dataset=./data/vessel_dataset/train --bsize=32
During training, you can monitor the loss values. All training logs and model checkpoints are saved in the checkpoints folder.
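After several runs, the checkpoints folder can hold many experiment folders. Here is a small sketch for locating the most recently written best checkpoint. It assumes one sub-folder per experiment, each containing a model_best.pth, matching the path layout used in the predict.py command. The helper name latest_best_checkpoint is hypothetical.

```python
from pathlib import Path


def latest_best_checkpoint(checkpoints_dir="./checkpoints"):
    """Return the newest <experiment>/model_best.pth under checkpoints_dir.

    Recency is judged by file modification time. Returns None when no
    matching checkpoint exists (including when the directory is missing).
    """
    candidates = list(Path(checkpoints_dir).glob("*/model_best.pth"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

The returned path can be passed to predict.py or inference.py as the --model_path value.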
Once the model is trained, you can use it to predict segmentation masks for the test dataset and compute the test score by running:
python predict.py --dataset=./data/vessel_dataset/test --model_path=./checkpoints/<your_experiment_folder>/model_best.pth
Here, provide the path to your test dataset directory and the path to the best checkpoint of your experiment.
This generates a predicted mask for each tile in the test dataset. The predictions are saved in a predictions directory inside the dataset directory (e.g. vessel_dataset/test/predictions).
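The metric that predict.py reports as the test score is not specified here. Dice and IoU (intersection over union) are common choices for binary segmentation. As an illustration, the sketch below computes both for one predicted mask and its label with NumPy, treating any nonzero pixel as vessel. The function name dice_and_iou and the convention that two empty masks score 1.0 are assumptions.

```python
import numpy as np


def dice_and_iou(pred, target, eps=1e-7):
    """Dice coefficient and IoU between two binary masks of the same shape.

    Nonzero pixels count as foreground (so 0/255 and 0/1 masks both work).
    The eps term avoids 0/0 and makes two empty masks score 1.0.
    """
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    if pred.shape != target.shape:
        raise ValueError(f"shape mismatch: {pred.shape} vs {target.shape}")

    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()

    dice = (2.0 * intersection + eps) / (total + eps)
    iou = (intersection + eps) / (union + eps)
    return float(dice), float(iou)
```

Averaging these per-tile scores over the test set gives one summary number, although a pooled score over all pixels is an equally common choice.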
To predict segmentation masks for images outside the train/test datasets, run the inference script:
python inference.py --dataset="./data/species/dataset/level_1" --model_path="./checkpoints/<your_experiment_folder>/model_best.pth"
The dataset folder can be the same one created by the /featureforest/species_dataset.py notebook here. After inference, you can merge all predicted masks using the featureforest/make_zarr_dataset.py notebook. See here for more details about making the zarr dataset.
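The merging logic of make_zarr_dataset.py is not described here. The core idea of merging tile predictions can be sketched as pasting each tile mask into a full-size canvas at its offset. This sketch assumes that each tile's top-left corner (x, y) is known in the canvas's pixel grid (for example parsed from QuPath-style tile names and divided by the export downsample) and that all masks share one resolution. Overlapping pixels keep the maximum value. The name stitch_tiles is hypothetical, and writing the result to zarr is left to the notebook.

```python
import numpy as np


def stitch_tiles(tiles, height, width, dtype=np.uint8):
    """Paste (x, y, mask) tiles into one (height, width) mask.

    (x, y) is the tile's top-left corner in the output array. Tiles that run
    past the edge are clipped, tiles entirely outside are skipped, and
    overlapping pixels keep the maximum value.
    """
    canvas = np.zeros((height, width), dtype=dtype)
    for x, y, mask in tiles:
        mask = np.asarray(mask, dtype=dtype)
        h = min(mask.shape[0], height - y)
        w = min(mask.shape[1], width - x)
        if h <= 0 or w <= 0:
            continue  # tile lies completely outside the canvas
        region = canvas[y:y + h, x:x + w]  # a view, so writes go into canvas
        np.maximum(region, mask[:h, :w], out=region)
    return canvas
```

Taking the maximum in overlaps means a vessel predicted in any overlapping tile survives the merge. Averaging probabilities before thresholding would be an alternative if the model outputs soft masks.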




