What do a Tree and the Human Brain have in Common - A not so Serious Introduction to Digital Pathology
This repository contains minimal example code and some further reading for the above-mentioned talk from PyCon DE & PyData 2025.
The notebook conventional_cv.ipynb shows the power of the most fundamental conventional CV method: basic thresholding.
To run the notebook, first install all requirements via pip install -r requirements.txt.
Note: the requirements were generated with Python 3.13; other Python and package versions may also work.
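As a rough illustration of what basic global thresholding does, here is a minimal sketch in plain Python. It is not the notebook's code. It assumes an 8-bit grayscale image stored as nested lists and assumes that stained tissue is darker than the bright slide background; the names threshold_mask and foreground_fraction and the cutoff of 128 are hypothetical choices for this example.

```python
def threshold_mask(gray, threshold=128):
    """Return a binary mask: 1 where a pixel is darker than the threshold.

    Assumption: foreground (stained tissue) is darker than the background.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def foreground_fraction(mask):
    """Return the share of pixels marked as foreground (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total if total else 0.0


# Toy 4x4 "image": a dark blob in the middle of a bright background.
toy_image = [
    [250, 245, 240, 252],
    [248,  90,  70, 247],
    [251,  60,  85, 249],
    [253, 246, 244, 250],
]

mask = threshold_mask(toy_image, threshold=128)
for row in mask:
    print(row)
print("foreground fraction:", foreground_fraction(mask))
```

On a real slide the cutoff usually has to be tuned per stain and scanner, which is why a fixed value like 128 should be read as a placeholder only.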
This section introduces some frameworks, tools and datasets which could be the beginning of your medical CV journey.
To handle whole-slide images correctly, you can use OpenSlide.
For machine learning, two frameworks are especially important:
- MONAI, the de facto standard for medical image segmentation tasks (with its own tutorial repository), and
- AUCMEDI, the smaller spiritual sibling with a focus on classification.
Metrics Reloaded from the German Cancer Research Center (DKFZ) provides a beautiful overview of machine learning metrics and when and how to use them. It is a good first stop if you are not sure how to measure success.
For segmentation tasks in medical imaging, always consider deepflash2 and nnU-Net v2 your first options.
While nnU-Net v2 is the more popular option with better support, deepflash2 has proven superior on consumer hardware and with less training data in first tests. It was able to beat nnU-Net v2 and a manual implementation with MONAI on a tumor segmentation task in our internal testing (nnU-Net v2 vs deepflash2 Paper).
For classification, AUCMEDI provides a simple AutoML option.
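Comparing segmentation models requires an overlap metric. The source does not state which metric was used in the internal comparison, so the following is only an assumption for illustration: a minimal sketch of the Dice coefficient on binary masks stored as nested lists. The function name dice_score and the convention of returning 1.0 for two empty masks are choices made for this example.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A∩B| / (|A| + |B|) for two binary masks."""
    pred_flat = [p for row in pred for p in row]
    target_flat = [t for row in target for t in row]
    if len(pred_flat) != len(target_flat):
        raise ValueError("masks must have the same number of pixels")

    intersection = sum(1 for p, t in zip(pred_flat, target_flat) if p and t)
    total = sum(1 for p in pred_flat if p) + sum(1 for t in target_flat if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


prediction = [
    [0, 1, 1],
    [0, 1, 0],
]
ground_truth = [
    [0, 1, 0],
    [0, 1, 1],
]
print(dice_score(prediction, ground_truth))
```

Whether Dice is the right choice depends on the task; small or thin structures in particular can make it misleading.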
You don't have to work at a hospital to have access to medical data. By now, large (sometimes fully labeled) datasets are publicly available. Unless you want to focus your computer vision work on rare diseases, there is plenty to choose from.
In the pathology domain, the most popular datasets are most likely:
- the PANDA challenge, allowing for both segmentation and classification, as well as
- the CAMELYON challenge, focusing on classification.
A new, challenging classification dataset focusing on the detection of atypical cell constructs was recently released: AMi-Br.
Besides those, large amounts of data are available on both Zenodo and Kaggle.
The images in the images directory are licensed for educational purposes. Please do not reuse them outside of this context.
If you want to try the thresholding on the same pathology image as shown in the notebook and during PyCon DE, you can download it here: https://glioblastoma.alleninstitute.org/ish/specimen/show/308886351.