What do a Tree and the Human Brain have in Common - A not so Serious Introduction to Digital Pathology

This repository contains the minimalistic example code and some further literature for above mentioned talk from PyCon DE & PyData 2025.

Example Notebook

The notebook conventional_cv.ipynb shows the power of the most fundamental conventional CV method: basic thresholding.
To run the notebook first install all requirements via pip install -r requirements.txt.
Note: the requirements were generated with Python 3.13 other Python and package version may also work well.

Tools, Frameworks, and Datasets

This section introduces some frameworks, tools and datasets which could be the beginning of your medical CV journey.

Handling these large Images (correctly)

To handle the Whole-Slide Images correctly you can use OpenSlide.

Machine Learning

For Machine Learning two frameworks are especially important:

MONAI the defacto standard for medical image segmentation tasks (with it's own tutorial repository)
and
AUCMEDI, the smaller spiritual sibling with a focus on classification.

Evaluation

Metrics Reloaded from the German Cancer Research Center (DKFZ) provides a beautiful overview of machine learning metrics and when/how to use them. This could be your first address, if you are not sure on how to measure success.

AutoML

For segmentation tasks in medical imaging always consider deepfash2 and nnU-Net v2 your first options.
While nnU-Net v2 is the more popular option with a better support, deepflash2 has proven itself superior on consumer hardware and less data in first tests. It was able to beat nnU-Net v2 and a manual implementation with MONAI on a tumor segmentation task in our internal testing (nnU-Net v2 vs deepflash2 Paper).

For classification AUCMEDI provides a simple AutoML option.

Datasets

You don't have to work at a hospital to have access to medical data. By now large (sometimes fully labeled) datasets are available publicly. If you don't want to focus your computer vision work to rare diseases the choices are large.
In the pathological domain, the most popular datasets are most likely:

the PANDA challenge, allowing for both segmentation and classification,
as well as
the CAMELYON challenge, focusing on classification.

A new challenging classification dataset was recently released, focusing on the detection of atypical cell constructs: AMi-Br.

Besides those large amounts of data are available both on zenodo as well as kaggle.

Images

The images in the ìmages dir are licensed for educational purposes. Please do not reuse them outside of this context.

If you want to try the thresholding on the same pathology image as shown in the notebook and during PyCon De you can download it here: https://glioblastoma.alleninstitute.org/ish/specimen/show/308886351.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
images		images
utils		utils
.gitignore		.gitignore
README.md		README.md
conventional_cv.ipynb		conventional_cv.ipynb
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

What do a Tree and the Human Brain have in Common - A not so Serious Introduction to Digital Pathology

Example Notebook

Tools, Frameworks, and Datasets

Handling these large Images (correctly)

Machine Learning

Evaluation

AutoML

Datasets

Images

About

Uh oh!

Releases

Packages

Uh oh!

Languages

DanielHieber/PyCon25-Python-for-Patho

Folders and files

Latest commit

History

Repository files navigation

What do a Tree and the Human Brain have in Common - A not so Serious Introduction to Digital Pathology

Example Notebook

Tools, Frameworks, and Datasets

Handling these large Images (correctly)

Machine Learning

Evaluation

AutoML

Datasets

Images

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages