Final project for Deep Learning School @ MIPT (ФМПИ, МФТИ)
Semantic segmentation of buildings in aerial images from the Inria Aerial Image Labeling Dataset.
This project tackles the task of segmenting building footprints from high-resolution aerial imagery.
We use a custom U-Net model with a ResNet-34 encoder trained from scratch (no pretrained weights).
🔍 Goal: generate accurate binary masks of buildings from aerial images.
You can test the model online:
🔗 Streamlit Demo — upload your own image and get building masks in real time.
| Input Image | Mask | Input + Mask |
|---|---|---|
| ![]() | ![]() | ![]() |
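
The demo source is not reproduced in this README; below is a minimal sketch of how a Streamlit app such as app.py can serve the model. The checkpoint path, the TorchScript format, and the 0.5 threshold are illustrative assumptions, not the project's actual choices.

```python
import numpy as np
import streamlit as st
import torch
from PIL import Image

@st.cache_resource
def load_model(checkpoint_path: str = "experiments/best_model.pt"):
    # Assumes a TorchScript export; the real project may store a state_dict instead.
    model = torch.jit.load(checkpoint_path, map_location="cpu")
    model.eval()
    return model

st.title("Building segmentation demo")
uploaded = st.file_uploader("Upload an aerial image", type=["png", "jpg", "jpeg", "tif"])

if uploaded is not None:
    image = Image.open(uploaded).convert("RGB")
    # Real preprocessing (tiling / padding to a size the encoder accepts) is omitted here.
    x = torch.from_numpy(np.array(image)).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        prob = torch.sigmoid(load_model()(x))[0, 0].numpy()
    mask = (prob > 0.5).astype(np.uint8) * 255  # binarize at an assumed 0.5 threshold
    st.image(image, caption="Input image")
    st.image(mask, caption="Predicted building mask")
```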
- Architecture: U-Net + ResNet-34 encoder
- Loss Function: Combined Dice Loss + Binary Cross-Entropy (BCE); see the sketch after this list
- Trained From Scratch: No pretrained weights used
- Environment: Google Colab, Kaggle
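
The exact loss implementation lives in src/models/ and is not shown in this README; the snippet below is a minimal PyTorch sketch of a combined Dice + BCE loss. The class name, the smoothing constant, and the 50/50 weighting are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DiceBCELoss(nn.Module):
    """Combined soft Dice + BCE loss for binary segmentation (illustrative sketch)."""

    def __init__(self, smooth: float = 1.0, bce_weight: float = 0.5):
        super().__init__()
        self.smooth = smooth          # avoids division by zero in the Dice term
        self.bce_weight = bce_weight  # relative weight of BCE vs. Dice (assumed 0.5)
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(logits)
        # Flatten each sample and compute soft Dice over all pixels
        probs_flat = probs.view(probs.size(0), -1)
        targets_flat = targets.view(targets.size(0), -1)
        intersection = (probs_flat * targets_flat).sum(dim=1)
        dice = (2.0 * intersection + self.smooth) / (
            probs_flat.sum(dim=1) + targets_flat.sum(dim=1) + self.smooth
        )
        dice_loss = 1.0 - dice.mean()
        return self.bce_weight * self.bce(logits, targets) + (1.0 - self.bce_weight) * dice_loss
```

If the model were built with segmentation_models_pytorch, `smp.Unet("resnet34", encoder_weights=None, classes=1)` would match the "trained from scratch" setting; the README does not state which library the custom U-Net actually uses.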
- app.py — Launching the Streamlit application
- main_train.ipynb — Training in Colab
- main_train_kaggle.ipynb — Training in Kaggle
- requirements.txt — Dependencies
- README.md — Project description
- configs/ — Project configurations
- experiments/ — Checkpoints, logs, predictions
- src/app/ — Streamlit interface and visualization
- src/data/ — Data loading and processing
- src/models/ — Model and loss function
- src/utils/ — Metrics, saving, plots (see the metric sketch after this list)
- src/train.py — Model training
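
The metrics in src/utils/ are not reproduced here; the functions below are a plausible minimal version of binary IoU and Dice scores. The function names and the epsilon value are assumptions for illustration.

```python
import numpy as np

def iou_score(pred_mask: np.ndarray, true_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection-over-Union for binary masks with values in {0, 1}."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    return float((intersection + eps) / (union + eps))

def dice_score(pred_mask: np.ndarray, true_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Dice coefficient for binary masks with values in {0, 1}."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    return float((2.0 * intersection + eps) / (pred.sum() + true.sum() + eps))
```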
- Dataset: Inria Aerial Image Labeling Dataset
- Course: Deep Learning School, MIPT (ФМПИ МФТИ)
- Author: Evgenii Ilnitski



