A deep learning pipeline that classifies buildings in high-resolution aerial imagery into 7 classes, using a two-stage workflow:
- RefineNet-based segmentation + post-processing to extract building instances
- DenseNet201 to classify each building crop
This repository accompanies our manuscript on automated building classification at U.S. nationwide scale.
Building classification from aerial imagery supports urban planning, infrastructure assessment, environmental monitoring, and disaster response. We present a two-stage deep learning pipeline that first isolates building footprints using RefineNet segmentation with robust post-processing, then classifies the extracted building crops with a DenseNet201 classifier.
We curate a nationwide dataset of 11,921 building samples from Google Earth imagery spanning 50 U.S. states, covering diverse architectural styles and geographic regions. The proposed approach achieves 84.40% test accuracy across 7 building classes.
| Class | Description | Example Characteristics |
|---|---|---|
| Commercial | Retail, offices, shopping centers | Large footprints, parking lots |
| High-rise | Multi-story towers (>10 floors) | Vertical structures, smaller footprint |
| Hospital | Healthcare facilities | Complex layouts (e.g., H-shapes), helipads |
| Industrial | Factories, warehouses | Large flat roofs, loading bays |
| Multi-unit Residential | Apartments, condos | Clustered units, regular patterns |
| School | Educational institutions | Athletic fields, bus loops |
| Single-unit Residential | Detached homes | Individual lots, varied rooflines |
*Example crops for each class (Commercial, High-rise, Hospital, Industrial, Multi-family, Schools, Single-family), 512×512 px at ~0.15 m/pixel.*
Important licensing note (Google Earth imagery):
Google Earth imagery is subject to third-party terms and may restrict redistribution of raw tiles. This repository is designed to support reproducibility by providing scripts and metadata to reconstruct imagery under the user’s own compliant access. Please follow your institution’s and provider’s licensing requirements.
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌────────────────┐
│  Google Earth   │────▶│    RefineNet     │────▶│   DenseNet201   │────▶│    Building    │
│   Satellite     │     │   Segmentation   │     │   Classifier    │     │     Class      │
│ 512×512 @ 0.15m │     │   + Watershed    │     │   (7 classes)   │     │   Prediction   │
└─────────────────┘     └──────────────────┘     └─────────────────┘     └────────────────┘
- Source: Google Earth imagery acquired via segment-geospatial (samgeo)
- Resolution: 512×512 px at approximately 0.15 m/pixel
- Coverage: 50 U.S. states (diverse architectural and geographic variation)
To avoid geographic leakage, we create splits using grouped sampling so that no city/tile group appears across train/val/test simultaneously. This supports a more honest estimate of generalization to unseen locations.
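The grouped split can be sketched with scikit-learn's `GroupShuffleSplit`. The group IDs below are hypothetical stand-ins; the repository's actual grouping keys live in the notebooks:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical example: 10 image paths drawn from 5 city/tile groups
paths = np.array([f"img_{i}.tif" for i in range(10)])
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# Hold out 20% of *groups* (not individual images) for evaluation
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, holdout_idx = next(gss.split(paths, groups=groups))

# No group ever appears on both sides of the split
assert set(groups[train_idx]).isdisjoint(set(groups[holdout_idx]))
```

Splitting at the group level is what prevents near-duplicate tiles of the same city from leaking between train and test.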
Our segmentation module extracts individual building footprints from satellite imagery using a multi-stage approach:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ INPUT │ │ TTA │ │ REFINENET │ │ MORPH OPS │ │ WATERSHED │
│ 512×512 │───▶│ H/V Flip │───▶│ Building │───▶│ Opening │───▶│ Algorithm │
│ Satellite │ │ 4 versions │ │ Masks │ │ Clean up │ │ Separate │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └──────┬──────┘
│
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ BUILDING │◀───│ SIZE │◀───│ LABELED │◀─────────┘
│ CROPS │ │ FILTER │ │ REGIONS │
│ for CNN │ │ 500-100K px │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
| Step | Operation | Description | Paper Reference |
|---|---|---|---|
| 1️⃣ | Preprocessing | Resize to 512×512, normalize pixels to [0,1] | §3.2 |
| 2️⃣ | TTA | Generate H-flip, V-flip, HV-flip versions | §3.2 |
| 3️⃣ | RefineNet | Pretrained semantic segmentation network | Lin et al., 2017 |
| 4️⃣ | Averaging | Average TTA predictions for robust masks | §3.2 |
| 5️⃣ | Morphological Opening | Remove small artifacts and noise | §3.2 |
| 6️⃣ | Watershed | Separate connected/overlapping buildings | Meyer, 1994 |
| 7️⃣ | Size Filtering | Keep segments with 500-100,000 pixels | §3.2 |
Pipeline Steps:
- Original - High-resolution satellite imagery (512×512 px)
- Binary Mask - Building footprints extracted by RefineNet
- Watershed Labels - Individual buildings separated with color-coded regions
- Detected Buildings - Final output with green bounding boxes
"Post-processing further refined these masks by applying morphological opening to eliminate small artifacts and reduce noise, followed by the watershed algorithm, chosen for its efficacy in segmenting connected or overlapping building structures." — Paper §3.2
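Steps 5-7 can be sketched with SciPy and scikit-image on a synthetic mask standing in for a RefineNet output. The 0.6 marker threshold is an assumption for illustration, not the repository's tuned value:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import opening, disk
from skimage.segmentation import watershed
from skimage.measure import regionprops

# Synthetic stand-in for a RefineNet mask: two touching "buildings" plus noise
mask = np.zeros((128, 128), dtype=bool)
mask[20:60, 20:60] = True     # building 1
mask[50:100, 50:100] = True   # building 2, touching building 1
mask[5, 5] = True             # one-pixel artifact

# Step 5: morphological opening removes the small artifact
clean = opening(mask, disk(3))

# Step 6: watershed on the distance transform separates touching buildings
distance = ndi.distance_transform_edt(clean)
markers, _ = ndi.label(distance > 0.6 * distance.max())  # threshold assumed
labels = watershed(-distance, markers, mask=clean)

# Step 7: keep segments of plausible building size (500-100,000 px)
buildings = [r for r in regionprops(labels) if 500 <= r.area <= 100_000]
print(len(buildings))  # the two touching squares come out as separate regions
```

Running watershed on the negated distance transform is the standard trick: basins form around each building's interior, so the ridge between two touching footprints becomes the boundary.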
Classification Model (§3.2)
- Backbone: DenseNet201 (ImageNet pretrained)
- Head: GAP → Dense(256, ReLU, L2=0.001) → Dropout(0.5) → Softmax(7)
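A sketch of this architecture in Keras. The 224×224 input size and the `weights` parameter are assumptions for illustration; the training notebooks define the authoritative version:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_classifier(num_classes=7, input_shape=(224, 224, 3), weights="imagenet"):
    """DenseNet201 backbone with the head above: GAP -> Dense(256) -> Dropout -> Softmax."""
    base = tf.keras.applications.DenseNet201(
        include_top=False, weights=weights, input_shape=input_shape)
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
```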
Hyperparameters from Table 4 in the paper:
| Parameter | Value | Notes |
|---|---|---|
| Optimizer | Adam | β₁=0.9, β₂=0.999 |
| Learning Rate | 1e-4 | Reduced on plateau |
| Batch Size | 32 | Balanced memory/speed |
| Max Epochs | 20 | Early stopping applied |
| Dropout Rate | 0.5 | FC layer regularization |
| L2 Regularization | 0.001 | Dense layer |
| Early Stopping | patience=3 | Restore best weights |
| LR Scheduler | ReduceLROnPlateau | factor=0.2, patience=2 |
Data Augmentation:
- Horizontal/Vertical flips
- Rotation: ±15°
- Zoom: 90-110%
- Brightness adjustment
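With Keras's `ImageDataGenerator`, the augmentations listed above look roughly like this; the exact zoom and brightness argument values are assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=15,            # +/- 15 degrees
    zoom_range=0.1,               # 90-110%
    brightness_range=(0.8, 1.2),  # brightness adjustment (range assumed)
)
```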
Data Split: 80% train / 10% validation / 10% test
| Metric | Score |
|---|---|
| Test Accuracy | 84.40% |
| Validation Accuracy | 84.39% |
| Training Accuracy | >95% |
| Macro F1-Score | 0.84 |
| Weighted F1-Score | 0.84 |
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Commercial | 0.80 | 0.60 | 0.69 | 20 |
| High-rise | 0.95 | 0.90 | 0.92 | 20 |
| Hospital | 0.84 | 0.80 | 0.82 | 20 |
| Industrial | 0.83 | 0.95 | 0.89 | 21 |
| Multi-family | 0.77 | 0.85 | 0.81 | 20 |
| Schools | 0.77 | 0.85 | 0.81 | 20 |
| Single-family | 0.95 | 0.95 | 0.95 | 20 |
| Overall | 0.85 | 0.84 | 0.84 | 141 |
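As a sanity check, the macro F1 in the summary table is just the unweighted mean of the per-class F1 scores above:

```python
# Per-class F1 scores from the table above
f1_per_class = {
    "Commercial": 0.69, "High-rise": 0.92, "Hospital": 0.82,
    "Industrial": 0.89, "Multi-family": 0.81, "Schools": 0.81,
    "Single-family": 0.95,
}
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
print(round(macro_f1, 2))  # 0.84, matching the reported macro F1
```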
✅ Best Performance: Single-family (F1=0.95) and High-rise (F1=0.92); their distinct architectural features make classification easier.
⚠️ Most Challenging: Commercial (recall=0.60), often confused with Multi-family due to similar footprint patterns.
Here's an example of how the model classifies a building:
| Input Image | Prediction |
|---|---|
| ![]() | Predicted Class: High-rise<br>Confidence: 95%<br>Ground Truth: High-rise ✅ |
Probability Distribution:
Commercial ████░░░░░░░░░░░░░░░░ 2.1%
High-rise ████████████████████ 95.0% ◄ Predicted
Hospital ░░░░░░░░░░░░░░░░░░░░ 0.5%
Industrial ░░░░░░░░░░░░░░░░░░░░ 0.3%
Multi-family █░░░░░░░░░░░░░░░░░░░ 1.2%
Schools ░░░░░░░░░░░░░░░░░░░░ 0.4%
Single ░░░░░░░░░░░░░░░░░░░░ 0.5%
building-classification/
├── 📄 README.md # This file
├── 📄 LICENSE # MIT License
├── 📄 CITATION.cff # Citation metadata
├── 📄 requirements.txt # Python dependencies
│
├── 📁 notebooks/
│ ├── 01_data_collection.ipynb # Satellite image acquisition
│ ├── 02_preprocessing_segmentation.ipynb # RefineNet + watershed
│ ├── 03_model_training.ipynb # DenseNet201 training
│ └── 04_evaluation_inference.ipynb # Metrics & predictions
│
├── 📁 data/
│ └── processed/ # Train/Val/Test splits
│ ├── train/ # 80% of data
│ ├── val/ # 10% of data
│ └── test/ # 10% of data (141 images)
│
├── 📁 models/ # Trained weights
│ └── README.md # Download instructions
│
├── 📁 results/ # Figures & metrics
│ └── figures/
│
└── 📁 paper/ # Research paper
```bash
# Clone repository
git clone https://github.com/madhugoutham/building-classification.git
cd building-classification

# Create environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

# Load model
model = load_model('models/densenet201_best.h5')

# Building classes
CLASSES = ['Commercial', 'High', 'Hospital', 'Industrial',
           'Multi', 'Schools', 'Single']

# Predict
img = image.load_img('building.tif', target_size=(224, 224))
x = image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)
pred = model.predict(x)
print(f"Predicted: {CLASSES[np.argmax(pred)]} ({np.max(pred)*100:.1f}%)")
```

| Study | Year | Classes | Accuracy | Region |
|---|---|---|---|---|
| Helber et al. (EuroSAT) | 2019 | 10 land-use | 98.57% | Europe |
| Atwal et al. (OSM) | 2022 | 2 | 98% | US (3 counties) |
| Dimassi et al. (BBTC) | 2021 | 2 | 94.8% | Lebanon |
| Erdem & Avdan (Inria) | 2020 | Binary | 87.69% | US (Chicago) |
| This Work | 2025 | 7 | 84.40% | US (nationwide) |
- Python 3.8+
- TensorFlow 2.x
- CUDA GPU (recommended for training)
- 8GB+ RAM
See requirements.txt for full dependencies.
Pre-trained model weights (hosted externally due to size):
| Model | Size | Description |
|---|---|---|
| `densenet201_best.h5` | ~80MB | Best validation accuracy |
See models/README.md for download instructions.
```bibtex
@article{building_classification_2025,
  title={Building Type Classification in Satellite Imagery Using DenseNet201},
  author={Ambati, Madhu Goutham and Shaikh, Abdul Rahman},
  year={2025},
  publisher={Zenodo},
  doi={10.5281/zenodo.18512944}
}
```

See CITATION.cff for machine-readable citation.
This project is licensed under the MIT License - see LICENSE for details.
- Google Earth for satellite imagery
- segment-geospatial (samgeo) for image acquisition
- TensorFlow/Keras for DenseNet201 implementation
- RefineNet for building segmentation
For questions, issues, or collaboration inquiries, please open an issue.