A deep learning pipeline that classifies buildings in high-resolution aerial imagery into 7 classes, using a two-stage workflow:
- RefineNet-based segmentation + post-processing to extract building instances
- DenseNet201 to classify each building crop
This repository accompanies our manuscript on automated building classification at U.S. nationwide scale.
Building classification from aerial imagery supports urban planning, infrastructure assessment, environmental monitoring, and disaster response. We present a two-stage deep learning pipeline that first isolates building footprints using RefineNet segmentation with robust post-processing, then classifies the extracted building crops with a DenseNet201 classifier.
We curate a nationwide dataset of 11,921 building samples from Google Earth imagery spanning 50 U.S. states, covering diverse architectural styles and geographic regions. The proposed approach achieves 84.40% test accuracy across 7 building classes.
| Class | Description | Example Characteristics |
|---|---|---|
| Commercial | Retail, offices, shopping centers | Large footprints, parking lots |
| High-rise | Multi-story towers (>10 floors) | Vertical structures, smaller footprint |
| Hospital | Healthcare facilities | Complex layouts (e.g., H-shapes), helipads |
| Industrial | Factories, warehouses | Large flat roofs, loading bays |
| Multi-unit Residential | Apartments, condos | Clustered units, regular patterns |
| School | Educational institutions | Athletic fields, bus loops |
| Single-unit Residential | Detached homes | Individual lots, varied rooflines |
*Example crops for each class (Commercial, High-rise, Hospital, Industrial, Multi-family, Schools, Single-family), 512×512 px at ~0.15 m/pixel.*
Important licensing note (Google Earth imagery):
Google Earth imagery is subject to third-party terms and may restrict redistribution of raw tiles. This repository is designed to support reproducibility by providing scripts and metadata to reconstruct imagery under the user’s own compliant access. Please follow your institution’s and provider’s licensing requirements.
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐     ┌────────────────┐
│  Google Earth   │────▶│    RefineNet     │────▶│   DenseNet201   │────▶│    Building    │
│   Satellite     │     │   Segmentation   │     │   Classifier    │     │     Class      │
│ 512×512 @ 0.15m │     │   + Watershed    │     │   (7 classes)   │     │   Prediction   │
└─────────────────┘     └──────────────────┘     └─────────────────┘     └────────────────┘
- Source: Google Earth imagery acquired via segment-geospatial (samgeo)
- Resolution: 512×512 px at approximately 0.15 m/pixel
- Coverage: 50 U.S. states (diverse architectural and geographic variation)
To avoid geographic leakage, we create splits using grouped sampling so that no city/tile group appears across train/val/test simultaneously. This supports a more honest estimate of generalization to unseen locations.
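The grouped split can be sketched with scikit-learn's `GroupShuffleSplit`. The group IDs below are hypothetical stand-ins; the repository's actual grouping keys live in the notebooks:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical example: 10 image paths drawn from 5 city/tile groups
paths = np.array([f"img_{i}.tif" for i in range(10)])
groups = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# Hold out 20% of *groups* (not individual images) for evaluation
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, holdout_idx = next(gss.split(paths, groups=groups))

# No group ever appears on both sides of the split
assert set(groups[train_idx]).isdisjoint(set(groups[holdout_idx]))
```

Splitting at the group level is what prevents near-duplicate tiles of the same city from leaking between train and test.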
Our segmentation module extracts individual building footprints from satellite imagery using a multi-stage approach:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ INPUT │ │ TTA │ │ REFINENET │ │ MORPH OPS │ │ WATERSHED │
│ 512×512 │───▶│ H/V Flip │───▶│ Building │───▶│ Opening │───▶│ Algorithm │
│ Satellite │ │ 4 versions │ │ Masks │ │ Clean up │ │ Separate │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └──────┬──────┘
│
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ BUILDING │◀───│ SIZE │◀───│ LABELED │◀─────────┘
│ CROPS │ │ FILTER │ │ REGIONS │
│ for CNN │ │ 500-100K px │ │ │
└─────────────┘ └─────────────┘ └─────────────┘
| Step | Operation | Description | Paper Reference |
|---|---|---|---|
| 1️⃣ | Preprocessing | Resize to 512×512, normalize pixels to [0,1] | §3.2 |
| 2️⃣ | TTA | Generate H-flip, V-flip, HV-flip versions | §3.2 |
| 3️⃣ | RefineNet | Pretrained semantic segmentation network | Lin et al., 2017 |
| 4️⃣ | Averaging | Average TTA predictions for robust masks | §3.2 |
| 5️⃣ | Morphological Opening | Remove small artifacts and noise | §3.2 |
| 6️⃣ | Watershed | Separate connected/overlapping buildings | Meyer, 1994 |
| 7️⃣ | Size Filtering | Keep segments with 500-100,000 pixels | §3.2 |
Pipeline Steps:
- Original - High-resolution satellite imagery (512×512 px)
- Binary Mask - Building footprints extracted by RefineNet
- Watershed Labels - Individual buildings separated with color-coded regions
- Detected Buildings - Final output with green bounding boxes
"Post-processing further refined these masks by applying morphological opening to eliminate small artifacts and reduce noise, followed by the watershed algorithm, chosen for its efficacy in segmenting connected or overlapping building structures." — Paper §3.2
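Steps 5-7 can be sketched with SciPy and scikit-image on a synthetic mask standing in for a RefineNet output. The 0.6 marker threshold is an assumption for illustration, not the repository's tuned value:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import opening, disk
from skimage.segmentation import watershed
from skimage.measure import regionprops

# Synthetic stand-in for a RefineNet mask: two touching "buildings" plus noise
mask = np.zeros((128, 128), dtype=bool)
mask[20:60, 20:60] = True     # building 1
mask[50:100, 50:100] = True   # building 2, touching building 1
mask[5, 5] = True             # one-pixel artifact

# Step 5: morphological opening removes the small artifact
clean = opening(mask, disk(3))

# Step 6: watershed on the distance transform separates touching buildings
distance = ndi.distance_transform_edt(clean)
markers, _ = ndi.label(distance > 0.6 * distance.max())  # threshold assumed
labels = watershed(-distance, markers, mask=clean)

# Step 7: keep segments of plausible building size (500-100,000 px)
buildings = [r for r in regionprops(labels) if 500 <= r.area <= 100_000]
print(len(buildings))  # the two touching squares come out as separate regions
```

Running watershed on the negated distance transform is the standard trick: basins form around each building's interior, so the ridge between two touching footprints becomes the boundary.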
Classification Model (§3.2)
- Backbone: DenseNet201 (ImageNet pretrained)
- Head: GAP → Dense(256, ReLU, L2=0.001) → Dropout(0.5) → Softmax(7)
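A sketch of this architecture in Keras. The 224×224 input size and the `weights` parameter are assumptions for illustration; the training notebooks define the authoritative version:

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_classifier(num_classes=7, input_shape=(224, 224, 3), weights="imagenet"):
    """DenseNet201 backbone with the head above: GAP -> Dense(256) -> Dropout -> Softmax."""
    base = tf.keras.applications.DenseNet201(
        include_top=False, weights=weights, input_shape=input_shape)
    return models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu",
                     kernel_regularizer=regularizers.l2(0.001)),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
```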
Hyperparameters from Table 4 in the paper:
| Parameter | Value | Notes |
|---|---|---|
| Optimizer | Adam | β₁=0.9, β₂=0.999 |
| Learning Rate | 1e-4 | Reduced on plateau |
| Batch Size | 32 | Balanced memory/speed |
| Max Epochs | 20 | Early stopping applied |
| Dropout Rate | 0.5 | FC layer regularization |
| L2 Regularization | 0.001 | Dense layer |
| Early Stopping | patience=3 | Restore best weights |
| LR Scheduler | ReduceLROnPlateau | factor=0.2, patience=2 |
Data Augmentation:
- Horizontal/Vertical flips
- Rotation: ±15°
- Zoom: 90-110%
- Brightness adjustment
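With Keras's `ImageDataGenerator`, the augmentations listed above look roughly like this; the exact zoom and brightness argument values are assumptions:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    horizontal_flip=True,
    vertical_flip=True,
    rotation_range=15,            # +/- 15 degrees
    zoom_range=0.1,               # 90-110%
    brightness_range=(0.8, 1.2),  # brightness adjustment (range assumed)
)
```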
Data Split: 80% train / 10% validation / 10% test
| Metric | Score |
|---|---|
| Test Accuracy | 84.40% |
| Validation Accuracy | 84.39% |
| Training Accuracy | >95% |
| Macro F1-Score | 0.84 |
| Weighted F1-Score | 0.84 |
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Commercial | 0.80 | 0.60 | 0.69 | 20 |
| High-rise | 0.95 | 0.90 | 0.92 | 20 |
| Hospital | 0.84 | 0.80 | 0.82 | 20 |
| Industrial | 0.83 | 0.95 | 0.89 | 21 |
| Multi-family | 0.77 | 0.85 | 0.81 | 20 |
| Schools | 0.77 | 0.85 | 0.81 | 20 |
| Single-family | 0.95 | 0.95 | 0.95 | 20 |
| Overall | 0.85 | 0.84 | 0.84 | 141 |
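As a sanity check, the macro F1 in the summary table is just the unweighted mean of the per-class F1 scores above:

```python
# Per-class F1 scores from the table above
f1_per_class = {
    "Commercial": 0.69, "High-rise": 0.92, "Hospital": 0.82,
    "Industrial": 0.89, "Multi-family": 0.81, "Schools": 0.81,
    "Single-family": 0.95,
}
macro_f1 = sum(f1_per_class.values()) / len(f1_per_class)
print(round(macro_f1, 2))  # 0.84, matching the reported macro F1
```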
✅ Best Performance: Single-family (F1=0.95) and High-rise (F1=0.92); their distinct architectural features make classification easier.
⚠️ Most Challenging: Commercial (recall=0.60), often confused with Multi-family due to similar footprint patterns.
Here's an example of how the model classifies a building:
| Input Image | Prediction |
|---|---|
| ![]() | Predicted Class: High-rise<br>Confidence: 95%<br>Ground Truth: High-rise ✅ |
Probability Distribution:
Commercial ████░░░░░░░░░░░░░░░░ 2.1%
High-rise ████████████████████ 95.0% ◄ Predicted
Hospital ░░░░░░░░░░░░░░░░░░░░ 0.5%
Industrial ░░░░░░░░░░░░░░░░░░░░ 0.3%
Multi-family █░░░░░░░░░░░░░░░░░░░ 1.2%
Schools ░░░░░░░░░░░░░░░░░░░░ 0.4%
Single ░░░░░░░░░░░░░░░░░░░░ 0.5%
building-classification/
├── 📄 README.md # This file
├── 📄 LICENSE # MIT License
├── 📄 CITATION.cff # Citation metadata
├── 📄 requirements.txt # Python dependencies
│
├── 📁 notebooks/
│ ├── 01_data_collection.ipynb # Satellite image acquisition
│ ├── 02_preprocessing_segmentation.ipynb # RefineNet + watershed
│ ├── 03_model_training.ipynb # DenseNet201 training
│ └── 04_evaluation_inference.ipynb # Metrics & predictions
│
├── 📁 data/
│ └── processed/ # Train/Val/Test splits
│ ├── train/ # 80% of data
│ ├── val/ # 10% of data
│ └── test/ # 10% of data (141 images)
│
├── 📁 models/ # Trained weights
│ └── README.md # Download instructions
│
├── 📁 results/ # Figures & metrics
│ └── figures/
│
└── 📁 paper/ # Research paper
```bash
# Clone repository
git clone https://github.com/madhugoutham/building-classification.git
cd building-classification

# Create environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt
```

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

# Load model
model = load_model('models/densenet201_best.h5')

# Building classes
CLASSES = ['Commercial', 'High', 'Hospital', 'Industrial',
           'Multi', 'Schools', 'Single']

# Predict
img = image.load_img('building.tif', target_size=(224, 224))
x = image.img_to_array(img) / 255.0
x = np.expand_dims(x, axis=0)
pred = model.predict(x)
print(f"Predicted: {CLASSES[np.argmax(pred)]} ({np.max(pred)*100:.1f}%)")
```

| Study | Year | Classes | Accuracy | Region |
|---|---|---|---|---|
| Helber et al. (EuroSAT) | 2019 | 10 land-use | 98.57% | Europe |
| Atwal et al. (OSM) | 2022 | 2 | 98% | US (3 counties) |
| Dimassi et al. (BBTC) | 2021 | 2 | 94.8% | Lebanon |
| Erdem & Avdan (Inria) | 2020 | Binary | 87.69% | US (Chicago) |
| This Work | 2025 | 7 | 84.40% | US (nationwide) |
- Python 3.8+
- TensorFlow 2.x
- CUDA GPU (recommended for training)
- 8GB+ RAM
See requirements.txt for full dependencies.
Pre-trained model weights (hosted externally due to size):
| Model | Size | Description |
|---|---|---|
| `densenet201_best.h5` | ~80MB | Best validation accuracy |
See models/README.md for download instructions.
```bibtex
@article{building_classification_2025,
  title={Building Type Classification in Satellite Imagery Using DenseNet201},
  author={Ambati, Madhu Goutham and Shaikh, Abdul Rahman},
  year={2025},
  publisher={Zenodo},
  doi={10.5281/zenodo.18512944}
}
```

See CITATION.cff for machine-readable citation.
This project is licensed under the MIT License - see LICENSE for details.
- Google Earth for satellite imagery
- segment-geospatial (samgeo) for image acquisition
- TensorFlow/Keras for DenseNet201 implementation
- RefineNet for building segmentation
For questions, issues, or collaboration inquiries, please open an issue.