Physics-Aware Multimodal Urban Heat Mapping (AESPA)

Official implementation of the paper "Physics-Aware Multimodal Urban Heat Mapping with Open Web Imagery and Mobility Data", published in the Web4Good Track of The Web Conference 2026.

AESPA (Aligned Environmental Sensing with Physics-aware Attribution) is a framework for fine-grained urban Land Surface Temperature (LST) estimation. It fuses satellite imagery, street-view panoramas, and mobility data while enforcing physical consistency constraints.

📢 News

[2026.01] 🎉 Accepted to The Web Conference 2026 (Web4Good Track)!
[2025.12] 🚀 Code released.
[2025.12] 📝 Paper submitted to Web4Good 2026.

Introduction

Extreme urban heat is a growing crisis, but monitoring it at the neighborhood level is difficult because satellite observations are limited by long revisit cycles and coarse spatial resolution. AESPA addresses this through:

- Multimodal Fusion: Combining Satellite (macro-view), Street View (micro-view/facades), and Mobility (human activity) data.
- Physics-Aware Regularization: Deriving physical proxies (vegetation, albedo, shadow) from street views to enforce monotonic consistency (e.g., more vegetation should not increase predicted temperature).
- Teacher-Student Distillation: Training a "Mobility-Aware Teacher" and distilling its knowledge into an "Imagery-Only Student," enabling deployment in cities where mobility data is unavailable.
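
The monotonic consistency idea in the list above can be illustrated with a pairwise hinge penalty. This is a minimal sketch in plain Python, not the loss used in this repository: the function name `monotonic_penalty`, the pairwise formulation, and the `margin` parameter are all assumptions made for illustration.

```python
from itertools import combinations


def monotonic_penalty(proxy, pred, margin=0.0):
    """Mean hinge penalty over sample pairs that violate a decreasing trend.

    For every pair with different proxy values, the sample with the higher
    proxy (e.g. more vegetation) is expected to have a prediction that is
    not higher than the other sample's prediction.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(proxy)), 2):
        if proxy[i] == proxy[j]:
            continue  # ties carry no ordering information
        hi, lo = (i, j) if proxy[i] > proxy[j] else (j, i)
        # Positive only when the higher-proxy sample is predicted hotter.
        total += max(0.0, pred[hi] - pred[lo] + margin)
        count += 1
    return total / count if count else 0.0


vegetation = [0.1, 0.5, 0.9]     # hypothetical vegetation fractions
consistent = [35.0, 33.0, 31.0]  # temperature falls as vegetation rises
violating = [31.0, 33.0, 35.0]   # temperature rises with vegetation

print(monotonic_penalty(vegetation, consistent))  # 0.0
print(monotonic_penalty(vegetation, violating))   # positive penalty
```

A penalty of this shape is zero whenever predictions already respect the expected direction, so it only influences training on violating pairs.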

Overall Architecture

🌟 The training of AESPA consists of two stages: (i) Mobility-Aware Teacher Training, which fuses all modalities and learns physical proxies, and (ii) Imagery-Only Student Distillation, which learns to mimic the teacher's predictions and feature representations using only visual data.

The core components include:

- Encoders: ViT (Satellite), CLIP+MIL (Street View), GRU (Mobility).
- Fusion: Cross-feature fusion with FiLM-style conditioning.
- Physics Constraints: Auxiliary heads and loss functions for vegetation, canopy, imperviousness, albedo, and shadow.
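
FiLM-style conditioning scales and shifts one feature vector using parameters predicted from another. The sketch below shows that operation in plain Python. The helper names, the tiny hand-written weights, and the choice of which input acts as the condition are assumptions, since this README does not specify them.

```python
def linear(weights, bias, x):
    """Dense layer on plain lists: returns weights @ x + bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, condition, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: gamma(condition) * features + beta(condition)."""
    gamma = linear(w_gamma, b_gamma, condition)  # per-channel scale
    beta = linear(w_beta, b_beta, condition)     # per-channel shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


# Hypothetical 2-channel feature modulated by a 1-dimensional condition.
image_feature = [1.0, 2.0]
condition = [0.5]
modulated = film(
    image_feature, condition,
    w_gamma=[[2.0], [0.0]], b_gamma=[0.0, 1.0],  # gamma = [1.0, 1.0]
    w_beta=[[0.0], [4.0]], b_beta=[0.0, 0.0],    # beta  = [0.0, 2.0]
)
print(modulated)  # [1.0, 4.0]
```

In a real model the scale and shift layers would be learned, and the same operation would be applied per channel across a batch.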

⚖ Performance Comparison

Comparison with baseline models across 8 major U.S. metropolitan areas (MSAs):

Data

We use data from 8 U.S. MSAs (Dallas, Washington, Miami, Boston, Seattle, Minneapolis, St. Louis, Pittsburgh) to demonstrate AESPA.

- Satellite Imagery: Web-based mapping platforms (Esri).
- Street View: Google Street View API (panoramas).
- Mobility: SafeGraph Weekly Patterns.
- Labels: US Surface Urban Heat Island database (Summer Daytime LST).

Please refer to data/readme.md for preprocessing scripts and data structure details.

⚙️ Installation

Environment

- Tested OS: Linux
- Python >= 3.9
- PyTorch >= 2.0.0
- CUDA (recommended for GPU training)

Dependencies

Install PyTorch with the correct CUDA version from the PyTorch official website.
Install all Python dependencies:
```
pip install -r requirements.txt
```
The main dependencies include:
- torch>=2.0.0
- transformers>=4.30.0 (for CLIP and ViT models)
- peft>=0.4.0 (for LoRA fine-tuning)
- tensorboard>=2.13.0 (for training visualization)

Configuration

Before training, set up the following environment variables or update the config files:

# Model paths (download CLIP and ViT models first)
export CLIP_MODEL_DIR="/path/to/clip-vit-base-patch16"
export VIT_MODEL_DIR="/path/to/vit-base-patch16-224"

# Data directory
export DATA_DIR="/path/to/your/data"

Alternatively, you can directly edit the paths in configs/teacher_config.yaml and configs/student_config.yaml.
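
The two configuration routes above (environment variables or config files) can be combined with a small lookup helper. The following is a sketch only: the helper `resolve_path`, the config keys, and the rule that the environment takes precedence over the config file are assumptions, not the repository's documented behavior.

```python
import os


def resolve_path(env_var, config, key):
    """Return the path from the environment if set, else from the config dict."""
    value = os.environ.get(env_var) or config.get(key)
    if not value:
        raise KeyError(f"Set {env_var} or provide '{key}' in the config file")
    return os.path.expanduser(value)


# Hypothetical config contents; the real YAML keys may differ.
config = {"data_dir": "./data", "clip_model_dir": "/path/to/clip-vit-base-patch16"}

# The environment variable wins when it is set.
os.environ["DATA_DIR"] = "/path/to/your/data"
print(resolve_path("DATA_DIR", config, "data_dir"))  # /path/to/your/data

# Without the variable, the value from the config is used.
os.environ.pop("CLIP_MODEL_DIR", None)
print(resolve_path("CLIP_MODEL_DIR", config, "clip_model_dir"))  # /path/to/clip-vit-base-patch16
```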

🏃 Model Training

Please first navigate to the root directory of the project.

Data Preparation

Organize your data according to data/readme.md.
Preprocess physical proxy indicators (Vegetation, Albedo, etc.) from street view imagery:

# Set DATA_DIR environment variable or use relative path
export DATA_DIR="/path/to/your/data"

python data/preprocess_proxies.py --data-dir "${DATA_DIR:-./data}" --split train
python data/preprocess_proxies.py --data-dir "${DATA_DIR:-./data}" --split val
python data/preprocess_proxies.py --data-dir "${DATA_DIR:-./data}" --split test

Stage-1: Teacher Model Training (with Mobility)

We provide the training script scripts/train_teacher.sh, which trains the teacher model on satellite, street-view, and mobility data:

bash scripts/train_teacher.sh

Or run directly with Python:

python src/main.py \
    --mode teacher \
    --config configs/teacher_config.yaml \
    --data-dir /path/to/data \
    --checkpoint-dir checkpoints/teacher \
    --log-dir logs/teacher \
    --device cuda:0 \
    --epochs 30 \
    --batch-size 16 \
    --lambda-phys 0.05 \
    --lambda-proxy 0.0 \
    --lambda-ranking 0.1

Parameters:

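- --mode: Training mode (teacher or student).
- --config: Path to the YAML configuration file.
- --data-dir: Path to the data directory.
- --checkpoint-dir: Directory where model checkpoints are saved.
- --log-dir: Directory for TensorBoard logs.
- --device: Device to train on (e.g., cuda:0).
- --epochs: Number of training epochs.
- --batch-size: Training batch size.
- --lambda-phys: Weight for the physics consistency loss (default: 0.05).
- --lambda-proxy: Weight for the auxiliary physical-proxy loss (0.0 in the command above).
- --lambda-ranking: Weight for the day-night ranking loss (default: 0.1).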
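
One plausible reading of the lambda flags is that each one scales an auxiliary loss term that is added to the main regression loss. The sketch below illustrates that reading. The function `combine_losses`, the additive form, and the example loss values are assumptions, and only the weights are taken from the teacher command.

```python
def combine_losses(task_loss, aux_losses, lambdas):
    """Return task_loss plus each auxiliary loss scaled by its lambda weight."""
    return task_loss + sum(
        lambdas.get(name, 0.0) * value for name, value in aux_losses.items()
    )


# Weights from the teacher command; the loss values are made up.
lambdas = {"phys": 0.05, "proxy": 0.0, "ranking": 0.1}
aux_losses = {"phys": 2.0, "proxy": 5.0, "ranking": 1.0}

total = combine_losses(1.0, aux_losses, lambdas)
print(round(total, 4))  # 1.2
```

Under this reading, a weight of 0.0 removes the corresponding term from the objective entirely.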

Once trained, you will find logs in logs/teacher/ and the best model in checkpoints/teacher/best.pth.

Stage-2: Student Model Distillation (Imagery Only)

We provide the distillation script scripts/distill_student.sh. The student model learns from the frozen teacher and uses only satellite and street-view inputs, so no mobility data is needed at inference:

bash scripts/distill_student.sh

Or run directly with Python:

python src/main.py \
    --mode student \
    --config configs/student_config.yaml \
    --data-dir /path/to/data \
    --teacher-checkpoint checkpoints/teacher/best.pth \
    --checkpoint-dir checkpoints/student \
    --log-dir logs/student \
    --device cuda:0 \
    --epochs 30 \
    --batch-size 4 \
    --lambda-kd 0.1 \
    --lambda-fd 0.05 \
    --lambda-phys 0.2 \
    --lambda-proxy 0.3 \
    --lambda-ranking 0.1

Additional Parameters for Student:

- --teacher-checkpoint: Path to the pre-trained teacher model checkpoint.
- --lambda-kd: Weight for the knowledge distillation loss (logits matching).
- --lambda-fd: Weight for the feature distillation loss (feature matching).

Note: Student training uses a smaller batch size (e.g., 4) because both the teacher and student models are loaded into VRAM simultaneously.
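
The two distillation weights above suggest an objective with a prediction-matching term and a feature-matching term alongside the supervised loss. The sketch below is an illustration in plain Python. Using mean squared error for all three terms, the function names, and the toy values are assumptions, and the teacher outputs are treated as fixed numbers because the teacher is frozen.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_pred, teacher_pred, target,
                      student_feat, teacher_feat,
                      lambda_kd=0.1, lambda_fd=0.05):
    """Supervised loss plus weighted prediction and feature matching terms."""
    task = mse(student_pred, target)      # fit the ground-truth LST
    kd = mse(student_pred, teacher_pred)  # match the teacher's predictions
    fd = mse(student_feat, teacher_feat)  # match the teacher's features
    return task + lambda_kd * kd + lambda_fd * fd


# Toy values: two samples and a 2-dimensional feature vector.
loss = distillation_loss(
    student_pred=[30.0, 32.0], teacher_pred=[31.0, 32.0], target=[30.0, 34.0],
    student_feat=[0.0, 0.0], teacher_feat=[2.0, 0.0],
)
print(round(loss, 4))  # 2.15
```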

Model Weights

Model weights will be available for download at [Link Coming Soon].

👀 Citation

If you find this repo helpful, please cite our paper:

@inproceedings{you2026physics,
  title={Physics-Aware Multimodal Urban Heat Mapping with Open Web Imagery and Mobility Data},
  author={You, Yuanyi and Zhang, Yunke and Li, Yong},
  booktitle={Proceedings of the ACM Web Conference 2026},
  pages={8808--8817},
  year={2026}
}

🙇‍ Acknowledgement

We appreciate the following resources:

- SafeGraph: For providing human mobility data.
- US Surface Urban Heat Island Database: For ground truth LST data.
- CLIP & ViT: For the visual backbone implementations.
