Official implementation of the paper "Physics-Aware Multimodal Urban Heat Mapping with Open Web Imagery and Mobility Data", published in the Web4Good Track of The Web Conference 2026.
AESPA (Aligned Environmental Sensing with Physics-aware Attribution) is a framework for fine-grained urban Land Surface Temperature (LST) estimation. It fuses satellite imagery, street-view panoramas, and mobility data while enforcing physical consistency constraints.
- [2026.01] Accepted to The Web Conference 2026 (Web4Good Track)!
- [2025.12] Code released.
- [2025.12] Paper submitted to Web4Good 2026.
Extreme urban heat is a growing crisis, but monitoring it at the neighborhood level is difficult because satellite observations are limited in both revisit frequency and spatial resolution. AESPA addresses this by:
- Multimodal Fusion: Combining Satellite (macro-view), Street View (micro-view/facades), and Mobility (human activity) data.
- Physics-Aware Regularization: Deriving physical proxies (vegetation, albedo, shadow) from street views to enforce monotonic consistency (e.g., more vegetation should not increase predicted temperature).
- Teacher-Student Distillation: Training a "Mobility-Aware Teacher" and distilling its knowledge into an "Imagery-Only Student," enabling deployment in cities where mobility data is unavailable.
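The monotonic-consistency idea in the Physics-Aware Regularization bullet can be illustrated with a hinge penalty: nudge one proxy upward and penalize any change in predicted temperature that points in the physically wrong direction. This is a minimal framework-free sketch under stated assumptions; `toy_model`, `monotonic_penalty`, the finite-difference step `eps`, and the sign conventions are hypothetical and are not taken from the AESPA code, which may formulate the constraint differently.

```python
from typing import Callable, Dict, List

Proxies = Dict[str, float]


def toy_model(proxies: Proxies) -> float:
    """Hypothetical linear LST predictor (deg C), used only to exercise the penalty."""
    return 35.0 - 6.0 * proxies["vegetation"] + 2.0 * proxies["shadow"]


def monotonic_penalty(
    model: Callable[[Proxies], float],
    batch: List[Proxies],
    proxy: str,
    sign: int = -1,
    eps: float = 0.05,
) -> float:
    """Mean hinge penalty on the finite-difference slope d(LST)/d(proxy).

    sign=-1: raising the proxy must not raise LST (e.g. vegetation, shadow).
    sign=+1: raising the proxy must not lower LST (e.g. imperviousness).
    """
    total = 0.0
    for sample in batch:
        bumped = dict(sample)
        bumped[proxy] = sample[proxy] + eps
        slope = (model(bumped) - model(sample)) / eps
        # Only the component of the slope with the wrong sign is penalized.
        total += max(0.0, -sign * slope)
    return total / len(batch)


if __name__ == "__main__":
    batch = [{"vegetation": 0.2, "shadow": 0.1}, {"vegetation": 0.6, "shadow": 0.4}]
    print(monotonic_penalty(toy_model, batch, "vegetation"))  # 0.0: toy model cools with vegetation
    print(monotonic_penalty(toy_model, batch, "shadow"))      # about 2.0: toy model violates the constraint
```

In a PyTorch model the slope would more naturally come from autograd than from a finite difference; the finite difference is used here only to keep the sketch dependency-free.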
The training of AESPA consists of two stages: (i) Mobility-Aware Teacher Training, which fuses all modalities and learns physical proxies, and (ii) Imagery-Only Student Distillation, which learns to mimic the teacher's predictions and feature representations using only visual data.
The core components include:
- Encoders: ViT (Satellite), CLIP+MIL (Street View), GRU (Mobility).
- Fusion: Cross-feature fusion with FiLM-style conditioning.
- Physics Constraints: Auxiliary heads and loss functions for vegetation, canopy, imperviousness, albedo, and shadow.
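To make the encoder and fusion bullets concrete, here is a minimal framework-free sketch of two of the named ideas: attention-style MIL pooling, which collapses a variable number of street-view panorama embeddings into one vector, and FiLM conditioning, in which a conditioning vector produces a per-channel scale gamma and shift beta applied as gamma * x + beta. The dimensions, weights, the single scoring vector, the bias-free projections, and the names `attention_mil_pool` and `film` are all hypothetical; the repository's actual modules (built on CLIP, ViT, and a GRU) will differ.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output channel


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def attention_mil_pool(instances: List[Vector], w: Vector) -> Vector:
    """Softmax-attention pooling over a bag of instance embeddings."""
    scores = [sum(a * b for a, b in zip(w, h)) for h in instances]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(instances[0])
    return [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]


def film(x: Vector, cond: Vector, w_gamma: Matrix, w_beta: Matrix) -> Vector:
    """Feature-wise linear modulation: gamma(cond) * x + beta(cond)."""
    gamma = matvec(w_gamma, cond)
    beta = matvec(w_beta, cond)
    return [g * xi + b for g, xi, b in zip(gamma, x, beta)]


if __name__ == "__main__":
    panoramas = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 hypothetical street-view embeddings
    street = attention_mil_pool(panoramas, w=[0.5, 0.5])
    satellite = [2.0, -1.0]                             # hypothetical satellite features
    cond = street + [0.3]                               # pooled street view + a mobility scalar
    w_gamma = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    w_beta = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
    print(film(satellite, cond, w_gamma, w_beta))
```

MIL pooling lets each location carry any number of panoramas, while FiLM lets the conditioning modalities rescale and shift the satellite features channel by channel instead of simply being concatenated.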
Comparison with baseline models across 8 major U.S. metropolitan areas (MSAs):

We use data from 8 U.S. MSAs (Dallas, Washington, Miami, Boston, Seattle, Minneapolis, St. Louis, Pittsburgh) to demonstrate AESPA.
- Satellite Imagery: Web-based mapping platforms (Esri).
- Street View: Google Street View API (panoramas).
- Mobility: SafeGraph Weekly Patterns.
- Labels: US Surface Urban Heat Island database (Summer Daytime LST).
Please refer to data/readme.md for preprocessing scripts and data structure details.
- Tested OS: Linux
- Python >= 3.9
- PyTorch >= 2.0.0
- CUDA (recommended for GPU training)
- Install PyTorch with the correct CUDA version from the official PyTorch website.
- Install all Python dependencies:
pip install -r requirements.txt
The main dependencies include:
- torch>=2.0.0
- transformers>=4.30.0 (for CLIP and ViT models)
- peft>=0.4.0 (for LoRA fine-tuning)
- tensorboard>=2.13.0 (for training visualization)
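If an import fails or behaves unexpectedly, it can help to confirm that the installed packages meet the minimum versions listed above. The sketch below uses only the standard library (`importlib.metadata`); the helper names `parse`, `at_least`, and `check` are hypothetical, and the naive parser drops pre-release and local suffixes, so treat it as a quick sanity check rather than a full PEP 440 comparison.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions taken from the dependency list in this README.
MINIMUMS: Dict[str, str] = {
    "torch": "2.0.0",
    "transformers": "4.30.0",
    "peft": "0.4.0",
    "tensorboard": "2.13.0",
}


def parse(v: str) -> Tuple[int, ...]:
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are dropped."""
    parts = []
    for piece in v.split("+")[0].split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare zero-padded version tuples so that '2.0' counts as '2.0.0'."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check(minimums: Dict[str, str]) -> Dict[str, str]:
    """Return a status per package: 'ok', 'too old (...)', or 'missing'."""
    status = {}
    for pkg, minimum in minimums.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            status[pkg] = "missing"
            continue
        status[pkg] = "ok" if at_least(installed, minimum) else f"too old ({installed} < {minimum})"
    return status


if __name__ == "__main__":
    for pkg, state in check(MINIMUMS).items():
        print(f"{pkg:12s} {state}")
```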
Before training, set up the following environment variables or update the config files:
# Model paths (download CLIP and ViT models first)
export CLIP_MODEL_DIR="/path/to/clip-vit-base-patch16"
export VIT_MODEL_DIR="/path/to/vit-base-patch16-224"
# Data directory
export DATA_DIR="/path/to/your/data"
Alternatively, you can directly edit the paths in configs/teacher_config.yaml and configs/student_config.yaml.
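Because paths can come either from environment variables or from the YAML configs, a loader has to decide which source wins. A minimal sketch, assuming (hypothetically) that a non-empty environment variable overrides the config value and that the config has already been parsed into a dict; the key names `clip_model_dir`, `vit_model_dir`, and `data_dir` and the function `resolve_paths` are illustrative, not the actual keys or code in this repository.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from environment variable to config key.
ENV_TO_KEY = {
    "CLIP_MODEL_DIR": "clip_model_dir",
    "VIT_MODEL_DIR": "vit_model_dir",
    "DATA_DIR": "data_dir",
}


def resolve_paths(config: Mapping[str, str], env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Prefer a non-empty environment variable, else fall back to the config value."""
    env = os.environ if env is None else env
    resolved = {}
    for var, key in ENV_TO_KEY.items():
        value = env.get(var) or config.get(key)
        if not value:
            raise KeyError(f"set ${var} or '{key}' in the config")
        resolved[key] = os.path.expanduser(value)
    return resolved


if __name__ == "__main__":
    cfg = {
        "clip_model_dir": "~/models/clip-vit-base-patch16",
        "vit_model_dir": "~/models/vit-base-patch16-224",
        "data_dir": "./data",
    }
    print(resolve_paths(cfg, env={"DATA_DIR": "/mnt/urban-heat"}))
```

Failing loudly on a missing path is deliberate in this sketch: a silent default would only surface later as a confusing model-loading error.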
Please first navigate to the root directory of the project.
- Organize your data according to data/readme.md.
- Preprocess physical proxy indicators (Vegetation, Albedo, etc.) from street view imagery:
# Set DATA_DIR environment variable or use relative path
export DATA_DIR="/path/to/your/data"
python data/preprocess_proxies.py --data-dir "${DATA_DIR:-./data}" --split train
python data/preprocess_proxies.py --data-dir "${DATA_DIR:-./data}" --split val
python data/preprocess_proxies.py --data-dir "${DATA_DIR:-./data}" --split test
We provide the training script scripts/train_teacher.sh to train the teacher model, which uses Satellite, Street View, and Mobility data:
bash scripts/train_teacher.sh
Or run directly with Python:
python src/main.py \
--mode teacher \
--config configs/teacher_config.yaml \
--data-dir /path/to/data \
--checkpoint-dir checkpoints/teacher \
--log-dir logs/teacher \
--device cuda:0 \
--epochs 30 \
--batch-size 16 \
--lambda-phys 0.05 \
--lambda-proxy 0.0 \
--lambda-ranking 0.1
Parameters:
- --mode: Training mode (teacher or student).
- --config: Path to the YAML configuration file.
- --data-dir: Path to the data directory.
- --checkpoint-dir: Directory in which model checkpoints are saved.
- --log-dir: Directory for TensorBoard logs.
- --lambda-phys: Weight for the physics consistency loss (default: 0.05).
- --lambda-ranking: Weight for the day-night ranking loss (default: 0.1).
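The --lambda-* flags scale auxiliary terms that are added to the main LST regression objective. Assuming (this is not confirmed by the README) a plain weighted sum with a mean-squared-error regression term, the teacher objective would look like the sketch below; `teacher_objective` is hypothetical, the auxiliary loss values are placeholders, and the repository may normalize or schedule these terms differently.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def teacher_objective(
    pred: Sequence[float],
    target: Sequence[float],
    phys_loss: float,
    proxy_loss: float,
    ranking_loss: float,
    lambda_phys: float = 0.05,    # --lambda-phys default
    lambda_proxy: float = 0.0,    # value used in the teacher command above
    lambda_ranking: float = 0.1,  # --lambda-ranking default
) -> float:
    """Assumed form: LST regression loss plus weighted auxiliary losses."""
    return (
        mse(pred, target)
        + lambda_phys * phys_loss
        + lambda_proxy * proxy_loss
        + lambda_ranking * ranking_loss
    )


if __name__ == "__main__":
    # Placeholder numbers, not results from the paper.
    print(teacher_objective([31.0, 34.0], [30.0, 35.0], phys_loss=0.4, proxy_loss=1.2, ranking_loss=0.5))
```

Under this assumed form, passing --lambda-proxy 0.0 (as in the teacher command) removes the proxy term from the teacher objective entirely.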
After training, TensorBoard logs are written to logs/teacher/ and the best model is saved to checkpoints/teacher/best.pth.
We provide the distillation script scripts/distill_student.sh. The student model learns from the frozen teacher and uses only Satellite and Street View inputs (no mobility data is needed at inference time):
bash scripts/distill_student.sh
Or run directly with Python:
python src/main.py \
--mode student \
--config configs/student_config.yaml \
--data-dir /path/to/data \
--teacher-checkpoint checkpoints/teacher/best.pth \
--checkpoint-dir checkpoints/student \
--log-dir logs/student \
--device cuda:0 \
--epochs 30 \
--batch-size 4 \
--lambda-kd 0.1 \
--lambda-fd 0.05 \
--lambda-phys 0.2 \
--lambda-proxy 0.3 \
--lambda-ranking 0.1
Additional Parameters for Student:
- --teacher-checkpoint: Path to the pre-trained teacher model checkpoint.
- --lambda-kd: Weight for the Knowledge Distillation loss (logits matching).
- --lambda-fd: Weight for the Feature Distillation loss (feature matching).
- Note: Student training uses a smaller batch size (e.g., 4) because both the teacher and student models are loaded into VRAM simultaneously.
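For a regression target, "logits matching" amounts to matching the teacher's LST predictions. A minimal sketch of how the student terms might combine, assuming MSE for both distillation terms, a frozen teacher whose outputs are treated as constants, and teacher and student features of equal dimension; `student_objective` is hypothetical, its default weights simply mirror the distillation command above, and the repository may use other distances (e.g., cosine for features) or a projection layer when feature sizes differ.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def student_objective(
    student_pred: Sequence[float],
    teacher_pred: Sequence[float],  # frozen teacher output (treated as a constant)
    target: Sequence[float],
    student_feat: Sequence[float],
    teacher_feat: Sequence[float],  # frozen teacher features, same dimension assumed
    phys_loss: float,
    proxy_loss: float,
    ranking_loss: float,
    lambda_kd: float = 0.1,         # weights below mirror the distill command above
    lambda_fd: float = 0.05,
    lambda_phys: float = 0.2,
    lambda_proxy: float = 0.3,
    lambda_ranking: float = 0.1,
) -> float:
    """Assumed form: supervised LST loss + output/feature distillation + auxiliaries."""
    task = mse(student_pred, target)
    kd = mse(student_pred, teacher_pred)  # output ("logits") matching
    fd = mse(student_feat, teacher_feat)  # feature matching
    return (
        task
        + lambda_kd * kd
        + lambda_fd * fd
        + lambda_phys * phys_loss
        + lambda_proxy * proxy_loss
        + lambda_ranking * ranking_loss
    )


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    print(student_objective([30.5], [30.0], [31.0], [0.2, 0.8], [0.1, 0.9], 0.3, 0.6, 0.4))
```

Because the teacher already saw mobility data, the kd and fd terms are how that information reaches a student that never receives mobility inputs.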
Model weights will be available for download at [Link Coming Soon].
If you find this repo helpful, please cite our paper:
@inproceedings{you2026physics,
title={Physics-Aware Multimodal Urban Heat Mapping with Open Web Imagery and Mobility Data},
author={You, Yuanyi and Zhang, Yunke and Li, Yong},
booktitle={Proceedings of the ACM Web Conference 2026},
pages={8808--8817},
year={2026}
}
We appreciate the following resources:
- SafeGraph: For providing human mobility data.
- US Surface Urban Heat Island Database: For ground truth LST data.
- CLIP & ViT: For the visual backbone implementations.