CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation
Ziqi Ye1, 2 ∗, Ziyang Gong3 ∗ † , Ning Liao3 ∗, Xiaoxing Hu4, Di Wang5, 6, Hongruixuan Chen7, Chen Huang8, Yiguo He3,
Yuru Jia9, 10, Xiaoxing Wang3, Yuan Cheng3, Haipeng Wang1 ‡, Xue Yang3 ‡, Junchi Yan3, 2 ‡
1 Fudan University, 2 Shanghai Innovation Institute, 3 Shanghai Jiao Tong University,
4 Beijing Institute of Technology, 5 Wuhan University, 6 Zhongguancun Academy,
7 The University of Tokyo, 8 Sun Yat-sen University, 9 KU Leuven, 10 KTH
∗ Equal Contribution, † Project Lead, ‡ Corresponding Author
This repository contains the official implementation of [CrossEarth-SAR: A SAR-Centric and Billion-Scale Geospatial Foundation Model for Domain Generalizable Semantic Segmentation].
- [2025/12/07] We are releasing the benchmark collection. Click here to get the benchmark datasets!
- [2025/12/07] The checkpoints have been uploaded; you can access them via the Hugging Face badges above.
- [2025/12/07] The training and inference code for all 22 benchmarks is released below.
Synthetic Aperture Radar (SAR) enables global, all-weather earth observation. However, owing to diverse imaging mechanisms, domain shifts across sensors and regions severely hinder its semantic generalization. To address this, we present CrossEarth-SAR, the first billion-scale SAR vision foundation model built upon a novel physics-guided sparse mixture-of-experts (MoE) architecture incorporating physical descriptors, explicitly designed for cross-domain semantic segmentation. To facilitate large-scale pre-training, we develop CrossEarth-SAR-200K, a weakly and fully supervised dataset that unifies public and private SAR imagery. We also introduce a benchmark suite comprising 22 sub-benchmarks across 8 distinct domain gaps, establishing the first unified standard for domain generalization semantic segmentation on SAR imagery. Extensive experiments demonstrate that CrossEarth-SAR achieves state-of-the-art results on 20 benchmarks, surpassing previous methods by over 10% mIoU on some benchmarks under multi-gap transfer. All code will be publicly available.
- CrossEarth-SAR-200K consists of three components: 37K private and 126K public SAR–optical pairs, together with 40K public SAR segmentation samples. Among them, 163K SAR images are assigned pseudo labels generated by applying CrossEarth to their paired optical images. All data are unified under the 7-class LoveDA semantic scheme.
- To the best of our knowledge, CrossEarth-SAR-200K is the first large-scale SAR semantic segmentation dataset, and its size surpasses that of the widely used COCO-Stuff benchmark (164K images) for general-purpose semantic segmentation. The scale and diversity of CrossEarth-SAR-200K, with imagery collected from 109 regions worldwide, effectively emulate real-world deployment scenarios in which SAR semantic segmentation models are applied across multiple data sources. CrossEarth-SAR-200K thus provides a robust foundation for training and evaluation, advancing research in SAR semantic segmentation and image understanding.
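All labels in CrossEarth-SAR-200K are unified under the 7-class LoveDA scheme (background, building, road, water, barren, forest, agriculture). The sketch below illustrates what such label unification looks like; the source-to-LoveDA mapping here is purely hypothetical (the real per-dataset mappings ship with the dataset release):

```python
# LoveDA 7-class scheme: index -> class name (standard LoveDA ordering)
LOVEDA_CLASSES = ["background", "building", "road", "water",
                  "barren", "forest", "agriculture"]

# Hypothetical mapping from one source dataset's label ids to LoveDA ids,
# e.g. water -> 3, urban -> 1, cropland -> 6; 255 (ignore) -> background.
SOURCE_TO_LOVEDA = {1: 3, 2: 1, 3: 6, 255: 0}

def remap_mask(mask, table=SOURCE_TO_LOVEDA, fallback=0):
    """Remap a 2-D label mask (nested lists) to the unified LoveDA scheme."""
    return [[table.get(px, fallback) for px in row] for row in mask]

mask = [[1, 2], [3, 255]]
print(remap_mask(mask))  # [[3, 1], [6, 0]]
```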
- For fair evaluation of the generalizability of existing models in the SAR modality, we curate benchmarks based on widely used SAR RS semantic segmentation datasets, including AIR-PolSAR-Seg-2.0, DDHR-SK, FUSAR-Map, OpenEarthMap, SARBuD, and WHU-OPT-SAR, and extend them to DG settings.
- Our benchmark comprises different tasks across eight compositional domain gaps: (1) Unseen Region; (2) Unseen Polarization; (3) Unseen Complex Number; (4) Unseen Region and Polarization; (5) Unseen Region and Platform; (6) Unseen Region and Microwave Band; (7) Unseen Region, Polarization and Microwave Band; (8) Unseen Region, Platform and Microwave Band.
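The eight settings above compose gaps along a handful of acquisition axes (region, polarization, platform, microwave band, complex-valued data). As an illustrative sketch (hypothetical helper, not part of the repo), a benchmark's gap composition can be derived by comparing train- and test-domain descriptors:

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Domain:
    """Descriptor of a SAR acquisition domain along the benchmark's axes."""
    region: str
    polarization: str   # e.g. "VV", "HH"
    platform: str       # e.g. "spaceborne", "airborne"
    band: str           # microwave band, e.g. "C" or "X"
    complex_data: bool  # complex-valued vs. detected (amplitude) imagery

def domain_gap(train: Domain, test: Domain) -> set[str]:
    """Return the set of axes on which the test domain is unseen."""
    a, b = asdict(train), asdict(test)
    return {axis for axis in a if a[axis] != b[axis]}

# Example: gap (4), "Unseen Region and Polarization"
src = Domain("RegionA", "VV", "spaceborne", "C", False)
tgt = Domain("RegionB", "HH", "spaceborne", "C", False)
print(sorted(domain_gap(src, tgt)))  # ['polarization', 'region']
```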
- Visualizations of predicted segmentation maps on several benchmarks. The experimental results demonstrate that CrossEarth-SAR possesses a strong capacity for SAR remote-sensing domain generalization (SAR RSDG), yielding semantically accurate and visually coherent segmentation predictions across diverse unseen scenarios.
- CrossEarth-SAR achieves SOTA performance on 20 evaluation benchmarks across various segmentation scenes, demonstrating strong generalizability.
conda create -n CrossEarthSAR python=3.9 -y
conda activate CrossEarthSAR
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia -y
pip install fsspec
pip install -U openmim
mim install mmengine
mim install "mmcv==2.1.0"
pip install "mmsegmentation>=1.0.0"
pip install "mmdet>=3.0.0"
conda install xformers -c xformers
pip install numpy==1.24.4
pip install ftfy scipy prettytable matplotlib regex timm einops
pip install future tensorboard

First, download the CPT model weights crossearthsar_vitl.pth from Hugging Face via the badges above, then place the weights in the ./checkpoints/ directory.
Second, replace the dataset path in configs/_base_/datasets/×××.py (i.e., the line path/to/datasets...) with your own dataset directory.
Third, run the training code. (Take VV2F benchmark as an example):
python tools/train.py configs/CrossEarthSAR_dinov2/CrossEarthSAR_vv2RGB_VV2F.py --work-dir ./work_dirs/... --cfg-options load_from=./checkpoints/crossearthsar_vitl.pth

First, use your own trained weights or download the model weights from Hugging Face via the badges above.
Second, run the test code. (Take VV2F benchmark as an example):
python tools/test.py configs/CrossEarthSAR_dinov2/CrossEarthSAR_vv2RGB_VV2F.py ./checkpoints/xxx.pth

| Model | Params (all / activated) | CrossEarth-SAR-200K val (mIoU) | Download |
|---|---|---|---|
| ViT-S | 90M / 20M | 59.06% | crossearthsar_vits.pth |
| ViT-B | 300M / 80M | 60.79% | crossearthsar_vitb.pth |
| ViT-L | 1.3B / 300M | 62.42% | crossearthsar_vitl.pth |
- Release the paper on arXiv.
- Release the 22 SAR RSDG benchmarks.
- Release the CrossEarth-SAR-200K dataset and Continuous Pre-Training code.
- Release the benchmarks fine-tuning training and inference code.
- Release the model weights with configs.
If you find CrossEarth-SAR helpful, please consider giving this repo a ⭐ and citing:
@article{gong2025crossearth,
title={CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation},
author={Gong, Ziyang and Wei, Zhixiang and Wang, Di and Hu, Xiaoxing and Ma, Xianzheng and Chen, Hongruixuan and Jia, Yuru and Deng, Yupeng and Ji, Zhenming and Zhu, Xiangwei and others},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2025},
publisher={IEEE}
}
@article{hu2025earth,
title={Earth-adapter: Bridge the geospatial domain gaps with mixture of frequency adaptation},
author={Hu, Xiaoxing and Gong, Ziyang and Wang, Yupei and Jia, Yuru and Lin, Fei and Gao, Dexiang and An, Ke and Han, Jianhong and Sun, Zhuoran and Luo, Gen and others},
journal={arXiv preprint arXiv:2504.06220},
year={2025}
}
@article{cao2025crossearth,
title={CrossEarth-Gate: Fisher-Guided Adaptive Tuning Engine for Efficient Adaptation of Cross-Domain Remote Sensing Semantic Segmentation},
author={Cao, Shilei and Gong, Ziyang and Lin, Hehai and Liu, Yang and Cheng, Jiashun and Hu, Xiaoxing and Liang, Haoyuan and Li, Guowen and Qin, Chengwei and Cheng, Hong and others},
journal={arXiv preprint arXiv:2511.20302},
year={2025}
}