This directory contains scripts to extract embeddings from various foundation models for the FTW dataset.
-
Clone the Galileo repository:
git clone git@github.com:nasaharvest/galileo.git
-
Install required dependencies (follow the installation instructions in the ftw-baselines and Galileo repositories).
-
Download model checkpoints for each model you want to use. Follow each model's documentation:
- CROMA: Download checkpoint to
ckpt_base_dir/croma - DeCur: Download checkpoint to
ckpt_base_dir/decur - DOFA: Download checkpoint to
ckpt_base_dir/dofa - Galileo: Download checkpoint to
ckpt_base_dir/base - Prithvi: Download checkpoint to
ckpt_base_dir/prithvi - Satlas: Download checkpoint to
ckpt_base_dir/satlas - SoftCon: Download checkpoint to
ckpt_base_dir/softcon
- CROMA: Download checkpoint to
Before running the scripts, update the paths in the if __name__ == "__main__" block of each script:
-
base_dir: Path to your FTW-Dataset directory
base_dir = Path("/path/to/FTW-Dataset")
-
ckpt_base_dir: Path to your baseline models directory
ckpt_base_dir = Path("/path/to/baseline_models")
Run any of the extraction scripts:
python extract_ftw_croma_emb.py
python extract_ftw_decur_emb.py
python extract_ftw_dofa_emb.py
python extract_ftw_galileo_emb.py
python extract_ftw_prithvi_emb.py
python extract_ftw_satlas_emb.py
python extract_ftw_softcon_emb.pyEach script will:
- Process all countries from
ftw_tools.settings.ALL_COUNTRIES - Extract embeddings from Sentinel-2 images in
window_aandwindow_bdirectories - Save embeddings as
.npyfiles in the corresponding output directories
Embeddings are saved in directories parallel to the input structure:
- Input:
FTW-Dataset/ftw/{country}/s2_images/{window}/ - Output:
FTW-{MODEL}-Embeddings/ftw/{country}/{model_name}/{window}/
- The scripts automatically skip files that have already been processed
- Failed files will print error messages without stopping the entire process
- Make sure you have sufficient disk space for the embedding files