Custom disaster-based train/test splits for xView2 dataset #2416


Status: Open. Wants to merge 41 commits into main.

Conversation

burakekim
Contributor

@burakekim burakekim commented Nov 18, 2024

cc: @calebrob6

XView2DistShift is a subclass of XView2 designed to modify the original train/test splits. Similar to EuroSATSpatial #2074, this class enables domain adaptation and out-of-distribution (OOD) detection experiments.

From the docstring:

This class allows for the selection of particular disasters to be used as the training set (in-domain) and test set (out-of-domain). The dataset can be split according to the disaster names specified by the user, enabling the model to train on one disaster type and evaluate on a different, out-of-domain disaster. The goal is to test the generalization ability of models trained on one disaster to perform on another.
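In spirit, the split logic described above amounts to partitioning the file list by disaster name. A minimal sketch of the idea (the record structure and helper name are assumptions for illustration, not the actual implementation):

```python
# Hypothetical sketch of disaster-based ID/OOD splitting; the record
# structure and helper name are assumptions, not TorchGeo's actual code.
def split_by_disaster(
    files: list[dict[str, str]], id_disaster: str, ood_disaster: str
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Partition records into in-domain (train) and out-of-domain (test) sets."""
    train = [f for f in files if f['disaster'] == id_disaster]
    test = [f for f in files if f['disaster'] == ood_disaster]
    return train, test

files = [
    {'disaster': 'hurricane-matthew', 'image': 'a.png'},
    {'disaster': 'mexico-earthquake', 'image': 'b.png'},
    {'disaster': 'hurricane-matthew', 'image': 'c.png'},
]
train, test = split_by_disaster(files, 'hurricane-matthew', 'mexico-earthquake')
print(len(train), len(test))  # → 2 1
```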

@github-actions github-actions bot added documentation Improvements or additions to documentation datasets Geospatial or benchmark datasets testing Continuous integration testing labels Nov 18, 2024
@adamjstewart
Collaborator

We decided on EuroSAT Spatial before, why switch to XView2 Dist Shift now? Will there be any corresponding citations for these new splits?

It would be nice to move more of the shared code into the XView2 base class so that the only thing that needs to change in this subclass is the URLs. How different are these datasets?

@adamjstewart adamjstewart modified the milestones: 0.6.2, 0.7.0 Nov 18, 2024
@burakekim
Contributor Author

burakekim commented Nov 18, 2024

We decided on EuroSAT Spatial before, why switch to XView2 Dist Shift now?

Spatial refers to the type of distribution shift revealed by the splits when they are rearranged. XView2 consists of multiple disasters, and the distribution shift is determined by the user's choice: the user can select any disaster as the training set and another as the test set, which introduces varying types of distribution shift. These shifts range from near- to far-distribution shifts, depending on how different the disasters in the two splits are. And here, the difference is not limited to spatial factors; it also includes temporal and contextual differences. That is why Spatial would be a misleading name for XView2.

One alternative could be standardizing the naming for these subset datasets with a suffix like OOD or DistShift. What do you think?

It would be nice to move more of the shared code in the XView2 base class so that the only thing that needs to be changed in this subclass is the URLs. How different are these datasets?

They are basically the same dataset but with different splits. XView2DistShift allows users to select specific disasters for training and testing sets.

Are you suggesting we curate the filenames for all disasters as HF links and dynamically load them as training or testing sets based on input? This approach would save us from _initialize_files and _load_split_files_by_disaster_and_type, but not __getitem__ and __len__.
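For illustration, a filename-based filter along those lines might look like the following. This is a sketch, and it assumes the xView2 naming convention `<disaster>_<tile>_<pre|post>_disaster.png`; the helper name is made up:

```python
# Sketch of filtering filenames by disaster and pre/post tag; assumes the
# xView2 convention '<disaster>_<tile>_<pre|post>_disaster.png'.
def filter_files(filenames: list[str], disaster: str, prepost: str) -> list[str]:
    return [
        name
        for name in filenames
        if name.startswith(disaster + '_') and f'_{prepost}_disaster' in name
    ]

names = [
    'hurricane-harvey_00000001_pre_disaster.png',
    'hurricane-harvey_00000001_post_disaster.png',
    'mexico-earthquake_00000002_post_disaster.png',
]
print(filter_files(names, 'hurricane-harvey', 'post'))
# → ['hurricane-harvey_00000001_post_disaster.png']
```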

@calebrob6
Member

Great dataset, highly recommend.

@burakekim
Contributor Author

burakekim commented Feb 1, 2025

This is how it works:


```python
id_ood_disaster = [
    {'disaster_name': 'hurricane-matthew', 'pre-post': 'post'},
    {'disaster_name': 'mexico-earthquake', 'pre-post': 'post'},
]

xview2 = XView2DistShift(
    root=root,
    split='test',
    id_ood_disaster=id_ood_disaster,
)
```

> ID sample len: 311, OOD sample len: 159

All the existing methods are revised to make XView2DistShift work as intended. I cannot see a way to prune further (unless I upstream a method or two to XView2, but that would be too much refactoring).

If it looks good, I can go ahead with unit tests.

@adamjstewart, just to loop you in: As you may have noticed, we (cc: @calebrob6) are upstreaming some modifications to existing datasets to make them suitable for assessing models under controlled domain shifts. This unlocks a whole new research dimension in TG, enabling users to explore robustness, generalization ability, anomaly detection, novelty detection, OOD detection and more. If it reaches a certain level of maturity, I could even consider spinning it off as a standalone toolkit that also involves methods like our recent OOD detector!

@adamjstewart adamjstewart removed this from the 0.7.0 milestone Mar 23, 2025
@github-actions github-actions bot removed the testing Continuous integration testing label Apr 19, 2025
@adamjstewart adamjstewart added this to the 0.8.0 milestone Apr 20, 2025
@burakekim
Contributor Author

Better now. Thanks for the review @adamjstewart!

@burakekim burakekim requested a review from adamjstewart April 22, 2025 10:38
Collaborator

@adamjstewart adamjstewart left a comment


Still a lot of concerns about how id/ood arguments are handled.

class TestXView2DistShift:
    @pytest.fixture(params=['train', 'test'])
    def dataset(self, monkeypatch: MonkeyPatch, request: SubRequest) -> XView2DistShift:
        monkeypatch.setattr(

Might not even need to monkeypatch this if you remove checksum=True; I never bother with this.

    split=split,
    id_ood_disaster=[
        {'disaster_name': 'hurricane-harvey', 'pre-post': 'post'},
        {'disaster_name': 'hurricane-harvey', 'pre-post': 'post'},

Is this duplication required for some reason?

assert set(torch.unique(x['mask']).tolist()).issubset({0, 1}) # binary mask

def test_len(self, dataset: XView2DistShift) -> None:
    assert len(dataset) > 0

Let's test a specific length to ensure it behaves as expected. Can use an if-statement if it's different for train and test.
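The suggestion could be sketched like this; `FakeDistShift` and the counts are stand-ins for the real `XView2DistShift` fixture and its fake test data, not the actual values:

```python
# Sketch of an exact-length test; FakeDistShift and the counts (2 and 1)
# are placeholders for the real fixture and its fake test data.
class FakeDistShift:
    def __init__(self, split: str) -> None:
        self.split = split
        # Pretend the fake data has 2 ID (train) and 1 OOD (test) samples.
        self._files = ['a', 'b'] if split == 'train' else ['c']

    def __len__(self) -> int:
        return len(self._files)

def check_len(dataset: FakeDistShift) -> None:
    # Use an if-expression since the expected length differs per split.
    expected = 2 if dataset.split == 'train' else 1
    assert len(dataset) == expected

for split in ('train', 'test'):
    check_len(FakeDistShift(split))
print('ok')  # → ok
```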

ValueError, match="Each disaster entry must contain a 'disaster_name' key."
):
XView2DistShift(
root='tests/data/xview2',

Suggested change:
- root='tests/data/xview2',
+ root=os.path.join('tests', 'data', 'xview2'),

Windows

Args:
    root: Root directory where the dataset is located.
    split: One of "train" or "test".
    id_ood_disaster: List containing in-distribution and out-of-distribution disaster names.

Needs more description. Clarify that both disaster_name and pre-post are required. Also explain what they mean; I have no idea.

)

for disaster in id_ood_disaster:
    if 'disaster_name' not in disaster:

This check is duplicated below...

Comment on lines +337 to +338
AssertionError: If *split* is invalid.
ValueError: If a disaster name in *id_ood_disaster* is not one of the valid disasters.

Seems like many other undocumented cases where these are raised

# Split logic by disaster and pre-post type
self.split_files: dict[str, list[dict[str, str]]] = (
    self._load_split_files_by_disaster_and_type(
        self.all_files, id_ood_disaster[0], id_ood_disaster[1]

If I pass in a list of length 100, are the later 98 entries ignored? A dict seems like a bad choice for this. Why not use 4 different parameters so they can be more clearly documented and type checked?
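The alternative being suggested might look like this sketch; the function and parameter names are assumptions, not the reviewer's exact proposal:

```python
# Sketch of the reviewer's alternative: four explicit, type-checked
# parameters instead of a list of dicts (names are assumptions).
from typing import Literal

def make_split(
    id_disaster: str,
    id_prepost: Literal['pre', 'post'],
    ood_disaster: str,
    ood_prepost: Literal['pre', 'post'],
) -> tuple[dict[str, str], dict[str, str]]:
    """Build the ID and OOD disaster specs from explicit arguments."""
    return (
        {'disaster_name': id_disaster, 'pre-post': id_prepost},
        {'disaster_name': ood_disaster, 'pre-post': ood_prepost},
    )

print(make_split('hurricane-matthew', 'post', 'mexico-earthquake', 'post'))
```

With separate parameters, extra entries cannot be silently ignored and a type checker can flag an invalid 'pre-post' value.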
