
Empty Intersection in the ChesapeakeCVPR GeoDataset w/ RandomGeoSampler #2406

Open
@arjunarao619

Description

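It appears that RandomGeoSampler sometimes samples a window from the ChesapeakeCVPR dataset that is out of bounds or empty. Rasterio cannot handle this and raises an error. In the trace below, the requested window (col_off=-205, width=201) ends at column -4, entirely to the left of the raster's first column, so its intersection with the raster is empty. Full stack trace: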

Traceback (most recent call last):
  File "/media/share/share/projects/geolayers/train_baseline.py", line 432, in train
    for i, data in enumerate(testloader,0):
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 701, in __next__
    data = self._next_data()
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1445, in _next_data
    return self._process_data(data)
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 1491, in _process_data
    data.reraise()
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/_utils.py", line 715, in reraise
    raise exception
ValueError: Caught ValueError in DataLoader worker process 4.
Original Traceback (most recent call last):
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/mask.py", line 80, in raster_geometry_mask
    window = geometry_window(dataset, shapes, pad_x=pad_x, pad_y=pad_y)
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/features.py", line 477, in geometry_window
    window = window.intersection(raster_window)
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/windows.py", line 775, in intersection
    return intersection([self, other])
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/windows.py", line 125, in wrapper
    return function(*args[0])
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/windows.py", line 239, in intersection
    return functools.reduce(_intersection, windows)
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/windows.py", line 257, in _intersection
    raise WindowError(f"Intersection is empty {w1} {w2}")
rasterio.errors.WindowError: Intersection is empty Window(col_off=-205, row_off=6158, width=201, height=201) Window(col_off=0, row_off=0, width=4901, height=6511)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/utils/data/_utils/worker.py", line 351, in _worker_loop
    data = fetcher.fetch(index)  # type: ignore[possibly-undefined]
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torch/utils/data/_utils/fetch.py", line 52, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/torchgeo/datasets/chesapeake.py", line 559, in __getitem__
    data, _ = rasterio.mask.mask(
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/mask.py", line 178, in mask
    shape_mask, transform, window = raster_geometry_mask(
  File "/media/share/share/envs/tgeo/lib/python3.10/site-packages/rasterio/mask.py", line 86, in raster_geometry_mask
    raise ValueError('Input shapes do not overlap raster.')
ValueError: Input shapes do not overlap raster.
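The empty intersection reported by rasterio can be checked by hand. A window covers the half-open pixel ranges `[col_off, col_off + width)` and `[row_off, row_off + height)`, and two windows intersect only if both their column ranges and their row ranges overlap. The minimal sketch below applies that rule to the two windows from the `WindowError` message; `Win` and `overlaps` are hypothetical plain-Python stand-ins, not rasterio's own implementation.

```python
from typing import NamedTuple


class Win(NamedTuple):
    """Pixel window with the same fields as rasterio's Window (hypothetical stand-in)."""
    col_off: int
    row_off: int
    width: int
    height: int


def overlaps(a: Win, b: Win) -> bool:
    """Return True if two windows share at least one pixel.

    Each window covers the half-open ranges [col_off, col_off + width)
    and [row_off, row_off + height).
    """
    cols = a.col_off < b.col_off + b.width and b.col_off < a.col_off + a.width
    rows = a.row_off < b.row_off + b.height and b.row_off < a.row_off + a.height
    return cols and rows


# Windows copied from the WindowError message in the traceback
requested = Win(col_off=-205, row_off=6158, width=201, height=201)
raster = Win(col_off=0, row_off=0, width=4901, height=6511)

print(requested.col_off + requested.width)  # exclusive end column of the request: -4
print(overlaps(requested, raster))          # False: the column ranges do not overlap
```

The rows overlap (6158 to 6358 lies within 0 to 6510), but the requested columns end before column 0, which is the condition rasterio reports as `Intersection is empty`.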

Steps to reproduce

The error is intermittent: it can occur at any iteration of any training epoch. Steps to reproduce:

  1. Create a ChesapeakeCVPR dataset:
from torchgeo.datasets import ChesapeakeCVPR

states = ['de', 'md', 'va', 'wv', 'pa', 'ny']
spl_train = [f'{state}-train' for state in states]
spl_val = [f'{state}-val' for state in states]
spl_test = [f'{state}-test' for state in states]

# `modality` (not shown here) is the list of layers to load, defined in the training script
trainset = ChesapeakeCVPR(root='/share/chesapeake/cvpr_chesapeake_landcover', download=False, cache=True, layers=modality, splits=spl_train, transforms=None)
  2. Initialize a RandomGeoSampler and DataLoader:
import torch
from torchgeo.datasets import stack_samples
from torchgeo.samplers import RandomGeoSampler, Units

# `generator`, `BATCH_SIZE` and `cfg` (not shown here) come from the training script
trainsampler = RandomGeoSampler(trainset, size=256, units=Units.PIXELS, generator=generator)
trainloader = torch.utils.data.DataLoader(trainset, sampler=trainsampler, batch_size=BATCH_SIZE, num_workers=cfg['num_workers'], drop_last=False, generator=generator, collate_fn=stack_samples)
  3. Iterate through the DataLoader. At some point, an exception like the one above should be raised.
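Why the failure would appear only at random iterations can be illustrated with a simplified, hypothetical model of random window sampling: draw a window's corner uniformly so that the whole `size` x `size` window fits inside some bounds. This is not torchgeo's `RandomGeoSampler` code, and it does not establish the cause of this bug; `Bounds`, `random_window`, `inside`, and the `padded` bounds are illustrative assumptions. Under this model, windows drawn from the raster's own extent (4901 x 6511 pixels, taken from the traceback) always stay inside it, while windows drawn from bounds that extend past the raster only sometimes extend past it or, as in the traceback, lie entirely outside it.

```python
import random
from typing import NamedTuple


class Bounds(NamedTuple):
    """Axis-aligned extent in pixel units (hypothetical, for illustration)."""
    minx: float
    maxx: float
    miny: float
    maxy: float


def random_window(bounds: Bounds, size: float, rng: random.Random) -> Bounds:
    """Draw a size x size window that lies fully inside `bounds`.

    Simplified model: assumes `bounds` is at least `size` wide and tall.
    """
    minx = bounds.minx + rng.random() * (bounds.maxx - bounds.minx - size)
    miny = bounds.miny + rng.random() * (bounds.maxy - bounds.miny - size)
    return Bounds(minx, minx + size, miny, miny + size)


def inside(window: Bounds, extent: Bounds) -> bool:
    """True if `window` is fully contained in `extent`."""
    return (extent.minx <= window.minx and window.maxx <= extent.maxx
            and extent.miny <= window.miny and window.maxy <= extent.maxy)


raster = Bounds(0, 4901, 0, 6511)     # raster size from the traceback
padded = Bounds(-300, 4901, 0, 6511)  # hypothetical bounds wider than the raster
rng = random.Random(0)

# size=256 mirrors the RandomGeoSampler call in step 2
from_raster = [random_window(raster, 256, rng) for _ in range(1000)]
from_padded = [random_window(padded, 256, rng) for _ in range(1000)]

print(all(inside(w, raster) for w in from_raster))      # True
print(sum(not inside(w, raster) for w in from_padded))  # nonzero, but a small fraction of 1000
```

In this model only windows whose corner lands in the extra 300-pixel strip leave the raster, so failures are rare and scattered across iterations rather than systematic.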

Version

0.7.0.dev0

Labels

datasets (Geospatial or benchmark datasets)
