torchgeo reads dataset as all zeros while rasterio can read the contents #2466

Open
@erob-archim

Description

Hi, I'm not exactly sure what's causing this issue, though I suspect it might be related to the compression of the GeoTIFFs.

When data is downloaded from the UK Environment Agency's LIDAR data programme, torchgeo reads it as all zeros.

This can be rectified by running the file through gdal_merge.py - I'm unsure why.

This is not an issue when reading directly with rasterio.
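
One way to test the compression suspicion is to compare the metadata rasterio reports for the original file and for the gdal_merge.py output. The following is a minimal sketch, not a diagnosis: `load_profile` and `diff_profiles` are hypothetical helper names (not part of rasterio or torchgeo), and the only rasterio calls assumed are `rasterio.open` and the dataset's `profile` attribute.

```python
from __future__ import annotations

from typing import Any


def load_profile(path: str) -> dict[str, Any]:
    """Return the rasterio profile (driver, dtype, nodata, compress, ...) of a raster."""
    import rasterio  # imported lazily so diff_profiles is usable without rasterio

    with rasterio.open(path) as src:
        return dict(src.profile)


def _same(x: Any, y: Any) -> bool:
    """Equality that also treats two NaN values (e.g. NaN nodata) as equal."""
    return x == y or (x != x and y != y)


def diff_profiles(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Map each key that differs, or exists on one side only, to (a_value, b_value).

    A key missing from one profile is reported with None on that side.
    """
    missing = object()
    diff: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(a) | set(b)):
        left, right = a.get(key, missing), b.get(key, missing)
        if left is missing or right is missing or not _same(left, right):
            diff[key] = (
                None if left is missing else left,
                None if right is missing else right,
            )
    return diff
```

Calling `diff_profiles(load_profile(str(test_file)), load_profile(str(uncompressed_file)))` with the two paths from the script under "Steps to reproduce" would list every metadata key (for example `compress`, `nodata`, `dtype` or `tiled`) on which the files disagree, which narrows down what gdal_merge.py actually changed.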

Steps to reproduce

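Run the script below. It downloads the relevant data, reads the original file with both rasterio and torchgeo, rewrites the file with gdal_merge.py, and reads the result again with both libraries.
Checking whether each read returns all zeros, I get: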

Rasterio compressed all zero: False
Rasterio uncompressed all zero: False
torchgeo compressed all zero: True
torchgeo uncompressed all zero: False

import subprocess
from pathlib import Path

import rasterio
from torchgeo.datasets import RasterDataset

subprocess.run(["wget", "https://api.agrimetrics.co.uk/tiles/collections/survey/national_lidar_programme_dsm/2023/1/SP7015?subscription-key=public", "-O", "test_file.zip"])
subprocess.run(["unzip", "test_file.zip"])

test_file = Path("DSM_SP7015_P_12740_20230402_20230404.tif")
uncompressed_file = test_file.with_suffix(".uncompressed.tif")

with rasterio.open(test_file) as src:
    rasterio_compressed = src.read()
    rasterio_compressed_all_zero = (rasterio_compressed == 0).all()

torchgeo_compressed_dataset = RasterDataset(test_file)
torchgeo_compressed = torchgeo_compressed_dataset[torchgeo_compressed_dataset.bounds]["image"]
torchgeo_compressed_all_zero = (torchgeo_compressed == 0).all()

uncompressed_file.unlink(missing_ok=True)
subprocess.run(["gdal_merge.py", "-ot", "Float32", "-of", "GTiff", "-o", str(uncompressed_file), str(test_file)])

with rasterio.open(uncompressed_file) as src:
    rasterio_uncompressed = src.read()
    rasterio_uncompressed_all_zero = (rasterio_uncompressed == 0).all()

torchgeo_uncompressed_dataset = RasterDataset(uncompressed_file)
torchgeo_uncompressed = torchgeo_uncompressed_dataset[torchgeo_uncompressed_dataset.bounds]["image"]
torchgeo_uncompressed_all_zero = (torchgeo_uncompressed == 0).all()

print(f"Rasterio compressed all zero: {rasterio_compressed_all_zero}")
print(f"Rasterio uncompressed all zero: {rasterio_uncompressed_all_zero}")
print(f"torchgeo compressed all zero: {torchgeo_compressed_all_zero}")
print(f"torchgeo uncompressed all zero: {torchgeo_uncompressed_all_zero}")

Version

ff3d087 (latest main)
