✨ Python bindings for CudaCogReader #58

weiji14 · 2025-10-28T01:58:12Z

Allow Python users to use the nvTIFF backend to read a GeoTIFF into a DLPack tensor that can be consumed by CuPy, PyTorch (2.9+), or any other Python library that implements the Python Specification for DLPack.
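As a rough illustration of what "implements the Python Specification for DLPack" means from the consumer side, the sketch below checks an object for the two protocol methods and for a CUDA device. The helper `is_cuda_dlpack_producer` and the `FakeGpuTensor` stand-in are hypothetical names used only for this example; the device type code 2 is `kDLCUDA` in the DLPack header.

```python
KDL_CUDA = 2  # kDLCUDA device type code from dlpack.h


def is_cuda_dlpack_producer(obj) -> bool:
    """Return True if obj exposes the DLPack protocol and reports a CUDA device."""
    if not (hasattr(obj, "__dlpack__") and hasattr(obj, "__dlpack_device__")):
        return False
    device_type, _device_id = obj.__dlpack_device__()
    return int(device_type) == KDL_CUDA


class FakeGpuTensor:
    """Hypothetical stand-in so the sketch runs without a GPU."""

    def __dlpack__(self, *, stream=None, max_version=None):
        raise NotImplementedError("no real tensor behind this stand-in")

    def __dlpack_device__(self):
        return (KDL_CUDA, 0)  # (device_type, device_id)


print(is_cuda_dlpack_producer(FakeGpuTensor()))
print(is_cuda_dlpack_producer(object()))
```

The first call reports a CUDA producer; the second object has neither method, so it is rejected.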

Preview at https://cog3pio--58.org.readthedocs.build/en/58/api/#dlpack

Usage:

import cupy as cp
from cog3pio import CudaCogReader

# need to call CudaCogReader twice to work around some CUDA stream issues
_cog = CudaCogReader(
    path="https://github.com/rasterio/rasterio/raw/1.4.3/tests/data/RGBA.byte.tif"
)
cog = CudaCogReader(
    path="https://github.com/rasterio/rasterio/raw/1.4.3/tests/data/RGBA.byte.tif"
)
array: cp.ndarray = cp.from_dlpack(cog)
array.shape
# (2271752,)
array.dtype
# dtype('uint8')
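
The flat length above is consistent with the test image's dimensions. A minimal check, assuming RGBA.byte.tif has 4 bands of 718 rows by 791 columns (dimensions taken from rasterio's test data, not stated in this PR):

```python
import math

# Assumed dimensions of RGBA.byte.tif (not stated in this PR)
bands, height, width = 4, 718, 791

# One uint8 value per band per pixel
flat_length = math.prod((bands, height, width))
print(flat_length)  # 2271752, the length of the 1D array above
```

Reshaping the 1D array into a 3D one would also require knowing whether the decoded buffer is pixel-interleaved or band-interleaved, which this PR does not state.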

TODO:

External things to do to improve implementation here:

Implement Error trait on nvtiff_sys::result::NvTiffError
Make a cupy-xarray 0.1.5 release without cupy as dependency so that cupy-cuda13x can be used instead

⚠️ Limitations/Notes:

The initial implementation returns the correct 1D shape in CuPy 🎉 but the tensor values are nonsense, possibly because the raw pointer is not passed around safely 🙈 A workaround (discovered accidentally) is to instantiate the CudaCogReader class twice?! 😅
Need to figure out the wheel build situation: should there be two packages for the Linux builds, one without nvTIFF and one with the nvTIFF bindings bundled? Yes, otherwise users will still hit `ImportError: libnvtiff.so.0: cannot open shared object file: No such file or directory` when running `import cog3pio` without nvTIFF installed (e.g. if they don't have a CUDA GPU). Might need to look into https://wheelnext.dev/proposals/pepxxx_wheel_variant_support/. A temporary workaround might be to add nvTIFF support to the free-threaded builds only.

References:

Follow-up to #57, part of #26

Setting up boilerplate code for using CudaCogReader in Python to read GeoTIFFs into a DLPack tensor and then transfer it to CuPy! Temporarily using `unsafe impl Send` and `Sync` to work around the raw pointer in CudaCogReader not being safe to share between threads. Still need to properly handle some kwargs in `__dlpack__` for `cupy.from_dlpack()`. A CuPy array is returned with the correct shape, but the numbers are wrong, possibly because the pointer isn't managed properly.

Using bundled bindings to get compilation to work on CI. Tests with cuda should still be skipped since GPU CI is not available. Also bumped from cudarc 0.17.3 to 0.17.4.

nvTIFF isn't available on macOS, and Windows support isn't a priority yet, so run `maturin build --features cuda` on the Linux CI tests (Python) only.

CudaCogReader might not be available on some platforms, so hide it behind some gates.
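
On the Python side, downstream code can cope with a build that lacks the class by using an optional import. This is a sketch of one user-side pattern, not the gating used inside cog3pio itself; `optional_attr` is a hypothetical helper.

```python
import importlib


def optional_attr(module_name: str, attr: str):
    """Return module.attr, or None if the module or attribute is unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a missing shared library
        # such as libnvtiff.so.0 at import time.
        return None
    return getattr(module, attr, None)


# None on platforms or builds without the CUDA-enabled class
CudaCogReader = optional_attr("cog3pio", "CudaCogReader")
HAS_CUDA_READER = CudaCogReader is not None
```

Callers can then branch on `HAS_CUDA_READER` and fall back to a CPU reader when it is `False`.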

For some reason, calling CudaCogReader twice makes things work, i.e. the returned cupy.ndarray has the correct numbers. Thinking it might be some CUDA stream issue (https://docs.cupy.dev/en/v13.6.0/user_guide/basic.html#current-stream), but cupy should already be using the null stream by default. Putting some print() and dbg!() statements here and there. Bumped cupy-cuda12x to cupy-cuda13x.

Install nvTIFF binaries from nvidia repos following instructions on https://developer.nvidia.com/nvtiff-0-5-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network. Could have tried to get it from PyPI following https://docs.nvidia.com/cuda/nvtiff/installation.html#pypi, but then would need to figure out the lib paths and stuff.

Xref https://developer.nvidia.com/nvtiff-0-5-0-download-archive?target_os=Linux&target_arch=arm64-sbsa&Compilation=Native&Distribution=Ubuntu&target_version=24.04&target_type=deb_network

Fix `Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the LIBCLANG_PATH environment variable to a path where one of these files can be found (invalid: [])"`. Need to install libclang inside the manylinux_2_28 docker container.

Depending on which manylinux_2_28 docker image is pulled for each target arch, the underlying distribution could either be AlmaLinux or Ubuntu based, so need to handle either way of installing nvTIFF and clang-dev.
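
The "handle either way" logic can be sketched as a check for whichever package manager is on the PATH. This is only an illustration in Python (the CI itself presumably does this in shell); `pick_installer` is a hypothetical helper, and package names are left to the caller because they differ between the two distributions.

```python
import shutil


def pick_installer(which=shutil.which) -> list[str]:
    """Return the install command prefix for the detected package manager."""
    if which("dnf"):  # AlmaLinux-based image
        return ["dnf", "install", "-y"]
    if which("apt-get"):  # Ubuntu-based image
        return ["apt-get", "install", "-y"]
    raise RuntimeError("neither dnf nor apt-get was found on PATH")


def install_command(packages: list[str], which=shutil.which) -> list[str]:
    """Build the full command; the caller supplies distro-specific package names."""
    return pick_installer(which) + list(packages)
```

The `which` parameter exists so the detection can be exercised without either package manager installed.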

Fix nvtiff-sys compilation errors by installing missing CUDA runtime dependencies (cuda-crt and cuda-cudart-devel) and patching the nvtiff.h file following https://docs.rs/nvtiff-sys/0.1.2/nvtiff_sys/#instructions

The default `pyo3` flag set in pyproject.toml is overridden when passing the `--features` flag to maturin build, so need to set `cuda,pyo3` instead. Also copied code from e75d171 to the free-threaded build section.

Too tricky to get nvTIFF working on armv7, s390x and ppc64le due to some linker error like `/usr/armv7-unknown-linux-gnueabihf/lib/gcc/armv7-unknown-linux-gnueabihf/7.5.0/../../../../armv7-unknown-linux-gnueabihf/bin/ld: cannot find -lnvtiff`, so disabling it on those platforms.

Need to pass the 'cuda' feature flag to maturin on Read the Docs, and get libnvtiff-dev from conda-forge. Added a warning to the docstring indicating that CudaCogReader is experimental, and only available on linux-x86_64 and linux-aarch64 builds.

Point to where the header files are located. nvtiff.h is in $CONDA_PREFIX/include. cuda_runtime.h and crt/host_config.h are in $CONDA_PREFIX/targets/x86_64-linux/include.

Not sure if raw pointer in CudaCogReader is thread-safe enough to do `unsafe impl Send/Sync`, so using unsendable instead for now. Xref https://pyo3.rs/v0.27.1/migration.html#pyclass-structs-must-now-be-send-or-unsendable

Check that stream and max_version arguments are valid. Currently only supporting stream=1 or None, and DLPack version 1.x (dlpark is using DLPack 1.1). Have added some docstrings for these parameters. Not implementing copy kwarg yet though.
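
A minimal Python sketch of that validation, for illustration only: the real check lives in the Rust bindings, `validate_dlpack_kwargs` is a hypothetical name, and both the choice of `ValueError` and the reading of "DLPack version 1.x" as "major component equals 1" are assumptions. Accepting `max_version=None` is also an assumption.

```python
def validate_dlpack_kwargs(stream=None, max_version=None) -> None:
    """Raise ValueError for arguments outside the supported subset."""
    # Only the legacy default stream (1) or no stream at all is accepted.
    if stream is not None and stream != 1:
        raise ValueError(f"only stream=1 or None is supported, got {stream!r}")

    # max_version is a (major, minor) tuple in the DLPack Python specification.
    if max_version is not None:
        major, _minor = max_version
        if major != 1:
            raise ValueError(
                f"only DLPack 1.x is supported, got max_version={max_version!r}"
            )


validate_dlpack_kwargs(stream=1, max_version=(1, 1))  # passes silently
```

A call such as `validate_dlpack_kwargs(stream=2)` would raise `ValueError` under these assumptions.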

Bump nvtiff-sys from 0.1.2 to 0.1.3 to get Error trait on NvtiffStatusError, and then we can cast to string and pass error message to PyValueError.

Brute force symlinking to get nvtiff-sys to compile with conda-forge's libnvtiff that is under $CONDA_PREFIX/include/ instead of $CONDA_PREFIX/targets/x86_64-linux/include/ where most other header files are. One key part is to use RUSTFLAGS instead of LD_LIBRARY_PATH to actually get rustc to search the correct lib/ folder for the .so files.
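
The environment described in these build notes can be sketched as a small helper that derives the variables from the conda prefix. `nvtiff_build_env` is a hypothetical name, and `$CONDA_PREFIX/lib` as the location of the `.so` files is an assumption; `BINDGEN_EXTRA_CLANG_ARGS` and `RUSTFLAGS` are the variables named in the commit messages.

```python
import posixpath


def nvtiff_build_env(conda_prefix: str, target: str = "x86_64-linux") -> dict[str, str]:
    """Build the extra environment variables for compiling nvtiff-sys."""
    # nvtiff.h lives directly under include/
    include_dir = posixpath.join(conda_prefix, "include")
    # cuda_runtime.h and crt/host_config.h live under targets/<target>/include
    target_include_dir = posixpath.join(conda_prefix, "targets", target, "include")
    # Assumed location of the shared libraries
    lib_dir = posixpath.join(conda_prefix, "lib")
    return {
        "BINDGEN_EXTRA_CLANG_ARGS": f"-I{include_dir} -I{target_include_dir}",
        "RUSTFLAGS": f"-L {lib_dir}",
    }


env = nvtiff_build_env("/opt/conda")
print(env["RUSTFLAGS"])  # -L /opt/conda/lib
```

The resulting dictionary could be merged into `os.environ` before invoking `maturin build --features cuda,pyo3`.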

weiji14 added this to the 0.1.0 milestone Oct 28, 2025

weiji14 self-assigned this Oct 28, 2025

weiji14 added the feature New feature or request label Oct 28, 2025

weiji14 force-pushed the dlpack_to_cupy branch from 682fae0 to 58cdee2 Compare October 28, 2025 02:00

weiji14 force-pushed the dlpack_to_cupy branch from 58cdee2 to 6499ec9 Compare October 28, 2025 02:07

weiji14 added 3 commits October 28, 2025 17:27

🚩 Change to use cudarc's cuda-13000 feature flag

0865c25

Using bundled bindings to get compilation to work on CI. Tests with cuda should still be skipped since GPU CI is not available. Also bumped from cudarc 0.17.3 to 0.17.4.

🚩 Maturin build with 'cuda' on Linux CI only

65dc9ec

nvTIFF isn't available on macOS, and Windows support isn't a priority yet, so run `maturin build --features cuda` on the Linux CI tests (Python) only.

🐛 Better handle CudaCogReader import logic

517f942

CudaCogReader might not be available on some platforms, so hide it behind some gates.

weiji14 mentioned this pull request Oct 29, 2025

EPIC: GPU-decoding from TIFF -> CUDA mem -> dlpack -> CuPy #26

Open

3 tasks

weiji14 added 2 commits October 29, 2025 16:06

🔀 Merge branch 'main' into dlpack_to_cupy

79b6555

weiji14 force-pushed the dlpack_to_cupy branch from 76e29f1 to aed13e5 Compare October 29, 2025 23:18

weiji14 force-pushed the dlpack_to_cupy branch from aed13e5 to cb05982 Compare October 29, 2025 23:20

🔧 Modify cuda repo for aarch64

bd1c8b0

Xref https://developer.nvidia.com/nvtiff-0-5-0-download-archive?target_os=Linux&target_arch=arm64-sbsa&Compilation=Native&Distribution=Ubuntu&target_version=24.04&target_type=deb_network

weiji14 force-pushed the dlpack_to_cupy branch 3 times, most recently from 5e0408c to 0056bbf Compare October 30, 2025 00:17

weiji14 force-pushed the dlpack_to_cupy branch from 0056bbf to a409689 Compare October 30, 2025 00:20

weiji14 added 5 commits October 30, 2025 14:42

🍻 Install nvTIFF and clang-dev with either dnf or apt

4e2e0ab

Depending on which manylinux_2_28 docker image is pulled for each target arch, the underlying distribution could either be AlmaLinux or Ubuntu based, so need to handle either way of installing nvTIFF and clang-dev.

🩹 Install cuda deps and patch nvtiff.h file

e75d171

Fix nvtiff-sys compilation errors by installing missing CUDA runtime dependencies (cuda-crt and cuda-cudart-devel) and patching the nvtiff.h file following https://docs.rs/nvtiff-sys/0.1.2/nvtiff_sys/#instructions

🐛 Patch to build wheels with cuda and pyo3 feature flags

bdaaa9a

The default `pyo3` flag set in pyproject.toml is overridden when passing the `--features` flag to maturin build, so need to set `cuda,pyo3` instead. Also copied code from e75d171 to the free-threaded build section.

🔀 Merge branch 'main' into dlpack_to_cupy

4ad7078

weiji14 force-pushed the dlpack_to_cupy branch 4 times, most recently from 715ca46 to 1c5965f Compare November 4, 2025 01:47

weiji14 force-pushed the dlpack_to_cupy branch 18 times, most recently from f0d0e16 to 1c6108a Compare November 4, 2025 03:53

📝 Add CudaCogReader class to API docs

b387902

Need to pass the 'cuda' feature flag to maturin on Read the Docs, and get libnvtiff-dev from conda-forge. Added a warning to the docstring indicating that CudaCogReader is experimental, and only available on linux-x86_64 and linux-aarch64 builds.

weiji14 force-pushed the dlpack_to_cupy branch 2 times, most recently from 4c8a4b2 to ae17b83 Compare November 4, 2025 05:57

🍻 Pass include dir to LD_LIBRARY_PATH and BINDGEN_EXTRA_CLANG_ARGS

6cc193b

Point to where the header files are located. nvtiff.h is in $CONDA_PREFIX/include. cuda_runtime.h and crt/host_config.h are in $CONDA_PREFIX/targets/x86_64-linux/include.

weiji14 force-pushed the dlpack_to_cupy branch from ae17b83 to 6cc193b Compare November 4, 2025 06:38

weiji14 added 4 commits November 5, 2025 11:13

🚨 Use pyclass(unsendable) instead of deriving Send/Sync

6ed6ceb

Not sure if raw pointer in CudaCogReader is thread-safe enough to do `unsafe impl Send/Sync`, so using unsendable instead for now. Xref https://pyo3.rs/v0.27.1/migration.html#pyclass-structs-must-now-be-send-or-unsendable

🥅 Remove unwrap, use map_err instead

1564f82

Bump nvtiff-sys from 0.1.2 to 0.1.3 to get Error trait on NvtiffStatusError, and then we can cast to string and pass error message to PyValueError.
