@weiji14 weiji14 commented Oct 28, 2025

Allow Python users to use the nvTIFF backend, reading a GeoTIFF into a DLPack tensor that can be consumed by CuPy, PyTorch (2.9+), or any other Python library that implements the Python Specification for DLPack.
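For context, the Python side of the DLPack protocol comes down to two methods on the producer, `__dlpack__` and `__dlpack_device__`. The sketch below is a CPU-only illustration that runs without a GPU: `HostTensor` is a hypothetical stand-in, not the CudaCogReader implementation, and it simply forwards to NumPy's own DLPack support.

```python
import numpy as np


class HostTensor:
    """Hypothetical producer that exposes a NumPy buffer through DLPack."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __dlpack__(self, **kwargs):
        # Forward whatever kwargs the consumer passes (stream, max_version, ...)
        # to NumPy, which builds the actual DLPack capsule.
        return self._data.__dlpack__(**kwargs)

    def __dlpack_device__(self):
        # Returns a (device_type, device_id) pair; for host memory this is the CPU device.
        return self._data.__dlpack_device__()


producer = HostTensor(np.arange(6, dtype=np.uint8))
consumed = np.from_dlpack(producer)  # any DLPack-aware consumer works the same way
```

`cp.from_dlpack(cog)` in the usage example follows the same pattern, except that the producer reports a CUDA device and hands over GPU memory.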

Preview at https://cog3pio--58.org.readthedocs.build/en/58/api/#dlpack

Usage:

import cupy as cp
from cog3pio import CudaCogReader

# need to call CudaCogReader twice to work around some CUDA stream issues
_cog = CudaCogReader(
   path="https://github.com/rasterio/rasterio/raw/1.4.3/tests/data/RGBA.byte.tif"
)
cog = CudaCogReader(
   path="https://github.com/rasterio/rasterio/raw/1.4.3/tests/data/RGBA.byte.tif"
)
array: cp.ndarray = cp.from_dlpack(cog)
array.shape
# (2271752,)
array.dtype
# dtype('uint8')
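The flat length above, 2271752, equals 4 × 718 × 791, which matches a 4-band image of 718 rows by 791 columns. The sketch below shows how such a 1D buffer could be reshaped; it uses a NumPy array as a stand-in so it runs without a GPU. The band count and dimensions are assumptions about the RGBA.byte.tif test file, and this PR does not state whether the buffer is band-sequential or pixel-interleaved, so both layouts are shown.

```python
import numpy as np

# Assumed dimensions of the RGBA.byte.tif test image (not stated in this PR)
bands, height, width = 4, 718, 791

# Stand-in for the flat uint8 array returned by cp.from_dlpack(cog)
flat = np.zeros(bands * height * width, dtype=np.uint8)

# If the buffer is band-sequential (all of band 1, then band 2, ...)
chw = flat.reshape(bands, height, width)

# If the buffer is pixel-interleaved (R, G, B, A for each pixel in turn)
hwc = flat.reshape(height, width, bands)
```

The same `reshape` calls work on a `cupy.ndarray`, and neither one copies the underlying data.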

TODO:

  • Initial boilerplate implementation
  • Handle optional 'cuda' extras dependency properly on CI
    • GitHub Actions (linux-x86_64 and linux-aarch64 only)
    • readthedocs
  • Pass raw pointer safely? (somewhat done in 6ed6ceb, but need to follow up on this)
  • Handle __dlpack__ kwargs for cupy.from_dlpack()
    • stream
    • max_version
    • dl_device (not done yet, TODO next time)
    • copy (not done yet, TODO next time)
  • Remove unwraps, proper error messaging
  • Resolve memory leak, or use of invalid stream somehow
  • Add benchmark test, but gate behind 'cuda' 🚩
  • Xarray integration (TODO in separate PR?)

External things to do to improve implementation here:

  • Implement Error trait on nvtiff_sys::result::NvTiffError
  • Make a cupy-xarray 0.1.5 release without cupy as dependency so that cupy-cuda13x can be used instead

⚠️ Limitations/Notes:

  • Initial implementation returns the correct 1D shape in CuPy 🎉 But the tensor values are nonsense, possibly because the raw pointer is not being passed around safely 🙈 The workaround (discovered accidentally) is to instantiate the CudaCogReader class twice?! 😅
  • Need to figure out the wheel build situation. Do I create two packages for the Linux builds, one without nvTIFF and one with the nvTIFF bindings bundled? Yes, will need to, otherwise users will still hit `ImportError: libnvtiff.so.0: cannot open shared object file: No such file or directory` when running `import cog3pio` without nvTIFF installed (e.g. if they don't have a CUDA GPU). Might need to look into https://wheelnext.dev/proposals/pepxxx_wheel_variant_support/. A temporary workaround might be to add nvTIFF support to the free-threaded builds only?

References:

Follow-up to #57, part of #26

@weiji14 weiji14 added this to the 0.1.0 milestone Oct 28, 2025
@weiji14 weiji14 self-assigned this Oct 28, 2025
@weiji14 weiji14 added the feature New feature or request label Oct 28, 2025
Setting up boilerplate code for using CudaCogReader in Python to read GeoTIFFs into a DLPack tensor and then transfer it to CuPy! Temporarily using unsafe impl Send and Sync to work around the raw pointer in CudaCogReader not being safe to share between threads. Still need to properly handle some kwargs in __dlpack__ for cupy.from_dlpack(). A cupy array is returned with the correct shape, but the numbers are wrong, possibly because the pointer isn't managed properly.
Using bundled bindings to get compilation to work on CI. Tests with cuda should still be skipped since GPU CI is not available. Also bumped from cudarc 0.17.3 to 0.17.4.
Since nvTIFF isn't available on macOS, and Windows support isn't a priority yet, run `maturin build --features cuda` on the Linux CI tests (Python) only.
CudaCogReader might not be available on some platforms, so hide it behind some gates.
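On the Python side, downstream code can cope with builds that lack CudaCogReader by guarding the import. This is a minimal sketch, assuming the class is simply absent (or that libnvtiff fails to load) on unsupported platforms; `HAS_CUDA_READER` and `open_on_gpu` are hypothetical names, not part of cog3pio.

```python
try:
    # Fails with ImportError if cog3pio was built without the 'cuda' feature,
    # or if libnvtiff.so cannot be loaded on this machine.
    from cog3pio import CudaCogReader

    HAS_CUDA_READER = True
except ImportError:
    CudaCogReader = None
    HAS_CUDA_READER = False


def open_on_gpu(path):
    """Hypothetical helper: open a GeoTIFF with the nvTIFF backend if available."""
    if not HAS_CUDA_READER:
        raise RuntimeError(
            "CudaCogReader is unavailable; it is only built for "
            "linux-x86_64 and linux-aarch64 with nvTIFF installed"
        )
    return CudaCogReader(path=path)
```

Tests that need the GPU reader can check `HAS_CUDA_READER` and skip themselves when it is false.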
For some reason, calling CudaCogReader twice makes things work, i.e. the returned cupy.ndarray has the correct numbers. Thinking it might be a CUDA stream issue (https://docs.cupy.dev/en/v13.6.0/user_guide/basic.html#current-stream), but cupy should already be using the null stream by default.

Putting some print() and dbg!() statements here and there. Bumped cupy-cuda12x to cupy-cuda13x.
@weiji14 weiji14 force-pushed the dlpack_to_cupy branch 3 times, most recently from 5e0408c to 0056bbf Compare October 30, 2025 00:17
Fix the error `Unable to find libclang: "couldn't find any valid shared libraries matching: ['libclang.so', 'libclang-*.so', 'libclang.so.*', 'libclang-*.so.*'], set the LIBCLANG_PATH environment variable to a path where one of these files can be found (invalid: [])"`. libclang needs to be installed inside the manylinux_2_28 docker container.
Depending on which manylinux_2_28 docker image is pulled for each target arch, the underlying distribution could either be AlmaLinux or Ubuntu based, so need to handle either way of installing nvTIFF and clang-dev.
Fix nvtiff-sys compilation errors by installing missing CUDA runtime dependencies (cuda-crt and cuda-cudart-devel) and patching the nvtiff.h file following https://docs.rs/nvtiff-sys/0.1.2/nvtiff_sys/#instructions
The default `pyo3` flag set in pyproject.toml is overridden when passing the `--features` flag to maturin build, so need to set `cuda,pyo3` instead. Also copied code from e75d171 to the free-threaded build section.
Too tricky to get nvTIFF working on armv7, s390x and ppc64le due to some linker error like `/usr/armv7-unknown-linux-gnueabihf/lib/gcc/armv7-unknown-linux-gnueabihf/7.5.0/../../../../armv7-unknown-linux-gnueabihf/bin/ld: cannot find -lnvtiff`, so disabling them on those platforms.
@weiji14 weiji14 force-pushed the dlpack_to_cupy branch 4 times, most recently from 715ca46 to 1c5965f Compare November 4, 2025 01:47
@weiji14 weiji14 force-pushed the dlpack_to_cupy branch 18 times, most recently from f0d0e16 to 1c6108a Compare November 4, 2025 03:53
Need to pass the 'cuda' feature flag to maturin on Read the Docs, and get libnvtiff-dev from conda-forge. Added a warning to the docstring indicating that CudaCogReader is experimental, and only available on linux-x86_64 and linux-aarch64 builds.
@weiji14 weiji14 force-pushed the dlpack_to_cupy branch 2 times, most recently from 4c8a4b2 to ae17b83 Compare November 4, 2025 05:57
Point to where the header files are located. nvtiff.h is in $CONDA_PREFIX/include. cuda_runtime.h and crt/host_config.h are in $CONDA_PREFIX/targets/x86_64-linux/include.
Not sure if raw pointer in CudaCogReader is thread-safe enough to do `unsafe impl Send/Sync`, so using unsendable instead for now. Xref https://pyo3.rs/v0.27.1/migration.html#pyclass-structs-must-now-be-send-or-unsendable
Check that the stream and max_version arguments are valid. Currently only stream=1 or None and DLPack version 1.x are supported (dlpark is using DLPack 1.1). Added some docstrings for these parameters. The copy kwarg is not implemented yet though.
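The checks described in this commit could look roughly like the Python sketch below. The real validation lives in the Rust bindings; `check_dlpack_kwargs` is a hypothetical name, the error messages are made up, and treating `max_version=None` as acceptable is an assumption.

```python
def check_dlpack_kwargs(stream=None, max_version=None):
    """Hypothetical validation of the __dlpack__ kwargs, mirroring the commit message."""
    # Only the legacy default CUDA stream (1) or None is accepted for now.
    if stream is not None and stream != 1:
        raise ValueError(f"stream={stream!r} is not supported, use 1 or None")

    # max_version is a (major, minor) tuple; only DLPack 1.x is accepted.
    # Assumption: None (no version requested by the consumer) is let through.
    if max_version is not None:
        major, _minor = max_version
        if major != 1:
            raise ValueError(
                f"max_version={max_version!r} is not supported, expected DLPack 1.x"
            )


check_dlpack_kwargs(stream=1, max_version=(1, 1))  # passes silently
```

Raising ValueError here mirrors the later commit that passes nvTIFF error messages to PyValueError.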
Bump nvtiff-sys from 0.1.2 to 0.1.3 to get Error trait on NvtiffStatusError, and then we can cast to string and pass error message to PyValueError.
Brute force symlinking to get nvtiff-sys to compile with conda-forge's libnvtiff that is under $CONDA_PREFIX/include/ instead of $CONDA_PREFIX/targets/x86_64-linux/include/ where most other header files are. One key part is to use RUSTFLAGS instead of LD_LIBRARY_PATH to actually get rustc to search the correct lib/ folder for the .so files.