-
Notifications
You must be signed in to change notification settings - Fork 414
Cuda Textures #1153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Cuda Textures #1153
Conversation
Signed-off-by: Mustafa Haiderbhai <[email protected]>
Signed-off-by: Mustafa Haiderbhai <[email protected]>
Signed-off-by: Mustafa Haiderbhai <[email protected]>
Signed-off-by: Mustafa Haiderbhai <[email protected]>
Signed-off-by: Mustafa Haiderbhai <[email protected]>
📝 WalkthroughWalkthroughThis pull request introduces comprehensive texture sampling support to the Warp library, spanning Python public APIs, C interop bindings, native C++ implementation with host/device resource management, and test coverage. Textures are created with metadata, lifecycle management across CPU and GPU contexts, and support 2D/3D sampling with multiple return types. Changes
Sequence Diagram(s)sequenceDiagram
participant User as User Code
participant PyAPI as Python API<br/>(warp.Texture)
participant Interop as C Interop<br/>(ctypes)
participant Native as Native C++<br/>(texture.cpp)
participant CUDA as CUDA Runtime
participant Registry as Descriptor<br/>Registry
User->>PyAPI: Create Texture(data, dims, ...)
PyAPI->>PyAPI: Validate channels, dtype, size
alt Device context active
PyAPI->>Interop: wp_texture_create_device()
Interop->>Native: Call C function
Native->>CUDA: Create channel/array descriptor
Native->>CUDA: Upload data to CUDA array
Native->>CUDA: Create texture object
Native->>Registry: Store TextureDesc
Native-->>Interop: Return uint64 id
Interop-->>PyAPI: id
else CPU context
PyAPI->>Interop: wp_texture_create_host()
Interop->>Native: Call C function
Native->>Native: Allocate host buffer, copy data
Native->>Registry: Store TextureDesc
Native-->>Interop: Return uint64 id
Interop-->>PyAPI: id
end
PyAPI->>PyAPI: Store id in Texture instance
User->>User: Call texture2d_sample_f(texture_id, uv)
User->>Native: Invoke compiled kernel
Native->>Registry: Lookup TextureDesc by id
Native->>Native: Fetch/interpolate texel (CUDA or CPU path)
Native-->>User: Return sampled value
User->>PyAPI: Texture destroyed (garbage collection)
PyAPI->>PyAPI: __del__() invoked
alt Device context active
PyAPI->>Interop: wp_texture_destroy_device()
Interop->>Native: Call C function
Native->>CUDA: Destroy texture object & array
Native->>Registry: Remove TextureDesc
else CPU context
PyAPI->>Interop: wp_texture_destroy_host()
Interop->>Native: Call C function
Native->>Native: Free host buffer
Native->>Registry: Remove TextureDesc
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@mmacklin, thought I'd clean up my previous implementation and make a PR since I saw #1122 request it. As we discussed before, the only useful features are the filtered sampling, so I made the implementation barebones. All sampling returns floats (or vector of floats) because we assume filtering is used which is only supported for floating point return types (point sampling can be turned on, but defaults to filtered). Restricted the support to uint8 and float types only, channels 1/2/4. Added a CPU implementation to the texture produce the same result on CPU/GPU. Let me know what you think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
warp/__init__.pyi (1)
177-183: Texture lifetime: underlying array can be freed too earlyRe-exporting
Texturehere is fine, but based on the implementation inwarp/_src/types.py(lines 5654-5757) the constructor passes onlyctypes.cast(data.ptr, ...)intowp_texture_create_host/deviceand does not keep a Python reference todata. That means patterns like:tex = wp.Texture(wp.array(...), width=W, height=H, ...)will leave
tex.idpointing at freed memory as soon as the temporary array is garbage-collected after__init__returns.Recommend storing the array on the
Textureinstance (e.g.,self._data = data) so its lifetime is tied to the texture (and/or explicitly documenting a stronger ownership contract if you intentionally rely on user-managed lifetimes).
🧹 Nitpick comments (7)
warp/_src/builtins.py (1)
7315-7364: Texture sampling builtins look correct; confirm differentiability intentThe six texture sampling builtins are consistent with existing patterns (e.g., volume_sample_*): signatures match the native C++ APIs (id as uint64, uv/uvw as vec2/vec3, returning float/vec2/vec4) and relying on the default
namespace="wp::"should correctly bind towp::texture{2,3}d_sample_{f,v2,v4}.Two follow‑ups to consider:
- If these calls are not intended to be used in autodiff yet (and you haven’t wired up a backward rule), it’s safer to add
is_differentiable=Falsehere to avoid surprising “silent zero‑grad” or missing‑grad behavior when users differentiate through texture sampling.- Once you do the separate docs TODO, you might want to expand these brief docstrings to mention coordinate range (
[0,1]normalized), how addressing/filter modes are configured (via the texture object, not per call), and behavior in host‑only builds, for consistency with the Volume and Geometry sections.warp/__init__.pyi (1)
4564-4592: Texture sampling stubs look correct; consider clarifying coordinate semanticsThe six texture sampling overloads are type- and naming-consistent with the
Texturetype and the rest of the builtin surface (id asuint64,vec2f/vec3fcoords,float/vec2f/vec4foutputs). This looks good.Minor suggestion: the docstrings say “at normalized coordinates
uv/uvw”, butTexturehas anormalized_coordsflag that is only passed on the device path. To avoid confusion, you might want to phrase these as sampling at “texture coordinates” and clarify in the mainTexturedocs how that flag affects interpretation, rather than hard-coding “normalized” here (especially since this file is autogenerated and will mirror whatever you decide in the generator).warp/native/warp.h (1)
187-212: Texture create/destroy API is consistent; small const/doc nitsThe new
wp_texture_create_host/deviceand destroy functions are consistent with the rest of the native API and match howTextureis constructed from Python (type/format/address/filter layout andnormalized_coordsonly on the device path). That’s good.Two optional polish points while this ABI is still new:
- If the implementation does not write through
data_ptr, consider making itconst void*in both host and device signatures for clearer intent.- A short comment (similar to the mesh/volume ones) documenting expectations for
type,format, address/filter enums, and whetherdata_ptrmust remain valid for the texture lifetime would make this easier to reason about from the C side.warp/tests/geometry/test_texture.py (2)
24-31: Consider adding test coverage for vec2 and vec4 return types.The kernel only tests
texture2d_sample_fwhich returns a single float. Based on the related files (warp/init.pyi and warp/native/exports.h), the API also exposestexture2d_sample_v2andtexture2d_sample_v4for multi-channel textures.Additional test kernels for comprehensive coverage
@wp.kernel def sample_texture2d_v2_kernel( tex_id: wp.uint64, uvs: wp.array(dtype=wp.vec2), out: wp.array(dtype=wp.vec2), ): tid = wp.tid() out[tid] = wp.texture2d_sample_v2(tex_id, uvs[tid]) @wp.kernel def sample_texture2d_v4_kernel( tex_id: wp.uint64, uvs: wp.array(dtype=wp.vec2), out: wp.array(dtype=wp.vec4), ): tid = wp.tid() out[tid] = wp.texture2d_sample_v4(tex_id, uvs[tid])
81-137: Expand test coverage for more comprehensive validation.The current test provides basic validation but has limited coverage:
- Only one UV coordinate tested (0.49, 0.49)
- Only 2D textures tested (PR introduces 3D support per warp/native/exports.h)
- Only single-channel textures tested
- Only CLAMP addressing mode tested
places=2tolerance for linear interpolation seems low and might miss precision issuesConsider adding test cases for:
- 3D texture sampling using
texture3d_sample_f/v2/v4functions- Multiple UV coordinates including edge cases (0.0, 1.0, corners, center)
- Multi-channel textures to validate v2 and v4 return types
- Different addressing modes if supported beyond CLAMP
- Boundary conditions to ensure clamping behavior is correct
Example: Additional test for 3D textures
def test_texture3d_filtering(test, device): width, height, depth, channels = 2, 2, 2, 1 # Create 2x2x2 texture tex_data_1d = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0], dtype=np.float32) data_arr = wp.array(tex_data_1d, dtype=wp.float32, device=device) tex_linear = wp.Texture( data_arr, width=width, height=height, depth=depth, channels=channels, normalized_coords=True, address_mode=wp.Texture.ADDRESS_CLAMP, filter_mode=wp.Texture.FILTER_LINEAR, ) # Test 3D sampling...Also consider increasing the tolerance for linear interpolation from
places=2toplaces=5orplaces=6to catch precision issues while still allowing for reasonable floating-point variance.warp/_src/types.py (1)
5683-5706: Clarify and enforce layout assumptions forTexture.data
Texture.__init__assumesdatais a tightly packed 1D buffer of lengthwidth * height * depth * channels, but only checkslen(data)anddtype, not contiguity or full size. This works for the tests (1D arrays) but makes it easy to accidentally pass a non‑contiguous or oddly shaped array whose first dimension happens to match the product, leading to misleading errors or silently wrong uploads.Given that
wp_texture_create_host/_devicetreatdata_ptras a flat[width * height * depth * channels]blob, it would be safer to:
- Require contiguity (and optionally auto‑contiguous) before upload.
- Optionally base the size check on the total number of elements rather than
len(data)if you later want to support more natural 2D/3D shapes.For example:
Proposed tightening of validation
self.runtime = warp._src.context.runtime if data is None: return - self.device = data.device + if not data.is_contiguous: + # Ensure a tightly packed layout for CUDA array upload / CPU sampling. + data = data.contiguous() + + self.device = data.device if channels not in (1, 2, 4): raise RuntimeError("Texture channels must be 1, 2, or 4") - if len(data) != width * height * depth * channels: - raise RuntimeError("The size of the array does not match the dimensions") + # For now we only support flat buffers; this can be relaxed to `data.size` + # if/when multi-dimensional layouts are supported. + if len(data) != width * height * depth * channels: + raise RuntimeError("The size of the array does not match the dimensions")Please double‑check that all call sites indeed pass 1D contiguous arrays today; if any kernels already rely on 2D/3D shapes, the size check will need to be adjusted to match those layouts.
warp/native/texture.h (1)
61-62: Avoid host dereference of device descriptor pointers intexture_getusage
texture_get(uint64_t id)unconditionally reinterpretsidasTextureDesc*:CUDA_CALLABLE inline TextureDesc texture_get(uint64_t id) { return *(TextureDesc*)(id); }and
texture2d_sample_v4/texture3d_sample_v4call it on both host and device before checkingt.is_cpu. This is fine whenidis a host descriptor pointer (CPU textures), and on the CUDA path whereidis a device pointer. But if a CUDA texture is ever passed into a CPU kernel, the CPU implementation would attempt to dereference a device pointer and likely crash before it can branch ont.is_cpu.If the launch machinery doesn’t already guarantee “texture.device == kernel device” for CPU backends, it would be safer to:
- Use the host registry when compiling for host, e.g. under
#if !defined(__CUDA_ARCH__)calltexture_get_descriptor(id, t)instead of raw pointer dereference; or- At least assert/fail fast on host when
texture_get_descriptor(id, t)returns false, to avoid silently dereferencing arbitrary addresses.This would make the host path robust even if a device texture leaks into a CPU kernel by mistake.
Please confirm whether the kernel launch path already enforces that textures passed to a CPU kernel are always CPU-resident; if not, consider guarding the host implementation as suggested.
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
build_lib.pypyproject.tomlwarp/__init__.pywarp/__init__.pyiwarp/_src/builtins.pywarp/_src/context.pywarp/_src/types.pywarp/native/builtin.hwarp/native/exports.hwarp/native/texture.cppwarp/native/texture.hwarp/native/warp.hwarp/tests/geometry/test_texture.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: StafaH
Repo: NVIDIA/warp PR: 925
File: warp/native/bvh.h:359-369
Timestamp: 2025-09-28T18:45:45.955Z
Learning: StafaH prefers to defer the BVH shared stack stride issue (hardcoded WP_BVH_BLOCK_DIM vs runtime blockDim.x) for a future update rather than implementing the dynamic shared memory solution now.
📚 Learning: 2025-10-03T19:41:10.598Z
Learnt from: shi-eric
Repo: NVIDIA/warp PR: 1000
File: .github/workflows/ci.yml:179-179
Timestamp: 2025-10-03T19:41:10.598Z
Learning: Warp uses Python's built-in unittest framework with a custom parallel test runner located at warp/thirdparty/unittest_parallel. Tests are executed via `python -m warp.tests` (or `uv run -m warp.tests`). Warp does not use pytest or pytest-related dependencies.
Applied to files:
warp/tests/geometry/test_texture.py
🧬 Code graph analysis (7)
warp/tests/geometry/test_texture.py (3)
warp/__init__.pyi (1)
texture2d_sample_f(4565-4567)warp/native/texture.h (2)
vec2(235-239)vec2(272-276)warp/tests/unittest_utils.py (1)
get_test_devices(99-133)
warp/native/texture.h (3)
warp/native/builtin.h (26)
wp(85-179)uint8(341-341)uint8(360-360)uint8(370-370)float(194-199)float(211-215)float(524-533)float(558-568)float(594-604)float(629-639)float(664-674)float(699-708)float(723-723)float(729-729)float(744-753)float(777-786)float(895-895)float(896-896)float(899-899)float(902-902)int(1309-1316)int(1318-1330)int(1408-1426)uint64(344-344)uint64(363-363)uint64(373-373)warp/native/tile.h (1)
float4(43-45)warp/native/texture.cpp (10)
void(226-235)void(237-249)texture_get_descriptor(17-25)texture_get_descriptor(17-17)texture_add_descriptor(27-27)texture_add_descriptor(27-27)texture_rem_descriptor(29-29)texture_rem_descriptor(29-29)texture_format_element_size(31-39)texture_format_element_size(31-31)
warp/__init__.pyi (1)
warp/_src/types.py (4)
Texture(5655-5758)vec2f(1324-1325)vec4f(1332-1333)vec3f(1328-1329)
warp/__init__.py (1)
warp/_src/types.py (1)
Texture(5655-5758)
warp/native/exports.h (1)
warp/__init__.pyi (6)
texture2d_sample_f(4565-4567)texture2d_sample_v2(4570-4572)texture2d_sample_v4(4575-4577)texture3d_sample_f(4580-4582)texture3d_sample_v2(4585-4587)texture3d_sample_v4(4590-4592)
warp/_src/builtins.py (2)
warp/_src/context.py (2)
add_builtin(1455-1694)value_type(514-520)warp/native/texture.h (6)
vec2(235-239)vec2(272-276)vec4(190-198)vec4(200-208)vec4(210-233)vec4(247-270)
warp/_src/types.py (2)
warp/_src/context.py (2)
context(3470-3485)is_cpu(3448-3450)warp/native/texture.cpp (4)
wp_texture_create_host(182-224)wp_texture_create_host(182-192)wp_texture_create_device(43-180)wp_texture_create_device(43-55)
🪛 Ruff (0.14.10)
warp/tests/geometry/test_texture.py
21-21: from warp.tests.unittest_utils import * used; unable to detect undefined names
(F403)
139-139: get_test_devices may be undefined, or defined from star imports
(F405)
146-146: add_function_test may be undefined, or defined from star imports
(F405)
warp/_src/builtins.py
7320-7320: uint64 may be undefined, or defined from star imports
(F405)
7320-7320: vec2 may be undefined, or defined from star imports
(F405)
7328-7328: uint64 may be undefined, or defined from star imports
(F405)
7328-7328: vec2 may be undefined, or defined from star imports
(F405)
7329-7329: vec2 may be undefined, or defined from star imports
(F405)
7336-7336: uint64 may be undefined, or defined from star imports
(F405)
7336-7336: vec2 may be undefined, or defined from star imports
(F405)
7337-7337: vec4 may be undefined, or defined from star imports
(F405)
7344-7344: uint64 may be undefined, or defined from star imports
(F405)
7344-7344: vec3 may be undefined, or defined from star imports
(F405)
7352-7352: uint64 may be undefined, or defined from star imports
(F405)
7352-7352: vec3 may be undefined, or defined from star imports
(F405)
7353-7353: vec2 may be undefined, or defined from star imports
(F405)
7360-7360: uint64 may be undefined, or defined from star imports
(F405)
7360-7360: vec3 may be undefined, or defined from star imports
(F405)
7361-7361: vec4 may be undefined, or defined from star imports
(F405)
warp/_src/types.py
5663-5663: Unused static method argument: args
(ARG004)
5663-5663: Unused static method argument: kwargs
(ARG004)
5699-5699: Avoid specifying long messages outside the exception class
(TRY003)
5702-5702: Avoid specifying long messages outside the exception class
(TRY003)
5705-5705: Avoid specifying long messages outside the exception class
(TRY003)
5745-5745: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (9)
pyproject.toml (1)
211-211: LGTM!The typo exception for
cudaMemcpy3DParmsis correct—this is the official CUDA API struct name used for 3D memory copy operations.warp/__init__.py (1)
189-189: LGTM!The Texture export is correctly placed in the "Spatial Acceleration" category and follows the established pattern for public API exports.
warp/native/builtin.h (1)
2098-2098: LGTM!The
texture.hinclude is correctly placed and enables texture sampling functionality within Warp kernels.build_lib.py (1)
436-436: LGTM!The addition of
"native/texture.cpp"to the build sources is necessary to compile the new texture management module into the core library.warp/_src/context.py (1)
4250-4277: The texture ctypes bindings are correctly aligned with native prototypes. All parameter orders, types, and counts match exactly between the C declarations inwarp/native/warp.hand the Python ctypes definitions:
wp_texture_create_host: 9 parameters (data_ptr as void*, 8 ints) returning uint64_t ✓wp_texture_create_device: 11 parameters (context, data_ptr as void*, 9 ints) returning uint64_t ✓- Both destroy functions: single uint64_t parameter ✓
Notably, the native side uses
int normalized_coords(notbool), so the Python binding withctypes.c_intis correct.warp/tests/geometry/test_texture.py (2)
34-47: LGTM: Point sampling reference implementation is correct.The implementation correctly follows the standard texture mapping formula and nearest-neighbor sampling logic with proper clamping.
50-78: LGTM: Bilinear sampling reference implementation is correct.The bilinear interpolation logic correctly computes weights and samples the four corner texels with proper clamping.
warp/native/exports.h (1)
1072-1077: Auto-generated texture sampling wrappers look correct.The six new texture sampling wrapper functions follow the established pattern in this file and are consistent with the declarations in warp/init.pyi. The parameter types, return mechanisms, and naming conventions all match expectations.
Since this file is auto-generated (per line 18), any changes should be made in the build_lib.py generator rather than directly in this file.
warp/native/texture.cpp (1)
43-249: Texture descriptor registry and host/device lifecycle look consistentThe C++ side correctly mirrors the intended Python API: creation copies or uploads data, registers a descriptor keyed by
id, and both host and device destroy paths clean up underlying memory (host malloc buffer orcudaArray_t+ deviceTextureDesc) and remove the descriptor. Error paths free intermediate CUDA resources before returning 0, so there are no obvious leaks or double‑frees in the success/failure flows.
|
Thanks but @nvtw already finished this work and it’s being reviewed. Have you spoken to him? Edit: We assigned Tobias to review this and compare notes with his version to get the best of both contributions |
|
That’s great to hear! No I haven’t spoken, but glad to hear it’s already complete internally. Feel free to compare with the internal version, but mine is probably not as mature of an implementation. We can close this PR if it’s not needed. |
|
Indeed, I have an implementation of textures in an internal MR. |
I haven't found a straightforward way. Both GitLab and GitHub have APIs that can facilitate this, but there are a lot of differences in the review UI so a one-to-one mapping doesn't seem possible. |
|
If you want me to take a look, a duplicate should suffice, I can plug it into our downstream mjwarp-renderer and test everything works as expected. But if it's not possible, I'll just wait for it to get merged in, and can try it then. |
|
I finally created a mirror MR on github to make the changes public. I hope to merge them this week but it depends on wheter the reviewers I need find time to have a look and approve: #1169 |
Description
This PR introduces new warp API that exposes CUDA textures. Based on the request from #1122, CUDA textures might be useful for specific applications.
As of CUDA 13, CUDA Textures do not provide any performance benefits in regards to memory access patterns. In modern CUDA, global memory access is optimized such that special memory such as CUDA textures do not provide the same gain they used to. per: https://docs.nvidia.com/cuda/cuda-programming-guide/02-basics/writing-cuda-kernels.html#texture-and-surface-memory
CUDA Textures only provide benefits through the free filtering (bi-linear/tri-linear) during fetch calls, as well as border handling. As such this PR focuses on implementing a lean texture API that is focused towards these benefits, rather then a general purpose texture API.
TODO:
Before your PR is "Ready for review"
__init__.pyi,docs/api_reference/,docs/language_reference/)pre-commit run -aSummary by CodeRabbit
Release Notes
Textureclass for managing texture resources with support for 2D and 3D sampling✏️ Tip: You can customize this high-level summary in your review settings.