Add comprehensive stride coverage tests for non-contiguous tensor inputs #761

Open
jhelferty-nv wants to merge 8 commits into shader-slang:main from jhelferty-nv:add-pytorch-slice-test

Contributor

@jhelferty-nv jhelferty-nv commented Jan 27, 2026

Add tests verifying that non-contiguous PyTorch tensor views are correctly
handled when passed to Slang functions through the functional API. Covers
the marshalling and vectorization paths for multiple parameter types and
view operations.

Parameter types tested:

  • float[3] fixed-size arrays (scaled_sum)
  • float3 vectors (add_vectors)
  • Tensor<float,2> / WTensor<float,2> (add_tensors)
  • RWTensor<float,1> (copy_tensor)
  • TensorView (copy_tensorview, add_tensorview)
  • DiffTensorView (copy_difftensorview, diff_square backward)

View operations tested:

  • Prefix slices (t[:3]), offset slices (t[1:4]), strided slices (t[::2])
  • Transpose (.t()) — non-unit stride in binding dimension
  • Diagonal (.reshape(3,3).diagonal()) — stride = sum of dim strides
  • 3D permute + select (.permute(2,0,1)[0]) — both dims non-unit stride
  • Zero-stride broadcast (.expand()) — stride 0 in batch dimension
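
The stride patterns named in the list above can be sketched with a small pure-Python model. This is only an illustration of the stride arithmetic, not PyTorch or SlangPy code: every helper name here (`contiguous_strides`, `strided_slice`, `expand_batch`, and so on) is hypothetical, strides are counted in elements, and the shapes are assumptions chosen to match the expressions in the list. Prefix and offset slices of a 1D tensor are left out because they keep unit stride and differ only in where the view starts.

```python
def contiguous_strides(shape):
    """Row-major element strides for a densely packed tensor of this shape."""
    strides, step = [], 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True if the layout matches dense row-major order (size-1 dims are ignored)."""
    dense = contiguous_strides(shape)
    return all(n == 1 or s == d for n, s, d in zip(shape, strides, dense))


def strided_slice(shape, strides, step):
    """Model of t[::step] along dim 0: the stride is multiplied by the step."""
    length = (shape[0] + step - 1) // step
    return (length,) + shape[1:], (strides[0] * step,) + strides[1:]


def transpose(shape, strides):
    """Model of t.t() on a 2D tensor: swap the two dims and their strides."""
    return shape[::-1], strides[::-1]


def diagonal(shape, strides):
    """Model of t.diagonal() on a 2D tensor: one step advances both dims."""
    return (min(shape),), (sum(strides),)


def permute(shape, strides, dims):
    """Model of t.permute(*dims): reorder dims and strides together."""
    return tuple(shape[d] for d in dims), tuple(strides[d] for d in dims)


def select_first(shape, strides):
    """Model of t[0]: drop the leading dim, keep the remaining strides."""
    return shape[1:], strides[1:]


def expand_batch(shape, strides, batch):
    """Model of t.expand(batch, ...) on a size-1 leading dim: stride becomes 0."""
    assert shape[0] == 1
    return (batch,) + shape[1:], (0,) + strides[1:]


views = {
    "strided": strided_slice((6,), contiguous_strides((6,)), 2),
    "transpose": transpose((3, 3), contiguous_strides((3, 3))),
    "diagonal": diagonal((3, 3), contiguous_strides((3, 3))),
    "permute_select": select_first(
        *permute((2, 3, 4), contiguous_strides((2, 3, 4)), (2, 0, 1))
    ),
    "expand": expand_batch((1, 3), contiguous_strides((1, 3)), 4),
}

for name, (shape, strides) in views.items():
    print(f"{name:15s} shape={shape} strides={strides} "
          f"contiguous={is_contiguous(shape, strides)}")
```

In this model the diagonal of a 3x3 tensor gets stride 3 + 1 = 4, the permute-and-select view gets strides (12, 4), and the expanded view gets stride 0 in the batch dimension. None of the five layouts is dense.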

Test categories:

  • Forward correctness: GPU result matches PyTorch math
  • Write-back correctness: sentinel-based verification that only target
    positions are written
  • Gradient parity: SlangPy backward pass matches pure PyTorch autograd
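
A minimal sketch of the sentinel-based write-back idea, using a strided `memoryview` from the Python standard library as a stand-in for a non-contiguous tensor view. The `copy_into` helper and the `SENTINEL` value are hypothetical and are not taken from the PR's tests. The point is only the shape of the check: pre-fill the backing storage, write through the view, then confirm that every position outside the view still holds the sentinel.

```python
from array import array

SENTINEL = -999.0  # exactly representable as a 32-bit float


def copy_into(dst, src):
    """Hypothetical stand-in for a kernel that writes src into dst element-wise."""
    for i, value in enumerate(src):
        dst[i] = value


backing = array("f", [SENTINEL] * 6)
target = memoryview(backing)[::2]  # non-contiguous view: elements 0, 2, 4
copy_into(target, [1.0, 2.0, 3.0])

written = list(backing[::2])     # positions the view covers
untouched = list(backing[1::2])  # positions the view skips
print(written, untouched)
```

A check that compared only the view against the expected values would miss a write that also landed on a skipped position, which is why the backing storage is inspected as a whole.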

Motivated by #387

Summary by CodeRabbit

Release Notes

  • New Features

    • Added scaled_sum function for scaled summation operations on value arrays.
  • Tests

    • Added comprehensive test coverage for tensor slicing, transposition, and view operations.
    • Added verification for gradient parity with sliced and transposed tensor inputs.
    • Added tests for broadcasting behavior and PyTorch interoperability scenarios.

@jhelferty-nv jhelferty-nv self-assigned this Jan 27, 2026
@jhelferty-nv jhelferty-nv requested a review from a team as a code owner January 27, 2026 21:48
@ccummingsNV
Contributor

@jhelferty-nv the tests never passed on this one. do you want to merge it and see if you can solve it? or close PR if it's no longer relevant.

…hader-slang#761)

Test that PyTorch tensor views created by slicing can be correctly
marshalled to Slang fixed-size array parameters (float[N]), covering
both basic acceptance and gradient parity with pure PyTorch.

Made-with: Cursor
…arams

Cover the high-risk gaps in non-trivial tensor view handling:
- float3 vector params with prefix/offset/strided sliced tensors
- RWTensor write-back correctness with sliced output tensors
- Tensor<float,2> with transposed (non-contiguous) input tensors
- Gradient parity for float3 params with sliced tensor views

Made-with: Cursor
…oat[5]

- Transpose tests for float[3] and float3 params (non-unit stride in
  the trailing dimension that maps to array/vector components)
- float[5] with prefix/offset/strided slices (verifies array marshalling
  generalizes beyond the float[3] case)
- Tensor<float,2> with column-prefix, column-offset, and column-strided
  views (extends the existing transpose-only coverage)
- Gradient parity tests for both float[3] and float3 with transposed inputs

Made-with: Cursor
Cover remaining non-contiguous memory layout gaps:
- TensorView: sliced input reads, sliced output write-back, sliced add
- DiffTensorView: sliced input/output, backward pass with sliced inputs
- WTensor: transpose write-back
- numpy: non-contiguous arrays for float and float3 params
- Remove redundant test_array5_parameter_slice

Made-with: Cursor
Cover remaining medium+ risk non-contiguous memory patterns:
- Negative strides (flip): all 1D and 2D param test files
- Diagonal views (stride = sum of dim strides): 1D param tests
- 3D permute (both dims non-unit stride): Tensor<float,2> tests
- Zero-stride broadcast (expand): float[3], float3, Tensor<float,2>
- Numpy flip and broadcast cases

Made-with: Cursor
torch.flip() always returns a contiguous copy (PyTorch does not support
negative strides), so flip cases were just re-testing the contiguous path.
The C++ numpy marshalling layer explicitly rejects non-contiguous arrays,
so those tests were targeting a missing feature.

Made-with: Cursor
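
The distinction in the commit message above, between a reversed view and a flipped copy, can be illustrated with Python's own buffer protocol, which does allow negative strides. This is an analogy under that assumption and does not call PyTorch; the variable names are illustrative only.

```python
from array import array

backing = array("f", [1.0, 2.0, 3.0, 4.0])

# A reversed *view* shares memory with `backing` and has a negative stride
# (memoryview reports strides in bytes).
reversed_view = memoryview(backing)[::-1]

# A flip implemented as a copy is a fresh, densely packed buffer.
flipped_copy = memoryview(array("f", reversed(backing)))

print(reversed_view.strides, reversed_view.contiguous)
print(flipped_copy.strides, flipped_copy.contiguous)
```

Both objects hold the same values in the same order, but only the first is a non-contiguous layout. A flip that always copies therefore only ever hands the dense layout to the code under test.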
@jhelferty-nv jhelferty-nv force-pushed the add-pytorch-slice-test branch from 2f6a05a to 9a7d2be Compare March 9, 2026 19:12
@coderabbitai

coderabbitai Bot commented Mar 9, 2026

📝 Walkthrough

Added comprehensive test coverage for non-contiguous, sliced, and transposed tensor handling in SlangPy across four test modules. Tests verify parameter binding, gradient parity, TensorView behavior, and write-back semantics. A new Slang scaled_sum function was also introduced.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Test Infrastructure for Sliced/Non-contiguous Tensors: slangpy/tests/slangpy_tests/test_torchintegration.py | Added 284 lines introducing slice cases (SLICE_CASES, VECTOR_SLICE_CASES, RWTENSOR_SLICE_CASES, TENSOR2D_VIEW_FACTORIES) and parametrized test functions (test_parameter_slice, test_vector_parameter_slice, test_rwtensor_slice_writeback, test_tensor_view, etc.) to verify slicing, transposition, broadcasting, and write-back behavior across device types. |
| DiffTensorView Slicing Tests: slangpy/tests/slangpy_tests/test_difftensorview.py | Added 77 lines of test cases for reading from and writing to sliced DiffTensorView inputs/outputs, including backward (diff_square) tests, parametrized across CUDA availability, device types, and slice scenarios (prefix, offset, strided, diagonal). |
| TensorView Slicing Tests: slangpy/tests/slangpy_tests/test_tensorview.py | Added 78 lines introducing TENSORVIEW_SLICE_CASES and test functions (test_tensorview_sliced_input, test_tensorview_sliced_output, test_tensorview_add_sliced) to exercise sliced/diagonal views, write-back behavior, and arithmetic operations with sliced inputs. |
| Gradient Parity Tests for Sliced/Transposed Tensors: slangpy/tests/slangpy_tests/test_pytorch_gradient_parity.py | Added 186 lines introducing test infrastructure (SLANG_ARRAY_DOT, ARRAY_SLICE_CASES, SLANG_VECTOR_DOT, VECTOR_GRAD_SLICE_CASES) and test functions (test_sliced_tensor_array_gradient_parity, test_sliced_vector_gradient_parity, test_transposed_array_gradient_parity, test_transposed_vector_gradient_parity) to verify gradient parity between PyTorch and SlangPy for sliced and transposed inputs. |
| Slang Function Addition: slangpy/tests/slangpy_tests/test_torchintegration.slang | Added 9 lines implementing new public differentiable function scaled_sum(float scale, float[3] values) that accumulates scaled vector values. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • ccummingsNV

Poem

🐰 Slices and dices, we test with delight,
Non-contiguous tensors, now handled just right!
Gradients align, and views write back true,
With strided and transposed paths tested anew.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Linked Issues check | ✅ Passed | The PR comprehensively addresses issue #387 by adding extensive test coverage for sliced PyTorch tensors as fixed-size array parameters across multiple test files, validating direct marshalling of tensor views without requiring cloning. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to testing sliced tensor functionality: test files validate slice scenarios, gradient parity, write-back, transpositions, and broadcasting, all supporting the core objective of enabling sliced tensor support. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title check | ✅ Passed | The title directly and accurately summarizes the main purpose of the changeset: adding comprehensive test coverage for non-contiguous (strided) tensor inputs across multiple test files. |


@jhelferty-nv jhelferty-nv changed the title from "Add test case for pytorch slice use" to "Add comprehensive stride coverage tests for non-contiguous tensor inputs" Mar 9, 2026
@jhelferty-nv
Contributor Author

I've taken another stab at it, starting from scratch but inspired by the previous attempt. Reusing the PR since the idea is the same, but I'm building on tests that were checked in since.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
slangpy/tests/slangpy_tests/test_difftensorview.py (1)

171-183: Exercise sliced gradient storage in this backward case.

x is a view, but x_grad is a fresh contiguous tensor. If the backward path mishandles non-contiguous gradient destinations, this test still passes. Make the gradient buffer a slice of a larger tensor and assert the untouched positions stay zero.

Suggested adjustment
-    x_grad = torch.zeros(3, device="cuda", dtype=torch.float32)
+    x_grad_full = torch.zeros(source_size, device="cuda", dtype=torch.float32)
+    x_grad = slicer(x_grad_full)
     output = torch.zeros(3, device="cuda", dtype=torch.float32)
     output_grad = torch.ones(3, device="cuda", dtype=torch.float32)

     module.diff_square.bwds(diff_pair(x, x_grad), diff_pair(output, output_grad))
     torch.cuda.synchronize()

-    expected_grad = 2.0 * x
-    assert torch.allclose(
-        x_grad, expected_grad, atol=1e-5
-    ), f"Expected grad {expected_grad}, got {x_grad}"
+    expected_full = torch.zeros_like(x_full)
+    slicer(expected_full).copy_(2.0 * x)
+    assert torch.allclose(
+        x_grad_full, expected_full, atol=1e-5
+    ), f"Expected grad {expected_full}, got {x_grad_full}"

Based on learnings: When adding autograd support for new access patterns, must update find_torch_tensors() for is_input determination, TorchAutoGradHook.backward() for gradient flow direction, and add tests.
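
The check the review asks for can be sketched without a GPU. The following uses stdlib `memoryview` slices in place of tensor views and a hypothetical `square_backward` function in place of `module.diff_square.bwds`; the slicer table and buffer sizes are assumptions for illustration. It shows the shape of the assertion: the gradient destination is a view into a larger zeroed buffer, and the whole buffer is compared so that stray writes outside the view would be caught.

```python
from array import array


def square_backward(x, x_grad, output_grad):
    """Hypothetical stand-in for a backward pass of y = x * x.

    Writes dy/dx * output_grad into x_grad one element at a time, so it
    works for any indexable destination, contiguous or not.
    """
    for i in range(len(x)):
        x_grad[i] = 2.0 * x[i] * output_grad[i]


# Two ways of picking three elements out of a five-element buffer.
SLICERS = {
    "offset": lambda view: view[1:4],   # unit stride, non-zero offset
    "strided": lambda view: view[::2],  # stride of two elements
}

results = {}
for name, slicer in SLICERS.items():
    x_full = array("f", [1.0, 2.0, 3.0, 4.0, 5.0])
    grad_full = array("f", [0.0] * 5)
    x = slicer(memoryview(x_full))
    x_grad = slicer(memoryview(grad_full))
    output_grad = array("f", [1.0] * 3)

    square_backward(x, x_grad, output_grad)

    # Build the expected full buffer the same way: zeros everywhere,
    # 2 * x at the positions the slice selects.
    expected_full = array("f", [0.0] * 5)
    expected_view = slicer(memoryview(expected_full))
    for i in range(len(x)):
        expected_view[i] = 2.0 * x[i]

    assert grad_full == expected_full, f"{name}: got {list(grad_full)}"
    results[name] = list(grad_full)

print(results)
```

Comparing `grad_full` rather than `x_grad` is the design choice that matters here: a backward pass that ignored the destination's stride or offset would still produce three correct-looking values somewhere, but not at the positions the view selects.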

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slangpy/tests/slangpy_tests/test_difftensorview.py` around lines 171 - 183,
The test currently uses a fresh contiguous x_grad so it won't catch bugs where
backward writes only into contiguous buffers; change the test to make x_grad a
slice of a larger tensor (e.g., big_x_grad = torch.zeros(5, device="cuda");
x_grad = big_x_grad[1:4]) and after calling module.diff_square.bwds(diff_pair(x,
x_grad), diff_pair(output, output_grad)) + torch.cuda.synchronize(), assert that
big_x_grad[1:4] == expected_grad and that big_x_grad[0] and big_x_grad[4] remain
zero; also ensure analogous slicing for output_grad if needed. When adding
autograd support for new access patterns, remember to update
find_torch_tensors() (is_input determination) and TorchAutoGradHook.backward()
to handle gradient flow direction for sliced/non-contiguous destinations.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e8411be5-544e-44fc-bcad-8711aa998353

📥 Commits

Reviewing files that changed from the base of the PR and between 6bc582a and 9a7d2be.

📒 Files selected for processing (5)
  • slangpy/tests/slangpy_tests/test_difftensorview.py
  • slangpy/tests/slangpy_tests/test_pytorch_gradient_parity.py
  • slangpy/tests/slangpy_tests/test_tensorview.py
  • slangpy/tests/slangpy_tests/test_torchintegration.py
  • slangpy/tests/slangpy_tests/test_torchintegration.slang

Comment on lines +549 to +566
def test_tensor_view(
    device_type: DeviceType,
    name: str,
    view_factory: Callable[[torch.device], torch.Tensor],
):
    """
    Test that non-contiguous 2D tensor views work correctly when bound to
    Tensor<float,2> / WTensor<float,2> parameters.

    Covers transposed, column-sliced (prefix and offset), and column-strided
    views, all of which produce non-contiguous memory layouts.
    """
    module = load_test_module(device_type)

    dev = torch.device("cuda")
    a = view_factory(dev)
    b = view_factory(dev)
    assert not a.is_contiguous()


⚠️ Potential issue | 🟡 Minor

Use name in the body or drop it from the parametrized args.

Ruff is already flagging Line 551 because name is never read. Folding it into the non-contiguity assertion keeps the readable test ids and clears the warning.

Minimal fix
-    assert not a.is_contiguous()
+    assert not a.is_contiguous(), f"{name} should create a non-contiguous view"
🧰 Tools
🪛 Ruff (0.15.4)

[warning] 551-551: Unused function argument: name

(ARG001)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@slangpy/tests/slangpy_tests/test_torchintegration.py` around lines 549 - 566,
The test parameter `name` in test_tensor_view(device_type, name, view_factory)
is unused; either remove it from the parametrization or reference it in the test
body to silence the lint warning—e.g., include it in an assertion or diagnostic
(use the `name` string when asserting non-contiguity or in a failure message) so
the symbol `name` is read, or delete `name` from the test signature and related
parametrization entries (affecting test_tensor_view and the parametrized test
data that supplies `name`).

Contributor

@ccummingsNV ccummingsNV left a comment


I've updated the branch. Assuming tests pass, this is good to go.

@ccummingsNV ccummingsNV enabled auto-merge (squash) March 17, 2026 09:36