
[BUG] CUDA error: invalid configuration argument when using stride=2 in sparse convolutions (RTX 40-series GPUs / Ada Lovelace architecture) #347

Open
@shr19976

Description


Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Problem Description

When performing sparse convolution operations (especially with stride=2), a RuntimeError: CUDA error: invalid configuration argument occurs.
The error happens during kernel-map generation in torchsparse, specifically at torchsparse/nn/functional/conv/hash/query.py:48, in the call to torch.full().

Key Observations

  1. stride=1 works, stride=2 fails.
  2. GPU architecture specificity: reproduced only on Ada Lovelace (Compute Capability 8.9) GPUs; untested on other architectures (e.g., Ampere).
  3. Asynchronous error reporting: the error message mentions possible asynchronous reporting, but setting CUDA_LAUNCH_BLOCKING=1 does not change the behavior.
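This is not confirmed, but "invalid configuration argument" usually means a kernel was launched with an out-of-range grid or block configuration, most commonly a grid dimension of 0. With stride=2, coordinate downsampling could plausibly leave the kernel map with zero output points, so the element count fed into the launch-configuration arithmetic becomes 0. A minimal pure-Python sketch of the usual grid-size computation (names are illustrative, not torchsparse internals):

```python
def launch_config(numel: int, threads_per_block: int = 256):
    """Compute a 1-D CUDA launch configuration for `numel` elements.

    Mirrors the common `(n + t - 1) // t` pattern. A grid dimension of 0
    (or one above 2**31 - 1 for grid.x) makes the launch fail with
    'invalid configuration argument'.
    """
    blocks = (numel + threads_per_block - 1) // threads_per_block
    valid = 1 <= blocks <= 2**31 - 1 and 1 <= threads_per_block <= 1024
    return blocks, valid

# A healthy kernel map: plenty of output points -> valid configuration.
print(launch_config(10_000))  # (40, True)
# A stride=2 map that came back empty: 0 blocks -> invalid configuration.
print(launch_config(0))       # (0, False)
```

If this is the mechanism, the bug would be upstream of torch.full(), in whatever computes the number of output points for the strided kernel map.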

Error Log
Full stack trace:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
File "minimal_repro.py", line 16, in
y = conv(x)
...
File ".../torchsparse/nn/functional/conv/hash/query.py", line 48, in convert_transposed_out_in_map
out_in_map_t = torch.full(
^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument

Additional Information

Attempted fixes:

Recompiled torchsparse with explicit GPU architecture:
export TORCH_CUDA_ARCH_LIST="8.9"
pip install --force-reinstall git+https://github.com/mit-han-lab/torchsparse.git

Set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1, but neither resolved the issue.

Expected Behavior

Questions for Developers

  1. Architecture compatibility: is TorchSparse officially supported on Ada Lovelace (Compute Capability 8.9)?

  2. Stride configuration limitations: are there known issues with stride=2 in sparse convolutions? Any special parameter requirements?

  3. Debugging suggestions: how can the torch.full() CUDA configuration error be diagnosed further?
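On the debugging question: one way to localize the torch.full() failure without touching CUDA code is to sanity-check the requested shape just before the allocation in query.py. The helper below is a hypothetical sketch (the function and argument names are mine, not torchsparse's); it turns an opaque CUDA launch error into a readable Python exception when the (num_points, kernel_volume) shape is empty or implausibly large.

```python
def check_out_in_map_shape(num_points: int, kernel_volume: int,
                           max_numel: int = 2**31 - 1) -> tuple:
    """Validate a shape before it is passed to torch.full().

    Raises ValueError with a descriptive message instead of letting an
    empty or overflowed shape reach the CUDA kernel launch.
    """
    if num_points <= 0:
        raise ValueError(
            f"kernel map has no output points (num_points={num_points}); "
            "check the stride/coordinate downsampling step")
    if kernel_volume <= 0:
        raise ValueError(f"invalid kernel_volume={kernel_volume}")
    if num_points * kernel_volume > max_numel:
        raise ValueError(
            f"requested {num_points * kernel_volume} elements, which exceeds "
            "a plausible allocation; a point counter may have overflowed")
    return (num_points, kernel_volume)

# Example: a 3x3x3 kernel with 512 surviving output points passes the check.
print(check_out_in_map_shape(512, 27))  # (512, 27)
```

Dropping a check like this in just above line 48 would distinguish an empty kernel map from an overflowed one, which are the two most likely causes of an invalid launch configuration here.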

Environment

- GCC: 11.4.0
- NVCC: 11.8.89
- PyTorch: 2.2.0
- PyTorch CUDA: 11.8
- TorchSparse: 2.1.0
- GPU: NVIDIA GeForce RTX 4070 Ti SUPER (Compute Capability 8.9)
- Python: 3.11.11
- OS: WSL-Ubuntu 22.04

Anything else?

Possible Causes

  1. Mismatch Between TorchSparse and PyTorch/CUDA Versions

  2. GPU Compute Capability Not Properly Supported

  3. Lack of Recompilation from Source with Proper Architecture Flags
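To rule out cause 3, one can inspect which SM architectures the installed torchsparse extension was actually compiled for. The commands below are a sketch for this diagnostic environment only: the exact .so filename and location vary by build and platform, and cuobjdump ships with the CUDA toolkit. An Ada-native build should list sm_89 in the output.

```shell
# Locate the compiled extension inside the installed package
# (the exact .so filename varies by build/platform).
SO=$(python -c "import torchsparse, glob, os; \
print(glob.glob(os.path.join(os.path.dirname(torchsparse.__file__), '*.so'))[0])")

# List the embedded device code; look for sm_89 (Ada Lovelace).
cuobjdump --list-elf "$SO"
```

If only older architectures (e.g. sm_80/sm_86) appear, the binary is running on Ada via PTX JIT or not at all, which would point at the recompilation path as the fix.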
