Description
Is there an existing issue for this?
- I have searched the existing issues
Current Behavior
Problem Description:
When performing sparse convolution operations (especially with stride=2), a RuntimeError: CUDA error: invalid configuration argument is raised.
The error occurs during kernel map generation in torchsparse, specifically in the torch.full() call at torchsparse/nn/functional/conv/hash/query.py:48.
Key Observations:
1. Stride dependence: stride=1 works, stride=2 fails.
2. GPU architecture specificity: reproduced only on Ada Lovelace (Compute Capability 8.9) GPUs; untested on other architectures (e.g., Ampere).
3. Asynchronous error reporting: the error message mentions possible asynchronous reporting, but setting CUDA_LAUNCH_BLOCKING=1 does not change the outcome.
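For reference, a guarded sketch along the lines of minimal_repro.py (the exact script is not shown above; the tensor shapes, channel counts, and the (batch, x, y, z) coordinate layout below are assumptions, so adjust them to your torchsparse version):

```python
# Hypothetical reconstruction of the repro; guarded so it reports
# "skipped" on machines without torchsparse or a CUDA device instead
# of crashing at import time.
def try_repro() -> str:
    try:
        import torch
        from torchsparse import SparseTensor
        from torchsparse.nn import Conv3d
    except Exception as exc:  # torch/torchsparse missing or mis-built
        return f"skipped: imports failed ({exc})"
    if not torch.cuda.is_available():
        return "skipped: no CUDA device"

    # A handful of voxel coordinates; the (batch, x, y, z) column order
    # is an assumption -- check the convention of your torchsparse build.
    coords = torch.tensor(
        [[0, 0, 0, 0], [0, 2, 2, 2], [0, 4, 4, 4], [0, 6, 6, 6]],
        dtype=torch.int32, device="cuda")
    feats = torch.randn(coords.shape[0], 4, device="cuda")
    x = SparseTensor(feats=feats, coords=coords)
    conv = Conv3d(4, 8, kernel_size=3, stride=2).cuda()
    try:
        y = conv(x)  # stride=2 is where the failure is observed
        return f"ok: output feats shape {tuple(y.feats.shape)}"
    except RuntimeError as exc:
        return f"error: {exc}"

if __name__ == "__main__":
    print(try_repro())
```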
Error Log
Full Stack Trace:

```
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
  File "minimal_repro.py", line 16, in <module>
    y = conv(x)
  ...
  File ".../torchsparse/nn/functional/conv/hash/query.py", line 48, in convert_transposed_out_in_map
    out_in_map_t = torch.full(
                   ^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument
```
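For context on what this error means: a CUDA kernel launch fails with "invalid configuration argument" when the requested grid/block dimensions are out of range (zero, or above the hardware limits). The sketch below is illustrative arithmetic only, not torchsparse's actual launch code; the numeric limits are the standard CUDA values for compute capability 3.0+ (including 8.9 / Ada):

```python
# Standard per-launch limits (CUDA C Programming Guide, CC >= 3.0).
MAX_GRID = (2**31 - 1, 65535, 65535)
MAX_THREADS_PER_BLOCK = 1024

def launch_config_ok(grid, block):
    """Return True iff (grid, block) is a legal CUDA launch configuration."""
    if any(g < 1 or g > m for g, m in zip(grid, MAX_GRID)):
        return False
    if any(b < 1 for b in block):
        return False
    if block[0] * block[1] * block[2] > MAX_THREADS_PER_BLOCK:
        return False
    return True

# Example: an empty kernel map yields a zero grid dimension, which is
# one plausible way to hit "invalid configuration argument".
n_elements = 0
block = (256, 1, 1)
grid = ((n_elements + block[0] - 1) // block[0], 1, 1)
print(launch_config_ok(grid, block))  # → False
```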
Additional Information
Attempted Fixes:

1. Recompiled torchsparse with an explicit GPU architecture:

   ```
   export TORCH_CUDA_ARCH_LIST="8.9"
   pip install --force-reinstall git+https://github.com/mit-han-lab/torchsparse.git
   ```

2. Set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1; neither resolved the issue.
Expected Behavior
Questions for Developers

1. Architecture compatibility: Is TorchSparse officially supported on Ada Lovelace (Compute Capability 8.9)?
2. Stride configuration limitations: Are there known issues with stride=2 in sparse convolutions? Are any special parameters required?
3. Debugging suggestions: How can the torch.full() CUDA configuration error be diagnosed further?
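One low-tech way to diagnose the failing call is to log the arguments reaching torch.full right before the crash. The sketch below shows the wrapper pattern on a stand-in function (fake_full is a placeholder, not a torch API); in a real session you would apply the same wrapper to torch.full before running the repro and inspect the last logged size:

```python
import functools

def log_calls(fn, log):
    """Wrap fn so every call records its positional/keyword arguments."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        log.append((args, kwargs))
        return fn(*args, **kwargs)
    return wrapper

calls = []

# Stand-in for torch.full; in a debugging session you would instead do:
#   torch.full = log_calls(torch.full, calls)
# then run the repro and print calls[-1] to see the requested size.
def fake_full(size, fill_value):
    return [fill_value] * size

fake_full = log_calls(fake_full, calls)
fake_full(3, -1)
print(calls[-1])  # → ((3, -1), {})
```

A degenerate (zero) or enormous requested size in the last logged call would point at the kernel-map computation upstream rather than at torch.full itself.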
Environment
- GCC: 11.4.0
- NVCC: 11.8.89
- PyTorch: 2.2.0
- PyTorch CUDA: 11.8
- TorchSparse: 2.1.0
- GPU: NVIDIA GeForce RTX 4070 Ti SUPER (Compute Capability 8.9)
- Python: 3.11.11
- OS: WSL-Ubuntu 22.04
Anything else?
Possible Causes
- Mismatch between TorchSparse and PyTorch/CUDA versions
- GPU compute capability not properly supported
- Lack of recompilation from source with proper architecture flags