
[BUG] CUDA error: invalid configuration argument when using stride=2 in sparse convolutions (RTX 40-series GPUs / Ada Lovelace architecture) #347

Open
@shr19976

Description


Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Problem Description

When performing sparse convolution operations (especially with stride=2), a RuntimeError: CUDA error: invalid configuration argument occurs.
The error happens during kernel-map generation in torchsparse, specifically at torchsparse/nn/functional/conv/hash/query.py:48, in the call to torch.full().

Key Observations

  1. stride=1 works, stride=2 fails.
  2. GPU architecture specificity: reproduced only on Ada Lovelace (Compute Capability 8.9) GPUs; untested on other architectures (e.g., Ampere).
  3. Asynchronous error reporting: the error message mentions possible asynchronous reporting, but setting CUDA_LAUNCH_BLOCKING=1 does not change the behavior.
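This is not confirmed, but "invalid configuration argument" usually means a kernel was launched with an out-of-range grid or block configuration, most commonly a grid dimension of 0. With stride=2, coordinate downsampling could plausibly leave the kernel map with zero output points, so the element count fed into the launch-configuration arithmetic becomes 0. A minimal pure-Python sketch of the usual grid-size computation (names are illustrative, not torchsparse internals):

```python
def launch_config(numel: int, threads_per_block: int = 256):
    """Compute a 1-D CUDA launch configuration for `numel` elements.

    Mirrors the common `(n + t - 1) // t` pattern. A grid dimension of 0
    (or one above 2**31 - 1 for grid.x) makes the launch fail with
    'invalid configuration argument'.
    """
    blocks = (numel + threads_per_block - 1) // threads_per_block
    valid = 1 <= blocks <= 2**31 - 1 and 1 <= threads_per_block <= 1024
    return blocks, valid

# A healthy kernel map: plenty of output points -> valid configuration.
print(launch_config(10_000))  # (40, True)
# A stride=2 map that came back empty: 0 blocks -> invalid configuration.
print(launch_config(0))       # (0, False)
```

If this is the mechanism, the bug would be upstream of torch.full(), in whatever computes the number of output points for the strided kernel map.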

Error Log
Full stack trace:
RuntimeError: CUDA error: invalid configuration argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

Traceback (most recent call last):
File "minimal_repro.py", line 16, in
y = conv(x)
...
File ".../torchsparse/nn/functional/conv/hash/query.py", line 48, in convert_transposed_out_in_map
out_in_map_t = torch.full(
^^^^^^^^^^^
RuntimeError: CUDA error: invalid configuration argument

Additional Information

Attempted fixes:

Recompiled torchsparse with explicit GPU architecture:
export TORCH_CUDA_ARCH_LIST="8.9"
pip install --force-reinstall git+https://github.com/mit-han-lab/torchsparse.git

Set CUDA_LAUNCH_BLOCKING=1 and TORCH_USE_CUDA_DSA=1, but neither resolved the issue.

Expected Behavior

Questions for Developers

  1. Architecture compatibility: is TorchSparse officially supported on Ada Lovelace (Compute Capability 8.9)?

  2. Stride configuration limitations: are there known issues with stride=2 in sparse convolutions? Any special parameter requirements?

  3. Debugging suggestions: how can the torch.full() CUDA configuration error be diagnosed further?
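On the debugging question: one way to localize the torch.full() failure without touching CUDA code is to sanity-check the requested shape just before the allocation in query.py. The helper below is a hypothetical sketch (the function and argument names are mine, not torchsparse's); it turns an opaque CUDA launch error into a readable Python exception when the (num_points, kernel_volume) shape is empty or implausibly large.

```python
def check_out_in_map_shape(num_points: int, kernel_volume: int,
                           max_numel: int = 2**31 - 1) -> tuple:
    """Validate a shape before it is passed to torch.full().

    Raises ValueError with a descriptive message instead of letting an
    empty or overflowed shape reach the CUDA kernel launch.
    """
    if num_points <= 0:
        raise ValueError(
            f"kernel map has no output points (num_points={num_points}); "
            "check the stride/coordinate downsampling step")
    if kernel_volume <= 0:
        raise ValueError(f"invalid kernel_volume={kernel_volume}")
    if num_points * kernel_volume > max_numel:
        raise ValueError(
            f"requested {num_points * kernel_volume} elements, which exceeds "
            "a plausible allocation; a point counter may have overflowed")
    return (num_points, kernel_volume)

# Example: a 3x3x3 kernel with 512 surviving output points passes the check.
print(check_out_in_map_shape(512, 27))  # (512, 27)
```

Dropping a check like this in just above line 48 would distinguish an empty kernel map from an overflowed one, which are the two most likely causes of an invalid launch configuration here.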

Environment

- GCC: 11.4.0
- NVCC: 11.8.89
- PyTorch: 2.2.0
- PyTorch CUDA: 11.8
- TorchSparse: 2.1.0
- GPU: NVIDIA GeForce RTX 4070 Ti SUPER (Compute Capability 8.9)
- Python: 3.11.11
- OS: WSL-Ubuntu 22.04

Anything else?

Possible Causes

  1. Mismatch Between TorchSparse and PyTorch/CUDA Versions

  2. GPU Compute Capability Not Properly Supported

  3. Lack of Recompilation from Source with Proper Architecture Flags
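To rule out cause 3, one can inspect which SM architectures the installed torchsparse extension was actually compiled for. The commands below are a sketch for this diagnostic environment only: the exact .so filename and location vary by build and platform, and cuobjdump ships with the CUDA toolkit. An Ada-native build should list sm_89 in the output.

```shell
# Locate the compiled extension inside the installed package
# (the exact .so filename varies by build/platform).
SO=$(python -c "import torchsparse, glob, os; \
print(glob.glob(os.path.join(os.path.dirname(torchsparse.__file__), '*.so'))[0])")

# List the embedded device code; look for sm_89 (Ada Lovelace).
cuobjdump --list-elf "$SO"
```

If only older architectures (e.g. sm_80/sm_86) appear, the binary is running on Ada via PTX JIT or not at all, which would point at the recompilation path as the fix.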
