.coderabbit.yaml (4 additions, 0 deletions)
@@ -30,12 +30,16 @@ reviews:
 - Memory access patterns and coalescing
 - Correct use of atomicAdd and synchronization
 - Template parameter correctness (float vs double)
+- MANDATORY: Every kernel launch (<<<grid, block, shared, stream>>>) MUST be followed by cudaGetLastError() to catch launch failures. Flag any kernel launch missing this check.
+- MANDATORY: No magic numbers. All block sizes, tile sizes, grid calculations, and thresholds must use named constants (constexpr int BLOCK_SIZE = 256). Flag any raw numeric literal in dim3, grid, or shared memory calculations.
 - path: "src/rapids_singlecell/**/_kernels/**"
 instructions: |
 These are CuPy RawKernel definitions. Review for:
 - Correct CUDA kernel launch configurations
 - Shared memory bounds
 - Type safety (float32 vs float64 mismatches)
+- No magic numbers in kernel launch configurations or kernel code. Block sizes, tile sizes, and thresholds must use named constants.
+- After RawKernel calls, check for cp.cuda.runtime.getLastError() to catch silent launch failures.
 - path: "tests/**"
 instructions: |
 Do not suggest changing test tolerances without strong justification.
-- Missing CUDA error checking after kernel launches
+- **Missing `cudaGetLastError()` after kernel launches**: Every kernel launch (`<<<grid, block, shared, stream>>>`) MUST be followed by `cudaGetLastError()` to detect launch failures (invalid config, shared memory overflow, etc.). Without this, errors are silently deferred and may corrupt later operations or produce garbage results.
 - Kernel launch with zero blocks/threads or invalid grid/block dimensions
 - **Template type mismatches**: kernel templated on `float` but receiving `double` data from Python
 - **Shared memory overflow**: exceeding device shared memory limit (varies by GPU, e.g. T4 = 64KB)
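The launch-failure causes listed in this hunk (zero blocks or threads, oversized blocks, shared memory overflow) can also be caught on the host before a launch is attempted. The following is a minimal sketch of such a pre-launch check, not code from this PR: `DeviceLimits`, `LaunchConfig`, and `validate_launch` are hypothetical names, and the limits are assumed to be queried from the device by the caller rather than hard-coded. It complements the `cudaGetLastError()` check after the launch and does not replace it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical container for per-device limits. Real code would fill this
// from a device query instead of assuming fixed values.
struct DeviceLimits {
    int max_threads_per_block;
    std::size_t max_shared_bytes_per_block;
};

// Hypothetical description of a one-dimensional kernel launch.
struct LaunchConfig {
    long long grid;            // number of blocks
    int block;                 // threads per block
    std::size_t shared_bytes;  // dynamic shared memory requested per block
};

// Returns an empty string when the configuration passes these basic checks,
// otherwise a short reason describing the first problem found.
inline std::string validate_launch(const LaunchConfig& cfg,
                                   const DeviceLimits& lim) {
    assert(lim.max_threads_per_block > 0);
    if (cfg.grid <= 0) return "grid has zero blocks";
    if (cfg.block <= 0) return "block has zero threads";
    if (cfg.block > lim.max_threads_per_block)
        return "block exceeds max threads per block";
    if (cfg.shared_bytes > lim.max_shared_bytes_per_block)
        return "shared memory request exceeds device limit";
    return "";
}
```

A caller would typically log or raise the returned reason and skip the launch, so that a bad configuration is reported as a configuration error rather than surfacing later as corrupted output.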
@@ -73,7 +73,7 @@
 ### Kernel Configuration
 - Hard-coded shared memory sizes that may exceed device limits
 - Fixed tile sizes that don't adapt to device capabilities
-- **Magic numbers** in grid/block calculations without descriptive constants
+- **Magic numbers**: all numeric literals for block sizes, tile dimensions, shared memory sizes, and heuristic thresholds MUST use named constants. `dim3 block(256)` is not acceptable — use `constexpr int BLOCK_SIZE = 256; dim3 block(BLOCK_SIZE);`
 
 ### Test Quality
 - Missing validation of numerical correctness against CPU reference
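As an illustration of the Kernel Configuration bullets in this hunk, the host-side arithmetic for grid and tile sizes might look like the sketch below. It is an assumption-laden example rather than code from the repository: the constant values are illustrative, `grid_size` and `pick_tile` are hypothetical helpers, and the tile is assumed to be a square buffer of `tile * tile` elements held in shared memory.

```cpp
#include <cassert>
#include <cstddef>

// Named constants replace raw literals. The values are illustrative only.
constexpr int BLOCK_SIZE = 256;  // threads per block
constexpr int MAX_TILE = 64;     // largest tile edge to try
constexpr int MIN_TILE = 8;      // smallest tile edge worth using

// Number of blocks needed to cover n elements with BLOCK_SIZE threads each
// (ceiling division).
constexpr long long grid_size(long long n) {
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

// Hypothetical tile selection: halve the tile edge from MAX_TILE down to
// MIN_TILE and return the first edge whose tile * tile * dtype_size bytes
// fit in max_shared. Returns 0 when even MIN_TILE does not fit.
inline int pick_tile(std::size_t max_shared, std::size_t dtype_size) {
    assert(dtype_size > 0);
    for (int tile = MAX_TILE; tile >= MIN_TILE; tile /= 2) {
        const std::size_t bytes =
            static_cast<std::size_t>(tile) * tile * dtype_size;
        if (bytes <= max_shared) return tile;
    }
    return 0;
}
```

Because the tile is derived from the shared memory limit passed in, the same code adapts to devices with smaller or larger limits, and a return value of 0 gives the caller an explicit signal to fall back to a path that does not use shared memory.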
@@ -141,6 +141,43 @@ int max_shared = device.attributes["MaxSharedMemoryPerBlock"];
 int tile = select_tile(max_shared, dtype_size);
 ```
 
+**CRITICAL** (missing cudaGetLastError):
+```text
+CRITICAL: Missing cudaGetLastError() after kernel launch
+
+Issue: Kernel launched without error checking — launch failures are silently deferred
+Why: Invalid grid/block config, shared memory overflow, or other launch errors go undetected
+Impact: Garbage results that look like algorithm bugs, not CUDA errors