
Commit 0c5fdf2

Author: Carsten Griwodz
some more CUDA calling considerations
1 parent 936bcde


AI_DEVELOPMENT_GUIDE.md

Lines changed: 17 additions & 5 deletions
@@ -22,16 +22,23 @@ It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follow
 - Use **C++17**. Prefer `constexpr`, `auto` and `enum class`.
 - Use range-based for loops on the host side.
 - Use smart pointers (`std::unique_ptr`, `std::shared_ptr`) on the host side.
+- Dynamic memory allocation on the device side is strongly discouraged.
 - Never pass smart pointers as parameters to `__global__` functions.
-- Avoid dynamic memory allocation on the device side.
-- **Memory Management**: Use RAII. Avoid raw `new`/`delete` except in CUDA contexts where unavoidable.
+- **Memory Management**:
+  - Use RAII on the host side.
+  - Avoid all dynamic memory allocation on the device side.
+  - Understand that reference-counting smart pointers cannot be kept consistent between
+    host and device, and that kernels run asynchronously from host code.
 - **Error Handling**:
   - Use exceptions in host C++ code.
   - In CUDA, check and propagate error codes using helper utilities/macros. Never ignore errors.
 - **Namespaces**: Group related functions/classes logically. Avoid polluting the global namespace.
 - **Headers**:
   - Keep headers minimal; forward declare instead of including heavy dependencies.
-  - Each header should be guarded with `#pragma once`.
+    However, small helper functions declared `static inline __device__` and used several times
+    should be included instead of copying the code.
+  - Each header should be guarded with `#pragma once`. `#ifndef`/`#endif` guards should be used
+    in special circumstances only.
 - **Style**:
   - `snake_case` for variables and functions.
   - `CamelCase` for class and struct names.
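
As an illustration of the Memory Management and Error Handling guidelines above, here is a minimal sketch, not taken from the repository: `CUDA_CHECK`, `DeviceBuffer`, `scale_kernel`, and `scale_on_device` are hypothetical names. The point is RAII on the host, exceptions built from CUDA error codes, and a raw pointer at the `__global__` boundary.

```cpp
// Hypothetical sketch of a .cu file; none of these names come from the repository.
#include <cuda_runtime.h>
#include <sstream>
#include <stdexcept>

// Hypothetical helper macro: turn CUDA error codes into host-side exceptions.
#define CUDA_CHECK(call)                                  \
    do {                                                  \
        cudaError_t err_ = (call);                        \
        if (err_ != cudaSuccess) {                        \
            std::ostringstream oss_;                      \
            oss_ << __FILE__ << ":" << __LINE__ << " "    \
                 << cudaGetErrorString(err_);             \
            throw std::runtime_error(oss_.str());         \
        }                                                 \
    } while (0)

// Hypothetical RAII wrapper: owns device memory on the host side only.
// The pointer it hands out is raw, which is what kernels receive.
template <typename T>
class DeviceBuffer
{
public:
    explicit DeviceBuffer(size_t count) : _count(count) {
        CUDA_CHECK(cudaMalloc(reinterpret_cast<void**>(&_ptr), count * sizeof(T)));
    }
    ~DeviceBuffer() {
        cudaFree(_ptr); // destructors must not throw; the error code is deliberately ignored
    }
    DeviceBuffer(const DeviceBuffer&) = delete;            // no reference counting: host and
    DeviceBuffer& operator=(const DeviceBuffer&) = delete; // device counts cannot stay in sync

    T*     data()        { return _ptr; }
    size_t count() const { return _count; }

private:
    T*     _ptr = nullptr;
    size_t _count;
};

__global__ void scale_kernel(float* __restrict__ data, size_t n, float factor)
{
    const size_t i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

void scale_on_device(DeviceBuffer<float>& buf, float factor)
{
    const int threads = 256;
    const int blocks  = int((buf.count() + threads - 1) / threads);
    // Pass the raw pointer, never a smart pointer: the kernel runs asynchronously
    // and cannot participate in host-side lifetime tracking.
    scale_kernel<<<blocks, threads>>>(buf.data(), buf.count(), factor);
    CUDA_CHECK(cudaGetLastError());        // validate the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // validate the asynchronous execution
}
```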
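Likewise, a sketch of what the header guideline implies for a small device helper that several kernels share; the file name `vec_helpers.h` and the function `clampf` are hypothetical:

```cpp
// vec_helpers.h -- hypothetical header, included only from .cu translation units
#pragma once

// Small helper used by several kernels: declared static inline __device__
// so that each including translation unit compiles its own copy and
// caller and callee are always compiled together.
static inline __device__ float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}
```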
@@ -41,14 +48,19 @@ It ensures that contributions (from GitHub Copilot, ChatGPT, Claude, etc.) follow
 
 ## CUDA Guidelines
 
-- Separate **kernels** from host orchestration code.
+- Separate **kernels** (`__global__` functions) from host orchestration code, but keep
+  them in the same module as the host code that starts them.
 - Name kernels descriptively, e.g. `compute_gradient_kernel`.
 - Document assumptions about:
   - Thread/block layout
   - Shared memory usage
   - Synchronization requirements
 - Use `__restrict__` and `constexpr` where appropriate for performance and clarity.
-- Prefer small, focused kernels over overly complex ones.
+- Avoid writing kernels that use local memory; limit variables to registers and shared
+  memory as much as possible. To achieve this, prefer focused kernels over complex ones.
+- To structure larger kernels, use `__device__` functions that are declared
+  `static inline __device__`. Ensure that caller and device functions are compiled together.
+- Avoid dynamic parallelism.
 - Always validate CUDA API calls.
 
 ---
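
To make the kernel-structuring guidelines concrete, a minimal hypothetical sketch (the names and the gradient computation are illustrative only, not from the repository): a descriptively named `__global__` kernel with documented launch assumptions, `__restrict__` pointers, and the per-element work factored into a `static inline __device__` helper compiled in the same translation unit.

```cpp
#include <cuda_runtime.h>

// Helper declared static inline __device__ and kept in the same translation
// unit (or an included header), so caller and callee are compiled together.
static inline __device__ float central_difference(const float* __restrict__ in,
                                                  int i, int n)
{
    // Clamped one-dimensional central difference; all values stay in registers.
    const int left  = i > 0     ? i - 1 : i;
    const int right = i < n - 1 ? i + 1 : i;
    return 0.5f * (in[right] - in[left]);
}

// Assumptions (documented per the guidelines above):
// - launched with a 1D grid of 1D blocks covering n elements
// - no shared memory, no block-level synchronization required
__global__ void compute_gradient_kernel(const float* __restrict__ in,
                                        float* __restrict__ out,
                                        int n)
{
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = central_difference(in, i, n);
}
```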
