For `nvptx-none` toolchain testing, we're using `nvptx-none-run` to launch kernels on a 1 x 1 x 1 grid with 1 x 1 x 1 threads. We'd like to use `cuCtxSetLimit(CU_LIMIT_STACK_SIZE)` to increase the per-thread stack size from its tiny default value (1 KiB?).
Even though a `cuCtxGetLimit(CU_LIMIT_STACK_SIZE)` does acknowledge the value set, if this is set "too high", inscrutable errors (`CUDA_ERROR_ILLEGAL_ADDRESS`) may result from later `cuModuleLoadData` (?!) or `cuLaunchKernel` calls.
It is unclear how to safely maximize the per-thread stack size.
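
One possible starting point, offered as a sketch and not as a confirmed fix: assume (this report does not establish it) that the driver backs the stack for every thread that could be resident on the device, not only the single thread actually launched. Under that assumption, a conservative limit is the free device memory, minus some headroom, divided by the number of multiprocessors times the maximum resident threads per multiprocessor. The helper names `stack_limit_estimate` and `set_stack_limit_from_estimate` and the `HAVE_CUDA_DRIVER` macro are hypothetical; the `cu*` calls and attribute enumerators are from the CUDA Driver API.

```c
#include <stddef.h>

/* Hypothetical helper, not part of CUDA or nvptx-none-run.
   Estimates a per-thread stack size in bytes, ASSUMING the driver reserves
   stack space for every potentially resident thread on the device.
   Returns 0 if the inputs leave no room.  */
size_t
stack_limit_estimate (size_t free_bytes, int sm_count,
                      int max_threads_per_sm, size_t reserve_bytes)
{
  if (sm_count <= 0 || max_threads_per_sm <= 0 || free_bytes <= reserve_bytes)
    return 0;

  size_t resident = (size_t) sm_count * (size_t) max_threads_per_sm;
  size_t per_thread = (free_bytes - reserve_bytes) / resident;

  /* Round down to a multiple of 8 bytes (an arbitrary, conservative choice).  */
  return per_thread & ~(size_t) 7;
}

#ifdef HAVE_CUDA_DRIVER /* Hypothetical macro: define when linking libcuda.  */
#include <cuda.h>
#include <stdio.h>

/* Requires a CUDA context for DEV to be current on the calling thread.
   RESERVE_BYTES is headroom kept for module code and data.
   Returns 0 on success, -1 on any failure.  */
int
set_stack_limit_from_estimate (CUdevice dev, size_t reserve_bytes)
{
  size_t free_bytes, total_bytes, readback;
  int sm_count, threads_per_sm;

  if (cuMemGetInfo (&free_bytes, &total_bytes) != CUDA_SUCCESS
      || cuDeviceGetAttribute (&sm_count,
                               CU_DEVICE_ATTRIBUTE_MULTIPROCESSOR_COUNT,
                               dev) != CUDA_SUCCESS
      || cuDeviceGetAttribute (&threads_per_sm,
                               CU_DEVICE_ATTRIBUTE_MAX_THREADS_PER_MULTIPROCESSOR,
                               dev) != CUDA_SUCCESS)
    return -1;

  size_t want = stack_limit_estimate (free_bytes, sm_count, threads_per_sm,
                                      reserve_bytes);
  if (want == 0)
    return -1;

  if (cuCtxSetLimit (CU_LIMIT_STACK_SIZE, want) != CUDA_SUCCESS
      || cuCtxGetLimit (&readback, CU_LIMIT_STACK_SIZE) != CUDA_SUCCESS)
    return -1;

  /* As described above, a successful read-back alone does not prove the
     value is usable; later module loads or launches may still fail.  */
  fprintf (stderr, "stack limit: requested %zu, driver reports %zu\n",
           want, readback);
  return 0;
}
#endif
```

Because a successful `cuCtxGetLimit` evidently does not guarantee that later calls work, a launcher using this would still want to check the results of `cuModuleLoadData` and `cuLaunchKernel` and retry with a smaller limit if they fail. How much headroom to reserve is a guess that would need to be tuned per device.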