Current kernel launch API is not CUDA Graph capture safe; need a ‘no hidden streams, no hidden sync’ mode