It appears that, by design, the current stream is intended to be thread-local, consistent with PyTorch's design. The binding in CV-CUDA/python/mod_cvcuda/nvcv/Stream.cpp (line 524 in d45fa0a) documents `__enter__` as:

```cpp
stream.def("__enter__", &Stream::activate, "Activate the CUDA stream as the current stream for this thread.")
```

However, the implementation uses a global singleton pattern. Stream.cpp (line 352 in d45fa0a) pushes the stream onto a shared stack:

```cpp
StreamStack::Instance().push(*this);
```

and StreamStack.cpp (line 51 in d45fa0a) holds that stack in a single static instance:

```cpp
static StreamStack stack;
```

Because every thread pushes onto and reads from the same stack, querying the current stream in multi-threaded scenarios can return a stream that was activated by a different thread.
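The difference between a process-wide stack and a per-thread stack can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical and written only for illustration: `StreamId` stands in for a real stream handle, and `GlobalStreamStack`, `ThreadLocalStreamStack`, and `currentSeenByOtherThread` are not CV-CUDA APIs. Thread A activates stream 42 and, while it is still active, a second thread asks for its own current stream.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <thread>
#include <vector>

// Hypothetical stand-in for a CUDA stream handle (not nvcv::Stream).
using StreamId = int;

// Process-wide stack: one instance shared by every thread,
// mirroring the `static StreamStack stack;` pattern.
class GlobalStreamStack {
public:
    static GlobalStreamStack &Instance() {
        static GlobalStreamStack stack;  // shared across all threads
        return stack;
    }
    void push(StreamId s) { std::lock_guard<std::mutex> lk(m_mutex); m_stack.push_back(s); }
    void pop() { std::lock_guard<std::mutex> lk(m_mutex); if (!m_stack.empty()) m_stack.pop_back(); }
    std::optional<StreamId> top() {
        std::lock_guard<std::mutex> lk(m_mutex);
        if (m_stack.empty()) return std::nullopt;
        return m_stack.back();
    }
private:
    std::mutex m_mutex;
    std::vector<StreamId> m_stack;
};

// Per-thread stack: each thread gets its own instance, so no lock is needed.
class ThreadLocalStreamStack {
public:
    static ThreadLocalStreamStack &Instance() {
        thread_local ThreadLocalStreamStack stack;  // one per thread
        return stack;
    }
    void push(StreamId s) { m_stack.push_back(s); }
    void pop() { if (!m_stack.empty()) m_stack.pop_back(); }
    std::optional<StreamId> top() const {
        if (m_stack.empty()) return std::nullopt;
        return m_stack.back();
    }
private:
    std::vector<StreamId> m_stack;
};

// Thread A "enters" stream 42; while it is still active, thread B queries
// its own current stream. Returns what thread B observed.
template <class Stack>
std::optional<StreamId> currentSeenByOtherThread() {
    std::optional<StreamId> seen;
    std::thread a([&] {
        Stack::Instance().push(42);  // activate stream 42 on thread A
        std::thread b([&] { seen = Stack::Instance().top(); });
        b.join();                    // B finishes before A deactivates
        Stack::Instance().pop();     // deactivate on thread A
    });
    a.join();
    return seen;
}
```

With `GlobalStreamStack`, thread B observes stream 42 even though it never activated any stream; with `ThreadLocalStreamStack`, thread B sees an empty stack, which matches the "current stream for this thread" semantics in the `__enter__` docstring. A per-thread stack also removes the need for a mutex, since no two threads ever touch the same instance.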