I've been looking into Issue #113, and I was actually just about to ask a very similar question myself. It feels like a lot of operations in NVRHI that should logically be tied to a CommandList are instead bound to the Device global state. This feels quite strange from a modern API perspective.
For instance, semaphores and event queries could be associated with the command list itself, which would make the whole process more stateless. Even if we accept that the final submission and present calls aren't thread-safe (meaning you shouldn't submit from multiple threads simultaneously), reducing this device-level state coupling would still be a big win. It would let us "pre-configure" this state while recording the command list, which in turn would shorten the critical section during which the device must be locked or serialized.
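To make that concrete, here's a rough sketch of the shape I have in mind. None of these types exist in NVRHI; `SketchCommandList`, `SketchDevice`, `SketchEventQuery` and the semaphore IDs are all hypothetical stand-ins. The point is only that the waits, signals and the completion query are recorded on the list, and the device looks at them once, inside a short locked submit.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-ins for illustration; none of these are NVRHI types.
struct SketchEventQuery {
    uint64_t signaledAtSubmission = 0; // 0 means "not submitted yet"
};

struct SketchCommandList {
    // Sync state is recorded alongside the commands, not on the device.
    std::vector<uint64_t> waitSemaphores;
    std::vector<uint64_t> signalSemaphores;
    std::shared_ptr<SketchEventQuery> completionQuery;

    void waitFor(uint64_t semaphore) { waitSemaphores.push_back(semaphore); }
    void signal(uint64_t semaphore) { signalSemaphores.push_back(semaphore); }

    // The query is handed out by the list it belongs to.
    std::shared_ptr<SketchEventQuery> requestCompletionQuery() {
        if (!completionQuery) completionQuery = std::make_shared<SketchEventQuery>();
        return completionQuery;
    }
};

class SketchDevice {
public:
    // Only this call touches shared device state, so the lock is held briefly.
    uint64_t submit(const SketchCommandList& list) {
        std::lock_guard<std::mutex> lock(m_submitMutex);
        const uint64_t id = ++m_lastSubmission;
        // A real backend would pass the recorded waits and signals to the
        // queue submission here; this sketch only tags the query.
        if (list.completionQuery) list.completionQuery->signaledAtSubmission = id;
        return id;
    }

private:
    std::mutex m_submitMutex;
    uint64_t m_lastSubmission = 0;
};

// Records sync state on the list first, then submits once.
uint64_t sketchRecordAndSubmit() {
    SketchDevice device;
    SketchCommandList list;
    list.waitFor(1);  // hypothetical semaphore IDs
    list.signal(2);
    auto query = list.requestCompletionQuery();
    assert(query->signaledAtSubmission == 0);

    const uint64_t id = device.submit(list);
    assert(query->signaledAtSubmission == id);
    return id;
}
```

Everything before `submit` can happen on any recording thread without touching the device, which is the part I'd like to see moved out of the serialized section.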
The biggest issue for me is that the current approach is just hard to grasp intuitively. It doesn't really follow the patterns you see in modern graphics APIs like Vulkan or D3D12. When I first started using EventQuery, I honestly had no idea what it was supposed to represent because I naturally expected a synchronization primitive to be returned by or bound to a specific operation or command list. The current implementation feels a bit like the old OpenGL global state machine style of submission, which is a massive departure from the more explicit, stateless feel of the rest of the library. It makes the synchronization logic feel fragmented and prone to errors that are hard to debug.
I’m really curious to know if this design was intentional. Was there a specific architectural trade-off that made this global-state approach preferable? Perhaps it was meant to simplify things for developers coming from older APIs, or maybe there’s some backend-specific limitation I'm missing.
I also wonder why NVRHI doesn't just make the device-level submission and wait operations thread-safe internally. It seems like adding a lightweight synchronization layer inside the device wouldn't introduce much overhead, but it would certainly reduce the chance of accidental misuse and race conditions like the ones being discussed in this issue. It would be great to get some insight into the "why" behind this choice.
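By "lightweight synchronization layer" I mean something roughly like the sketch below. `UnsafeDevice` and `LockedDevice` are hypothetical names, not NVRHI classes, and the counter is just a stand-in for whatever per-queue bookkeeping a real device does on submit. I'm assuming a single mutex around the device-level entry points would be enough.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical device that is not safe to call from several threads at once.
class UnsafeDevice {
public:
    uint64_t submit() { return ++m_submitted; }
    bool isFinished(uint64_t id) const { return id <= m_submitted; }
    uint64_t lastSubmission() const { return m_submitted; }

private:
    uint64_t m_submitted = 0;
};

// Hypothetical wrapper: one mutex serializes the device-level entry points.
class LockedDevice {
public:
    uint64_t submit() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_device.submit();
    }
    bool isFinished(uint64_t id) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_device.isFinished(id);
    }
    uint64_t lastSubmission() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_device.lastSubmission();
    }

private:
    std::mutex m_mutex;
    UnsafeDevice m_device;
};

// Submits from several threads and returns the final submission count.
uint64_t sketchConcurrentSubmit(int threadCount, int submitsPerThread) {
    LockedDevice device;
    std::vector<std::thread> workers;
    for (int t = 0; t < threadCount; ++t) {
        workers.emplace_back([&device, submitsPerThread] {
            for (int i = 0; i < submitsPerThread; ++i) device.submit();
        });
    }
    for (auto& worker : workers) worker.join();

    // With the lock in place no increment is lost.
    const uint64_t expected = static_cast<uint64_t>(threadCount) * submitsPerThread;
    assert(device.isFinished(expected));
    return device.lastSubmission();
}
```

One caveat with this naive version: a blocking wait taken under the same mutex would stall every other submitter, so a real implementation would probably need to release the lock while waiting. That may well be part of the "why" I'm asking about.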