Fix #2845 #2790 perf regression with vsync#2862
Conversation
There was a problem hiding this comment.
Code Review
This pull request removes legacy NVIDIA profile settings and implements a manual vsync timing mechanism in the rendering loop to reduce guest stalls. Feedback suggests calculating the refresh interval dynamically within the main loop to support multi-monitor setups and improving the precision of the interval calculation by using SDL3's rational refresh rate fields. Additionally, the timing function should be updated to include '(void)' in its signature to align with the project's style guide.
5015283 to
efa478f
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a manual frame timing mechanism in gl_render_frame to optimize VSync performance and removes unused NVIDIA-specific profile settings. Feedback indicates that the refresh interval calculation should be moved inside the main loop to correctly handle window movement between monitors with different refresh rates, preventing the frame rate from becoming stale.
efa478f to
dc14d50
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request removes the OGL_CPL_PREFER_DXPRESENT setting from the NVIDIA profile and introduces a manual vsync timing mechanism in the UI rendering loop to reduce guest stalls. The implementation includes a precise delay before swapping windows based on the host's refresh rate. Feedback suggests that the empty event handler for display and window changes should be updated to recalculate the vsync interval when the display environment changes.
|
just tried the latest action of your PR. appears to work perfectly on my laptop :) |
dc14d50 to
75761e3
Compare
This reverts commit da38b26. Further investigation shows that fixing the 2790 general issue also resolves in combination with HDR and fullscreen mode, so it is undesirable to force it off unnecessarily.
f63b073 to
3961b1a
Compare
In xemu-project#2691, xemu was updated to decouple the guest vblank interrupt from the host refresh. This allows the UI thread to run at host FPS while the guest runs at 60 fps, matching HW behavior. On some machines this led to a major decrease in guest performance when vsync is enabled in Windows. This affects AMD GPUs in particular, but also affects some subset of NVIDIA GPUs, with certain NVIDIA driver settings exacerbating the problem. At least one vector for the slowdown appears to be coarse locking within the AMD driver during SDL_GL_SwapWindow calls. Certain functions, like the glReadPixels backing the glo_readpixels call, appear to require an exclusive lock that is held within the driver while awaiting vblank. This leads to a stall in the guest emulation thread if one of these calls occurs while the main thread is awaiting vblank, with the performance lost being proportional to the vblank interval (lower refresh rates = larger interval and worse guest perf). This change simply performs the bulk of the delay directly within the UI thread rather than allowing SDL/graphics drivers to do it. This reduces the time that the driver holds the (assumed) lock and allows the guest GPU commands to continue to execute. Fixes xemu-project#2790 Fixes xemu-project#2845
5242d8c to
828854a
Compare
828854a to
7dfae71
Compare
ui: Sleep UI thread to minimize time in SDL_GL_SwapWindow
In #2691, xemu was updated to decouple the guest vblank interrupt from the host refresh. This allows the UI thread to run at host FPS while the guest runs at 60 fps, matching HW behavior.
On some machines this led to a major decrease in guest performance when vsync is enabled on some Windows machines. This affects AMD GPUs as well as NVIDIA GPUs, with certain NVIDIA driver settings exacerbating the problem.
At the moment it is unclear precisely why this slowdown occurs, but through trial and error it was found that allowing SDL and/or the graphics driver to handle the delay needed to align with vertical blanking is the trigger for the reduced performance. It was also found that lower host display refresh rates correlate with worse performance, which makes sense given that the xemu UI draws are trivial so the a lower refresh rate means the swap needs to wait longer to stay in sync.
This change simply performs the bulk of the delay directly within the UI thread rather than allowing SDL/graphics drivers to do it. This appears to resolve the issue on affected systems.
https://github.com/abaire/xemu-perf-tests/blob/4115d262c5e80d6b71234f8df65fa78c8ef15e12/src/tests/surface_rendering_tests.cpp#L167 reproduces the issue seen with Conker L&R (slow glo_readpixels) but it is somewhat subtle at the given tuning. Modifying this test to perform more draws per frame will likely produce a more obvious result w/ the pre-PR builds as there are more opportunities for glReadPixels to stall on main thread vblank waits.
Fixes #2790
Fixes #2845