Skip to content

Fix #2845 #2790 perf regression with vsync#2862

Draft
abaire wants to merge 7 commits into
xemu-project:masterfrom
abaire:fix_2845_2790_perf_regression_with_vsync
Draft

Fix #2845 #2790 perf regression with vsync#2862
abaire wants to merge 7 commits into
xemu-project:masterfrom
abaire:fix_2845_2790_perf_regression_with_vsync

Conversation

@abaire

@abaire abaire commented May 10, 2026

Copy link
Copy Markdown
Member

ui: Sleep UI thread to minimize time in SDL_GL_SwapWindow

In #2691, xemu was updated to decouple the guest vblank interrupt from the host refresh. This allows the UI thread to run at host FPS while the guest runs at 60 fps, matching HW behavior.

On some machines this led to a major decrease in guest performance when vsync is enabled on some Windows machines. This affects AMD GPUs as well as NVIDIA GPUs, with certain NVIDIA driver settings exacerbating the problem.

At the moment it is unclear precisely why this slowdown occurs, but through trial and error it was found that allowing SDL and/or the graphics driver to handle the delay needed to align with vertical blanking is the trigger for the reduced performance. It was also found that lower host display refresh rates correlate with worse performance, which makes sense given that the xemu UI draws are trivial so the a lower refresh rate means the swap needs to wait longer to stay in sync.

This change simply performs the bulk of the delay directly within the UI thread rather than allowing SDL/graphics drivers to do it. This appears to resolve the issue on affected systems.

https://github.com/abaire/xemu-perf-tests/blob/4115d262c5e80d6b71234f8df65fa78c8ef15e12/src/tests/surface_rendering_tests.cpp#L167 reproduces the issue seen with Conker L&R (slow glo_readpixels) but it is somewhat subtle at the given tuning. Modifying this test to perform more draws per frame will likely produce a more obvious result w/ the pre-PR builds as there are more opportunities for glReadPixels to stall on main thread vblank waits.

Fixes #2790
Fixes #2845

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes legacy NVIDIA profile settings and implements a manual vsync timing mechanism in the rendering loop to reduce guest stalls. Feedback suggests calculating the refresh interval dynamically within the main loop to support multi-monitor setups and improving the precision of the interval calculation by using SDL3's rational refresh rate fields. Additionally, the timing function should be updated to include '(void)' in its signature to align with the project's style guide.

Comment thread ui/xemu.c Outdated
Comment thread ui/xemu.c Outdated
Comment thread ui/xemu.c Outdated
@abaire abaire changed the title Fix 2845 2790 perf regression with vsync Fix #2845 #2790 perf regression with vsync May 10, 2026
@abaire abaire force-pushed the fix_2845_2790_perf_regression_with_vsync branch from 5015283 to efa478f Compare May 10, 2026 22:24
@abaire

abaire commented May 10, 2026

Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a manual frame timing mechanism in gl_render_frame to optimize VSync performance and removes unused NVIDIA-specific profile settings. Feedback indicates that the refresh interval calculation should be moved inside the main loop to correctly handle window movement between monitors with different refresh rates, preventing the frame rate from becoming stale.

Comment thread ui/xemu.c Outdated
@abaire abaire force-pushed the fix_2845_2790_perf_regression_with_vsync branch from efa478f to dc14d50 Compare May 10, 2026 22:31
@MasonT8198

Copy link
Copy Markdown
Contributor

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request removes the OGL_CPL_PREFER_DXPRESENT setting from the NVIDIA profile and introduces a manual vsync timing mechanism in the UI rendering loop to reduce guest stalls. The implementation includes a precise delay before swapping windows based on the host's refresh rate. Feedback suggests that the empty event handler for display and window changes should be updated to recalculate the vsync interval when the display environment changes.

Comment thread ui/xemu.c
@re4thewin

Copy link
Copy Markdown

just tried the latest action of your PR. appears to work perfectly on my laptop :)

This reverts commit da38b26.

Further investigation shows that fixing the 2790 general issue also resolves
in combination with HDR and fullscreen mode, so it is undesirable to force
it off unnecessarily.
@abaire abaire force-pushed the fix_2845_2790_perf_regression_with_vsync branch 3 times, most recently from f63b073 to 3961b1a Compare May 27, 2026 04:25
abaire added 3 commits May 26, 2026 21:27
In xemu-project#2691, xemu was updated to decouple the guest vblank interrupt from the host
refresh. This allows the UI thread to run at host FPS while the guest runs at
60 fps, matching HW behavior.

On some machines this led to a major decrease in guest performance when vsync
is enabled in Windows. This affects AMD GPUs in particular, but also affects
some subset of NVIDIA GPUs, with certain NVIDIA driver settings exacerbating
the problem.

At least one vector for the slowdown appears to be coarse locking within the
AMD driver during SDL_GL_SwapWindow calls. Certain functions, like the
glReadPixels backing the glo_readpixels call, appear to require an exclusive
lock that is held within the driver while awaiting vblank. This leads to a
stall in the guest emulation thread if one of these calls occurs while the
main thread is awaiting vblank, with the performance lost being proportional
to the vblank interval (lower refresh rates = larger interval and worse guest
perf).

This change simply performs the bulk of the delay directly within the UI thread
rather than allowing SDL/graphics drivers to do it. This reduces the time that
the driver holds the (assumed) lock and allows the guest GPU commands to
continue to execute.

Fixes xemu-project#2790
Fixes xemu-project#2845
@abaire abaire force-pushed the fix_2845_2790_perf_regression_with_vsync branch 6 times, most recently from 5242d8c to 828854a Compare May 27, 2026 20:25
@abaire abaire force-pushed the fix_2845_2790_perf_regression_with_vsync branch from 828854a to 7dfae71 Compare May 27, 2026 20:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants