Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Comments Needed] Core: Pause CPU and other systems only if initialized fully #8335

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

CookiePLMonster
Copy link
Contributor

@CookiePLMonster CookiePLMonster commented Aug 26, 2019

NOTE: I cannot fully guarantee this PR is free of side effects. Basing on my best intent and the current knowledge of Dolphin's codebase I expect it to work as intended and without unintended side effects, but I would really like other people to review and comment on the change.

Fixes a race condition where invoking RunAsCPUThread before Emulation thread has fully started, resulting in Dolphin attempting to pause yet uninitialized systems.

Cause of the issue

This issue occurs when RunAsCPUThread runs inside one of config changed callbacks processed when Dolphin emits OnEmulationStateChanged to Qt. There is a narrow time window (between s_is_booting.Set(); and s_hardware_initialized.Set();) where calls scheduled to run on CPU thread will try to pause CPU and systems before they are fully initialized. My change makes the constraint stricter, and makes PauseAndLock attempt to pause anything only once the CPU has fully booted, as opposed to current behaviour, where same logic is executed when the CPU has fully booted OR is booting.

Reproducing the issue

This race is relatively rare, so my repro steps are somewhat convoluted and involve adding an extra sleep to the code to make it easier to reproduce:

Prerequisites

a. "Slow down" emulation start routines by adding a sleep (100ms is enough) at the very top of HW::Init(). Reproducing the issue without this step is also possible, but harder - with an added sleep it's possible to achieve a 100% repro rate.
b. Have a game ready with game specific config which differs from the global one - in my test case, I have a game ready with Enable Emulated CPU Clock Override option enabled.

Repro steps

  1. Launch Dolphin.
  2. Open Config, navigate to Advanced. Ensure Enable Emulated CPU Clock Override is unchecked, then close the dialog.
  3. Open Graphics and then close the dialog.
  4. Launch the game.
  5. Observe results.

Observed results

Game crashes at either DSP::GetDSPEmulator() = nullptr or inside ExpansionInterface::PauseAndLock.

Expected results

Game launches normally.

@leoetlino leoetlino added the RFC Request for comments label Nov 9, 2019
@CookiePLMonster
Copy link
Contributor Author

Rebased on latest master to resolve conflicts.

Copy link
Member

@JosJuice JosJuice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to wait until the boot is done and then pause, rather than letting the calling code continue immediately like this PR currently does? While we don't need any protection against racing with the CPU thread during the boot process, we probably need protection against racing with EmuThread (depending on what the calling code actually does).

@@ -668,7 +674,7 @@ State GetState()

void WaitUntilDoneBooting()
{
if (s_is_booting.IsSet() || !s_hardware_initialized)
if (s_is_booting.IsSet() || !s_hardware_initialized.IsSet())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could make this a little bit neater by calling IsBooted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are s_is_booting and !s_is_stopping functionally equivalent? I'll need to analyze it closely but I did not touch this on an assumption they are not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm understanding it correctly, there will be a difference in behavior for the case where this is called while we are stopping. But the only code that calls this doesn't run when we are stopping or already have stopped, so this difference shouldn't affect anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point - I'll try to eyeball this later.

@CookiePLMonster
Copy link
Contributor Author

Wouldn't it be better to wait until the boot is done and then pause, rather than letting the calling code continue immediately like this PR currently does?

It sounds reasonable, but is pausing before boot is done even meaningful? I guess I can try that out but I cannot predict the side effects of that.

@JosJuice
Copy link
Member

It sounds reasonable, but is pausing before boot is done even meaningful? I guess I can try that out but I cannot predict the side effects of that.

I'm not suggesting to pause (i.e. set the emulation state to paused) before the boot is done, I'm suggesting to do it immediately after the boot is done.

@CookiePLMonster
Copy link
Contributor Author

When implemented like this, it currently deadlocks in Fifo::PauseAndLock - I guess there is something else pausing might need to wait before this PR can be modified.

@CookiePLMonster
Copy link
Contributor Author

Rebased on latest master.

Those flags are written to and read from concurrently,
and reading/writing to the same non-atomic variable is UB
Fixes a race condition where invoking RunAsCPUThread before Emulation thread has fully started,
resulting in Dolphin attempting to pause yet uninitialized systems
@CookiePLMonster
Copy link
Contributor Author

Rebased on latest master, with the necessary fixes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
RFC Request for comments
Development

Successfully merging this pull request may close these issues.

4 participants