Skip to content

Move FLS init before EventPipeAdapter::FinishInitialize and the first SetupThread() #115546

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
May 21, 2025

Conversation

VSadov
Copy link
Member

@VSadov VSadov commented May 14, 2025

Fixes: #114506

In the previous fix we tried to call FinalizerThread::WaitForFinalizerThreadStart(); as late as possible in EEStartupHelper() - to make sure finalizer has enough time to do its part in initializing COM and FLS slot.

But if calls that happen before FinalizerThread::WaitForFinalizerThreadStart(); can result in Thread objects created and registered for OS destruction, then those threads will see g_flsIndex != FLS_OUT_OF_INDEXES, that includes the call to SetupThread(), which EEStartupHelper() itself does.

The change moves FinalizerThread::WaitForFinalizerThreadStart(); earlier, before any Threads could be created and registered for OS destruction.
We still have GC heap initialization happening before that, which will typically result in finalizer thread completing its part before the EE init thread needs to check for the completion.

Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@VSadov
Copy link
Member Author

VSadov commented May 16, 2025

If I deliberately delay finalizer thread by adding ClrSleepEx(1000, false); at the beginning of FinalizerThreadStart, I can see the original failure quite often.

If I run \tracing\tracing\tracing.cmd in a loop 3 times, I see 16 stack traces in the output with g_flsIndex != FLS_OUT_OF_INDEXES. I am not sure if they may be double-reported in the output, that is still a fairly consistent repro.


With this PR and with the deliberate delay like above, if I run \tracing\tracing\tracing.cmd in a loop 30 times, I see no failures.

@VSadov VSadov marked this pull request as ready for review May 16, 2025 01:37
@Copilot Copilot AI review requested due to automatic review settings May 16, 2025 01:37
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR reorders initialization routines in the EEStartupHelper to ensure that the FLS slot is set up by the finalizer thread before any threads are created for OS destruction. Key changes include:

  • Removal of the SyncBlockCache::Attach() function.
  • Reordering of profiling services initialization, exception handling, and thread setup.
  • Moving the call to FinalizerThread::WaitForFinalizerThreadStart() earlier in the startup sequence under a TARGET_WINDOWS guard.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/coreclr/vm/syncblk.cpp Removed the unused SyncBlockCache::Attach() implementation to streamline sync block cache initialization.
src/coreclr/vm/ceemain.cpp Reordered initialization calls: moved finalizer thread waiting earlier and adjusted profiling and thread setup.
Comments suppressed due to low confidence (3)

src/coreclr/vm/syncblk.cpp:591

  • [nitpick] Following the removal of the SyncBlockCache::Attach() implementation, consider cleaning up its declaration in the header files if it is no longer used anywhere else.
void SyncBlockCache::Attach()

src/coreclr/vm/ceemain.cpp:893

  • The call to FinalizerThread::WaitForFinalizerThreadStart() is now guarded by TARGET_WINDOWS. Please confirm that non-Windows builds do not require this waiting, as it was previously executed unconditionally.
FinalizerThread::WaitForFinalizerThreadStart();

src/coreclr/vm/ceemain.cpp:872

  • [nitpick] The profiling services initialization block has been moved after the exception filter setup; please confirm that this reordering does not affect profiling behavior or the overall startup sequencing.
hr = ProfilingAPIUtility::InitializeProfiling();

@VSadov VSadov requested a review from jkotas May 16, 2025 16:14
@VSadov
Copy link
Member Author

VSadov commented May 16, 2025

I think this is ready to be reviewed.

}

#ifdef TARGET_WINDOWS
// g_pGCHeap->Initialize() above could take nontrivial time, so by now the finalizer thread
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we end up creating the finalizer thread too soon on non-Windows?

For example, can the finalizer Thread be registered in thread store before ProfilingAPIUtility::InitializeProfiling and miss profiler notifications that are done as part of the registration?

Copy link
Member Author

@VSadov VSadov May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If InitializeProfiling does not need Thread object for the current thread, it can be moved earlier.

The comments inside InitializeProfiling say that there are asserts somewhere that finalizer thread has started. So I wonder - are there tests for profiler attach? (or maybe I misunderstood the comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments inside InitializeProfiling say that there are asserts somewhere that finalizer thread has started.

These comments refer to the old profiler attach implementation. I am deleting them in #115680

So I wonder - are there tests for profiler attach?

Yes, look for "Attach" under https://github.com/dotnet/runtime/tree/main/src/tests/profiler. (The tests won't hit every subtle race condition though.)

Copy link
Member Author

@VSadov VSadov May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved the profiler initializtion earlier. If it needs to attach a Thread, then it would hit an assert in OsAttachThread. There is an assert there now to catch these cases deterministically.

I was mostly wondering if we test scenarios beyond "profiler is not involved at all".

Copy link
Member Author

@VSadov VSadov May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With default test set there are no failures, so based on that we could say profiler initializtion does not create/attach Thread objects - directly or indirectly, then it is all good.

If the profiler tests you've pointed to run in default test set, it may be good enough. If we need to run some other leg - like outer loop, perhaps makes sense to do that. Maybe makes sense to do that regardless.

@@ -839,33 +839,13 @@ void EEStartupHelper()

#ifdef PROFILING_SUPPORTED
// Initialize the profiling services.
// THis must happen before the finalizer thread is stopped on its first wait.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think this describes the required invariant accurately.

My concern is about making sure that the profilers receive these ThreadCreated notification for the finalize or any other thread reliably. I think it means that ProfilingAPIUtility::InitializeProfiling is called before SetupThread is on any thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering whether it would make sense to start the profiler thread in a different place on Windows vs. non-Windows to make both cases about as optimal as possible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start it earlier on Windows since we know we will need to wait for it and it won't run away to do other things.

Start it later on Windows once there is no problem with the finalizer thread running away to do other things.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it means that ProfilingAPIUtility::InitializeProfiling is called before SetupThread is on any thread.

I think the place where I moved InitializeProfiling would guarantee that any thread will do its SetupThread later than InitializeProfiling.

The place where InitializeProfiling is now does not allow SetupThread, at least on Windows - an assert will catch that.

Copy link
Member Author

@VSadov VSadov May 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that InitializeProfiling itself would not try to set up a thread, directly or indirectly. It looks like it does not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we may need to introduce a wait like that on the finalizer thread to fix the debugger initialization problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the debugger needs that the current thread is initialized and attached. We can, indeed, miss ThreadCreated on non-windows.

Maybe wait on all platforms in the same way. Creating a thread is fast next to GC initializing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe wait on all platforms in the same way

I do not think we want to penalize other platforms with the extra context switches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think next to what GC does (reserves tons of VM, creates a few threads,..), it may be ok.
But maybe one day GC will do all that in pay-for-play fashion. Fundamentally, it does not have to do it upfront.

The alternative is to start finalizer thread later on non-windows - as you've suggested. After the debugger is fully initialized.

Copy link
Member Author

@VSadov VSadov May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we may need to introduce a wait like that on the finalizer thread to fix the debugger initialization problem.

Yes, to be sure that finalizer thread on Windows waits for debugger init we need to make it wait for EE thread to tell it when it can proceed to HasStarted()

I've added that in the last change. Also in non-Windows case simply moved FinalizerThreadCreate() to a place after g_pDebugInterface->StartupPhase2

// Notify debugger once the first thread is created to finish initialization.
if (g_pDebugInterface != NULL)
{
g_pDebugInterface->StartupPhase2(GetThread());
Copy link
Member

@jkotas jkotas May 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HasStarted on the finalizer thread will send ThreadCreated notifications to the debugger. On non-Windows, are there going to be issues if this debugger notification on the finalizer thread is sent to debugger before this StartupPhase2 finishes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetupThread also calls ThreadCreated and the EE initializing thread calls ThreadCreated before StartupPhase2 is called.

Both SetupThread and HasStarted will check for if (CORDebuggerAttached()) before sending the notification though.

The functional part of ThreadCreated (apart from notifications) is to check if debugger wants threads to stop, which cannot happen if debugger is not yet officially attached.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there could be a race where finalizer thread checks if debugger is attached (sees that it is not) and then debugger asks threads to stop.

There is no such race for the thread that initializes the EE.

Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@VSadov
Copy link
Member Author

VSadov commented May 21, 2025

@jkotas Re: backporting. I think we have not seen evidence that the bug is harmful in release builds.

Possibly it just leads to leaks of Thread objects for the event pipe thread or for the thread that initializes EE (if it quits before the rest of the app). There is a possibility of effects that are worse, but not yet seen or attributed to this issue.

It feels like there is no urgency in backporting and there is some risk that the fix may not be complete or brings up some other issue. I would suggest to at least wait couple weeks before considering backporting.

@jkotas
Copy link
Member

jkotas commented May 21, 2025

Ok, sounds good.

@VSadov
Copy link
Member Author

VSadov commented May 21, 2025

Thanks!

@VSadov VSadov merged commit 1998b26 into dotnet:main May 21, 2025
96 checks passed
@VSadov VSadov deleted the fix114506 branch May 21, 2025 14:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Assertion failure g_flsIndex != FLS_OUT_OF_INDEXES
2 participants