Use FLS detach callback as a thread termination notification. Another try. #112809

VSadov · 2025-02-22T01:55:33Z

Switches the mechanism for OS notification about thread termination to use FLS (Fiber Local Storage) - the same mechanism as used on NativeAOT.

The advantage of FLS notification is that it does not run while holding the Loader Lock, thus it does not require that the termination callback never calls or waits for an OS call that may need to acquire that same Loader Lock. (It is hard for us to guarantee the latter)

This PR is similar to #110589 with additional mitigation for scenario where COM thread cleanups reinitialize the Thread object after we had our last chance to clean it up.
The mitigation is basically ensuring that COM is initialized and sets up its FLS slot before we set up ours.

Copilot

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

VSadov · 2025-02-22T02:08:59Z

@jkotas @mangod9 This is basically the same PR as before + mitigation for the COM reinitializing/attaching Thread objects or even running managed code after we had our last chance to detach Threads.

The idea is to call CoInitialize before setting up our FLS slot, which would make COM to set up its FLS slot, if not already initialized.

We call CoInitialize in most cases anyways, and fairly early, for example once the first method starts JIT-ing we launch the tiering helper, which is explicitly an MTA thread. Most user-launched threads are MTA as well. Alas, all that would be after we have our FLS slot set.

Why on the finalizer thread and not on the thread that initializes EE?
My concern is that we do not know the context of the thread that calls into EE initialization. In theory CoInitialize may be not allowed on some threads and initialize nothing. We control the state of the Finalizer thread though.

VSadov · 2025-02-22T02:15:30Z

I have run quite a bit of winform tests with this + there is now a failfast in case we see an orphaned Thread in DLL_THREAD_DETACH, which is more deterministic than crashing on GC holes.

jkotas · 2025-02-22T12:40:10Z

System.Threading.Tests.EtwTests.WaitHandleWaitEventTest failure looks related to this change.

src/coreclr/vm/ceemain.cpp

jkotas · 2025-02-22T12:51:37Z

src/coreclr/vm/interoputil.cpp

+        if (!IsFinalizerThread())
+        {
+            FinalizerThread::GetFinalizerThread()->SetRequiresCoInitialize();
+            // set the finalizer event
+            FinalizerThread::EnableFinalization();
+        }


What is this trying to do? Is it still needed now that we wait for the finalizer thread to CoInitialize COM during startup?

I think all the stuff that tries to CoInitialize the current and Finalizer threads lazily can be removed if this change goes in.
I think quite a bit of cleanup might be possible even without this change.
Compared to NativeAOT there is a lot of extra complexity in Thread init/deinit. Some of that may still be needed (i.e. debugger/COM/syncblocks/threadstatics related), but some of that seems to be leftovers since we had hosting/fibers/appdomains and the like.

I did not want to go too far beyond what is needed for the fix - jut to minimize the diff, in case we backport this.

I agree that there is a lot of cruft that can be cleaned up that is better left for a different PR.

I am not able to reason about whether this if (!IsFinalizerThread()) { ... } block is just a useless code or whether it can be actively harmful in some corner cases.

For this PR, can we delete the if (!IsFinalizerThread()) block, fCoInitCurrentThread argument for EnsureComStarted and the one call to EnsureComStarted(FALSE)?

I have added g_pFinalizerThread->SetApartmentOfUnstartedThread(Thread::AS_InMTA) to match manual call into CoInitialize, but that results in "lazy coinitialize" pattern forcing finalizer to set itself to MTA in response to setting finalizer thread to MTA.

I think I can remove both this change and g_pFinalizerThread->SetApartmentOfUnstartedThread(Thread::AS_InMTA) to raise fewer questions.
It is ok to use native API to set apartment state before Thread is attached. The rest is unnecessary and perhaps confusing.

I am thinking about performance overhead of this change. We are going to do extra thread switch during startup critical path. It is unavoidable.

However, EnsureComStarted that is going to be executed a bit later is going to spin the finalizer thread again by setting FinalizerThread::EnableFinalization.

Can we reduce the startup overhead by setting g_fComStarted on finalizer thread once we call CoInitialize and reduce EnsureComStarted to just:

_ASSERTE(g_fComStarted); // COM is expected to be started on finalizer thread during startup

?

I am thinking about performance overhead of this change. We are going to do extra thread switch during startup critical path. It is unavoidable.

I've moved launching of finalizer thread a bit earlier. EE Initializing thread will do some other work before it needs to check if FLS/COM initialization is finished. Hopefully it will be at least partially done, so we'd not need to wait much, if at all.

Can we reduce the startup overhead by setting g_fComStarted on finalizer thread

Good point. Since we will be initializing anyways, why not get some benefit from it.

VSadov · 2025-02-22T19:26:03Z

System.Threading.Tests.EtwTests.WaitHandleWaitEventTest failure looks related to this change.

That is on alpine-3.21-helix-arm32v7. I did not realize we run ceemain/DLL_THREAD_DETACH on linux as well, nor I am certain if the same invariant applies in such case.

It may be worth to investigate and add similar guards on other platforms (unix, NativeAOT), if possible and appropriate, but it should not be in this PR.

I will scope the failfast to Windows only.

src/coreclr/vm/ceemain.cpp

jkotas · 2025-02-23T07:36:56Z

src/coreclr/vm/ceemain.cpp

+        _ASSERTE_ALL_BUILDS(!"Detaching a thread from the wrong fiber");
+    }
+
+    flsState = FLS_STATE_CLEAR;


Is this right? I do not think that we can handle re-initialization of the Thread after this point gracefully.

This happens in winforms. We currently rely on re-initialization in COM cleanup scenarios -

A thread returns from its entry point and we call DestroyThread, which comes here and detaches/destroys the Thread.

Then OS goes through FLS callbacks and COM callback may need to run something that needs Thread, so we attach a new instance of Thread (because the former is gone by then), and the new Thread gets later deinitialized in our FLS callback.
As long as our deinitialization happens last, things work ok. Seems like it has been like this for a long time.

Ideally, we should not call DestroyThread explicitly when thread exits, just rely on callback - so we would never need to reinitialize. Naively, making such change breaks other things, so probably not in this PR.

Ok, I agree that we want to fix this one in separate PR. The reinitialization sounds like a great source of mysterious corner-case bugs.

src/coreclr/vm/ceemain.cpp

VSadov · 2025-02-25T03:37:56Z

I think the new failures in SslStream_ServerUntrustedCaWithCustomTrust_OK are unrelated.
(The remote certificate is invalid because of errors in the certificate chain: PartialChain)

src/coreclr/vm/ceemain.cpp

jkotas

LGTM otherwise. Thank you!

jkotas · 2025-02-25T04:49:28Z

src/coreclr/vm/ceemain.cpp

+#define FLS_STATE_ARMED 1
+#define FLS_STATE_INVOKED 2
+
+static __declspec(thread) byte flsState;


Suggested change

static __declspec(thread) byte flsState;

static __declspec(thread) byte t_flsState;

Naming nit

Co-authored-by: Jan Kotas <[email protected]>

VSadov · 2025-02-25T23:04:52Z

/ba-g remaining 2 osx arm64 test legs just take very long time and seem to eventually time out in most runs.

u-less · 2025-02-27T08:41:47Z

@VSadov Does this fix work for Linux platform ? We also encountered the same problem on the Linux platform after upgrading to. net9.

jkotas · 2025-02-27T12:34:59Z

This is fixing Windows-specific problem. If you are seeing issues on Linux, it is going to be something else. You may want to open a new issue about it.

VSadov · 2025-03-02T16:33:59Z

/backport to release/9.0-staging

github-actions · 2025-03-02T16:34:09Z

Started backporting to release/9.0-staging: https://github.com/dotnet/runtime/actions/runs/13617313559

github-actions · 2025-03-02T16:35:50Z

@VSadov backporting to "release/9.0-staging" failed, the patch most likely resulted in conflicts:

$ git am --3way --empty=keep --ignore-whitespace --keep-non-patch changes.patch

Applying: Use FLS detach as thread termination notification on windows.
Using index info to reconstruct a base tree...
M	src/coreclr/nativeaot/Runtime/threadstore.cpp
M	src/coreclr/vm/ceemain.cpp
M	src/coreclr/vm/threads.cpp
Falling back to patching base and 3-way merge...
Auto-merging src/coreclr/vm/threads.cpp
Auto-merging src/coreclr/vm/ceemain.cpp
Auto-merging src/coreclr/nativeaot/Runtime/threadstore.cpp
Applying: use _ASSERTE_ALL_BUILDS
Applying: one more case
Applying: InitFlsSlot throws per convention used in threading initialization
Applying: OsDetachThread could be void
Applying: Update src/coreclr/vm/ceemain.cpp
Applying: ensure CoInitialize
error: sha1 information is lacking or useless (src/coreclr/vm/ceemain.cpp).
error: could not build fake ancestor
hint: Use 'git am --show-current-patch=diff' to see the failed patch
hint: When you have resolved this problem, run "git am --continue".
hint: If you prefer to skip this patch, run "git am --skip" instead.
hint: To restore the original branch and stop patching, run "git am --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Patch failed at 0007 ensure CoInitialize
Error: The process '/usr/bin/git' failed with exit code 128

Please backport manually!

… try. (dotnet#112809) * Use FLS detach as thread termination notification on windows. * use _ASSERTE_ALL_BUILDS * one more case * InitFlsSlot throws per convention used in threading initialization * OsDetachThread could be void * Update src/coreclr/vm/ceemain.cpp * ensure CoInitialize * Asserts to fail deterministically. * comments * scope the failfast to Windows and tweak the comments a bit. * better assert * Apply suggestions from code review Co-authored-by: Jan Kotas <[email protected]> * Undo unnecessary finalizer thread changes * reverse the failfast condition * handle destruction of unattached threads * PR feedback Co-authored-by: Jan Kotas <[email protected]> * Update src/coreclr/vm/ceemain.cpp * set g_fComStarted as soon as we call CoInitialize * Naming nit * fix a typpo + better comment --------- Co-authored-by: Jan Kotas <[email protected]>

… try. (#112809) (#113055) * Use FLS detach as thread termination notification on windows. * use _ASSERTE_ALL_BUILDS * one more case * InitFlsSlot throws per convention used in threading initialization * OsDetachThread could be void * Update src/coreclr/vm/ceemain.cpp * ensure CoInitialize * Asserts to fail deterministically. * comments * scope the failfast to Windows and tweak the comments a bit. * better assert * Apply suggestions from code review * Undo unnecessary finalizer thread changes * reverse the failfast condition * handle destruction of unattached threads * PR feedback * Update src/coreclr/vm/ceemain.cpp * set g_fComStarted as soon as we call CoInitialize * Naming nit * fix a typpo + better comment --------- Co-authored-by: Jan Kotas <[email protected]>

Copilot AI review requested due to automatic review settings February 22, 2025 01:55

VSadov requested a review from MichalStrehovsky as a code owner February 22, 2025 01:55

ghost added the area-VM-coreclr label Feb 22, 2025

Copilot AI reviewed Feb 22, 2025

View reviewed changes

VSadov added the os-windows label Feb 22, 2025

dotnet-policy-service bot assigned VSadov Feb 22, 2025

This was referenced Feb 22, 2025

slow macOS - "##[error]The job running on agent Azure Pipelines 9 ran longer than the maximum time of 60 minutes." dotnet/dnceng#1883

Open

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

jkotas reviewed Feb 22, 2025

View reviewed changes

src/coreclr/vm/ceemain.cpp Outdated Show resolved Hide resolved

jkotas reviewed Feb 22, 2025

View reviewed changes

jkotas requested a review from kouvel February 22, 2025 12:52

build-analysis bot mentioned this pull request Feb 23, 2025

[9.0] Test assert failure in X509Certificates.Tests.RevocationTests.AiaTests.AiaAcceptsCertTypesAndIgnoresNonCertTypes #107364

Open

jkotas reviewed Feb 23, 2025

View reviewed changes

VSadov force-pushed the FLS2 branch from eb7e309 to e99abf4 Compare February 23, 2025 23:12

VSadov commented Feb 25, 2025

View reviewed changes

src/coreclr/vm/ceemain.cpp Outdated Show resolved Hide resolved

jkotas reviewed Feb 25, 2025

View reviewed changes

jkotas approved these changes Feb 25, 2025

View reviewed changes

VSadov and others added 8 commits February 25, 2025 12:21

Use FLS detach as thread termination notification on windows.

cbda245

use _ASSERTE_ALL_BUILDS

7a846ec

one more case

f4cac2d

InitFlsSlot throws per convention used in threading initialization

c2b13dd

OsDetachThread could be void

90559bb

Update src/coreclr/vm/ceemain.cpp

d62a1fc

ensure CoInitialize

98639a1

Asserts to fail deterministically.

092e6f6

VSadov and others added 12 commits February 25, 2025 12:21

comments

d952ae4

scope the failfast to Windows and tweak the comments a bit.

d29fb5b

better assert

322dadc

Apply suggestions from code review

2593989

Co-authored-by: Jan Kotas <[email protected]>

Undo unnecessary finalizer thread changes

1612a8e

reverse the failfast condition

2d6c736

handle destruction of unattached threads

a1f374a

PR feedback

2aa70a5

Co-authored-by: Jan Kotas <[email protected]>

Update src/coreclr/vm/ceemain.cpp

82f3b55

set g_fComStarted as soon as we call CoInitialize

9bfcada

Naming nit

65eb637

fix a typpo + better comment

45fcc47

VSadov force-pushed the FLS2 branch from b7cb5d1 to 45fcc47 Compare February 25, 2025 20:21

VSadov merged commit fa3beb9 into dotnet:main Feb 25, 2025
91 of 94 checks passed

VSadov deleted the FLS2 branch February 25, 2025 23:05

jkotas mentioned this pull request Mar 2, 2025

[Native AOT] Initialize COM eagerly #113040

Closed

VSadov mentioned this pull request Mar 3, 2025

[release/9.0-staging] Backport "Use FLS detach callback as a thread termination notification. Another try." #113055

Merged

4 tasks

github-actions bot locked and limited conversation to collaborators Apr 2, 2025

	static __declspec(thread) byte flsState;
	static __declspec(thread) byte t_flsState;

Use FLS detach callback as a thread termination notification. Another try. #112809

Use FLS detach callback as a thread termination notification. Another try. #112809

Uh oh!

Conversation

VSadov commented Feb 22, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Uh oh!

VSadov commented Feb 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VSadov commented Feb 22, 2025

Uh oh!

jkotas commented Feb 22, 2025

Uh oh!

Uh oh!

jkotas Feb 22, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Feb 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkotas Feb 22, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Feb 23, 2025

Choose a reason for hiding this comment

Uh oh!

jkotas Feb 25, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

VSadov commented Feb 22, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jkotas Feb 23, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov Feb 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jkotas Feb 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

VSadov commented Feb 25, 2025

Uh oh!

Uh oh!

jkotas left a comment

Choose a reason for hiding this comment

Uh oh!

jkotas Feb 25, 2025

Choose a reason for hiding this comment

Uh oh!

VSadov commented Feb 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

u-less commented Feb 27, 2025

Uh oh!

jkotas commented Feb 27, 2025

Uh oh!

VSadov commented Mar 2, 2025

Uh oh!

github-actions bot commented Mar 2, 2025

Uh oh!

github-actions bot commented Mar 2, 2025

Uh oh!

Uh oh!

VSadov commented Feb 22, 2025 •

edited

Loading

VSadov Feb 22, 2025 •

edited

Loading

VSadov Feb 25, 2025 •

edited

Loading

VSadov commented Feb 22, 2025 •

edited

Loading

VSadov Feb 23, 2025 •

edited

Loading

VSadov commented Feb 25, 2025 •

edited

Loading