Remove parallel call (Fixes #7946) #8635

Merged 1 commit into dotnet:release/9.0.2xx from remove-parallel-call on Feb 6, 2025

Conversation

@Forgind Forgind (Member) commented on Jan 10, 2025

Fixes #7946

dsplaisted found a stack trace here that was very helpful for finding the source of the parallel call to the workload resolver. As seen at the top of the stack, the call fairly quickly reaches the WorkloadResolver, where it attempts to initialize everything concurrently, causing the issue. It's called from TemplatePackageCoordinator in the SDK repo, which in turn is called by various things.

I do not think the Parallel.For call in TemplatePackageManager (see #7946 (comment)) is an issue in this case. As I understand it, ScanAsync just looks at a precomputed set of ITemplatePackages, which means it doesn't have to hit the resolver to find them.

@dsplaisted @nagilson

@Forgind Forgind requested a review from a team as a code owner January 10, 2025 00:49
@@ -24,7 +24,7 @@ public TemplateConstraintManager(IEngineEnvironmentSettings engineEnvironmentSet
_logger.LogDebug($"Found {constraintFactories.Count()} constraints factories, initializing.");
foreach (var constraintFactory in constraintFactories)
{
- _templateConstrains[constraintFactory.Type] = Task.Run(() => constraintFactory.CreateTemplateConstraintAsync(engineEnvironmentSettings, _cancellationTokenSource.Token));
+ _templateConstrains[constraintFactory.Type] = constraintFactory.CreateTemplateConstraintAsync(engineEnvironmentSettings, _cancellationTokenSource.Token);
Member

I've been staring at this for a few minutes. My thought is that this still runs asynchronously, since there is no await. The difference is that without Task.Run, the work isn't queued to the thread pool; each call starts on the calling thread and runs synchronously until it hits its first await.

I don't know the exact issue here, but I did notice one thing: the original Task.Run call forgot to pass the cancellation token. That is, it could have been:

Task.Run(() => constraintFactory.CreateTemplateConstraintAsync(engineEnvironmentSettings, _cancellationTokenSource.Token), _cancellationTokenSource.Token);
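For context, a side-by-side sketch of the two forms under discussion (abbreviated from the diff above; `factory`, `settings`, and `token` stand in for the longer identifiers):

```csharp
// Hedged comparison, not part of the actual diff. Both forms store a Task without
// awaiting it; the difference is where the synchronous part of
// CreateTemplateConstraintAsync runs.

// With Task.Run: the call is queued to the thread pool, so several factories can
// begin initializing concurrently on different threads.
_templateConstrains[factory.Type] =
    Task.Run(() => factory.CreateTemplateConstraintAsync(settings, token), token);

// Without Task.Run: the method starts inline on the calling thread and runs
// synchronously until its first await, so the factories start one after another.
_templateConstrains[factory.Type] =
    factory.CreateTemplateConstraintAsync(settings, token);
```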

@nagilson nagilson (Member) left a comment

Thank you for investigating this change. Sorry for my delay; this was fairly complicated. I think I should have communicated the issue to you better -- you're right, and it's a good observation that that call stack doesn't call the workload constraints. (I had to try to learn what a 'mount point' was...)

I think the simplest way to make this change (can be done with a few lines) would be to use SemaphoreSlim.WaitAsync. See how to use it here, and put it around this call to the workload resolver:

IEnumerable<WorkloadInfo> currentProviderWorkloads = await providers[0].GetInstalledWorkloadsAsync(token).ConfigureAwait(false);

SemaphoreSlim gives you a lock-like object whose guarded region can contain await calls. The standard lock statement in C# doesn't allow await inside its body, which is why SemaphoreSlim was designed; you acquire the exclusive mutex for exactly this purpose. (You may already be familiar with this, but I thought it might be helpful to provide context regardless.)
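A minimal sketch of that pattern, assuming a private SemaphoreSlim field added to the class and an illustrative wrapper method; the field and method names are hypothetical, and only providers, WorkloadInfo, and GetInstalledWorkloadsAsync come from the existing code above:

```csharp
// Hypothetical sketch of the SemaphoreSlim approach described above.
private readonly SemaphoreSlim _workloadResolverGate = new SemaphoreSlim(1, 1);

private async Task<IEnumerable<WorkloadInfo>> GetWorkloadsSerializedAsync(CancellationToken token)
{
    // Wait asynchronously so the calling thread is not blocked while another
    // caller is inside the workload resolver.
    await _workloadResolverGate.WaitAsync(token).ConfigureAwait(false);
    try
    {
        // Only one caller at a time reaches the resolver, avoiding the
        // concurrent-initialization problem.
        return await providers[0].GetInstalledWorkloadsAsync(token).ConfigureAwait(false);
    }
    finally
    {
        // Release even if the resolver call throws.
        _workloadResolverGate.Release();
    }
}
```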

https://stackoverflow.com/questions/7612602/why-cant-i-use-the-await-operator-within-the-body-of-a-lock-statement This article may also be helpful.

I think the approach you're taking may be valid, but the approach above should take only about five lines (create the mutex, acquire, release, plus the braces), so it would be an easier-to-read solution with less churn risk.

@Forgind Forgind force-pushed the remove-parallel-call branch from 03c1024 to 6ca00a9 on February 5, 2025 03:38
@dsplaisted dsplaisted (Member) left a comment

Looks good. Were you able to test this manually at all? It would be good to repro the issue and then verify that it doesn't repro anymore with the fix.

That may be kind of difficult, but in the past I think there have been times when we could repro this regularly.

@Forgind Forgind (Member, Author) commented on Feb 6, 2025

> Looks good. Were you able to test this manually at all? It would be good to repro the issue and then verify that it doesn't repro anymore with the fix.
>
> That may be kind of difficult, but in the past I think there have been times when we could repro this regularly.

I have never been able to reproduce this problem personally, which is part of why I spent so long reading through code and stack traces for this... I agree a repro would make me feel better about it, but at minimum I don't think this will make the situation worse. If we see a very similar-looking issue crop up again in a few months, we can revisit this and see if we can get a better fix.

@Forgind Forgind merged commit 8c5b547 into dotnet:release/9.0.2xx Feb 6, 2025
10 checks passed
@Forgind Forgind deleted the remove-parallel-call branch February 6, 2025 00:28