Detect and correct worker node over-provisioning#13220
Draft
Detect and correct worker node over-provisioning#13220
Conversation
Contributor
|
Hello @@copilot, I noticed that you’re changing an .swr file or any file under src/Package/MSBuild.VSSetup.. Please make sure to validate this change by an experimental VS insertion. This is accomplished by pushing to an exp/* branch, which requires write permissions to this repo. |
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Copilot
AI
changed the title
[WIP] Add detection and correction for worker node over-provisioning
Detect and correct worker node over-provisioning
Feb 7, 2026
Member
### Context It is quite often that `Path.GetFullPath` is used in the tasks code. For enlightening we would want to replace it with our own API. The candidate was `TaskEnvironment.GetAbsolutePath` for replacing. However, they do not do the same set of actions. `TaskEnvironment.GetAbsolutePath` only absolutizes the path but does not resolve possible "." and ".." in the path or normalize path separators on Windows. For some scenarios this is critical, for some that does not matter. We should provide an option for fully copy `Path.GetFullPath` behavior ### Changes Made - Added `AbsolutePath.GetCanonicalForm()` ### Testing Added unit tests
### Context Thread-safe tasks must avoid global process state (working directory, environment variables). `WriteCodeFragment` uses relative paths resolved against the current working directory, making it unsafe for multithreaded execution. ### Changes Made - Add `[MSBuildMultiThreadableTask]` attribute and implement `IMultiThreadableTask` - Use `TaskEnvironment.GetAbsolutePath()` to resolve paths before file operations: ```csharp outputFilePath = TaskEnvironment.GetAbsolutePath(OutputFile.ItemSpec); ``` - Pass absolute paths to `FileUtilities.GetTemporaryFile()`, `FileUtilities.EnsureDirectoryExists()`, `File.WriteAllText()`, and `LockCheck.GetLockedFileMessage()` - Update all test instantiations to set `TaskEnvironment = TaskEnvironmentHelper.CreateForTest()` - Preserve behavior: when `OutputDirectory` is relative and `OutputFile` is not specified, the resulting `OutputFile` output remains relative (matching original behavior) ### Testing All 52 WriteCodeFragment tests pass, including: - A new test `RelativeOutputDirectoryProducesRelativeOutputFile` that verifies when `OutputDirectory` is relative and `OutputFile` is not passed, the resulting `OutputFile` is also relative - The `CommentIsInvariantCulture` tests (added in main) now have `TaskEnvironment` initialized ### Notes None. <!-- START COPILOT CODING AGENT SUFFIX --> <!-- START COPILOT ORIGINAL PROMPT --> <details> <summary>Original prompt</summary> > Migrate the WriteCodeFragment task to use TaskEnvironment API. > Make sure to preserve correctness, performance and that the tests pass </details> <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/dotnet/msbuild/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Jan Provazník <janprovaznik@microsoft.com>
Co-authored-by: jeffhandley <1031940+jeffhandley@users.noreply.github.com>
This pull request updates the following dependencies [marker]: <> (Begin:369b758c-dad8-4fea-810a-64fb6b0308e9) ## From https://github.com/dotnet/arcade - **Subscription**: [369b758c-dad8-4fea-810a-64fb6b0308e9](https://maestro.dot.net/subscriptions?search=369b758c-dad8-4fea-810a-64fb6b0308e9) - **Build**: [20260206.4](https://dev.azure.com/dnceng/internal/_build/results?buildId=2897563) ([300694](https://maestro.dot.net/channel/8394/github:dotnet:arcade/build/300694)) - **Date Produced**: February 6, 2026 10:35:32 PM UTC - **Commit**: [58d75363f4b03e408aa2502e0db546173be722d4](dotnet/arcade@58d7536) - **Branch**: [release/10.0](https://github.com/dotnet/arcade/tree/release/10.0) [DependencyUpdate]: <> (Begin) - **Dependency Updates**: - From [10.0.0-beta.26080.4 to 10.0.0-beta.26106.4][1] - Microsoft.DotNet.Arcade.Sdk - Microsoft.DotNet.XUnitExtensions [1]: dotnet/arcade@b0f891c...58d7536 [DependencyUpdate]: <> (End) [marker]: <> (End:369b758c-dad8-4fea-810a-64fb6b0308e9) Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
This pull request updates the following dependencies [marker]: <> (Begin:8a4332aa-2543-4c51-b941-e73f31e22328) ## From https://github.com/dotnet/roslyn - **Subscription**: [8a4332aa-2543-4c51-b941-e73f31e22328](https://maestro.dot.net/subscriptions?search=8a4332aa-2543-4c51-b941-e73f31e22328) - **Build**: [20260205.1](https://dev.azure.com/dnceng/internal/_build/results?buildId=2896280) ([300424](https://maestro.dot.net/channel/548/github:dotnet:roslyn/build/300424)) - **Date Produced**: February 5, 2026 12:45:29 PM UTC - **Commit**: [da86ec644af82dcb8575ff0c22ac6afdcd6b5066](dotnet/roslyn@da86ec6) - **Branch**: [main](https://github.com/dotnet/roslyn/tree/main) [DependencyUpdate]: <> (Begin) - **Dependency Updates**: - From [5.5.0-2.26081.2 to 5.5.0-2.26105.1][1] - Microsoft.Net.Compilers.Toolset [1]: dotnet/roslyn@265653e...da86ec6 [DependencyUpdate]: <> (End) [marker]: <> (End:8a4332aa-2543-4c51-b941-e73f31e22328) Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
When using MSBuild's multithreaded mode (-mt / /maxcpucount), the %(RecursiveDir) built-in metadata was lost when items crossed process boundaries to the TaskHost. This caused NuGet pack to flatten directory structures (e.g., build\wix\bundle\bundle.wxs became build\bundle.wxs). Root cause: TaskParameterTaskItem only copied custom metadata when wrapping ITaskItem for TaskHost serialization. RecursiveDir is a built-in metadata that cannot be derived from just the item spec - it requires the original wildcard pattern (_includeBeforeWildcardExpansionEscaped). Fix: Explicitly copy RecursiveDir to custom metadata in TaskParameterTaskItem constructor before serialization, so it survives the cross-process boundary. Also includes related fixes to ProjectItemInstance.cs: - Preserve _projectDirectory in copy constructor and serialization - Fix CreateItem(string, string, string) to use includeBeforeWildcardExpansionEscaped related: #3121 fixes #13140
…: Build ID 13255870 (#13208) This is the pull request automatically created by the OneLocBuild task in the build process to check-in localized files generated based upon translation source files (.lcl files) handed-back from the downstream localization pipeline. If there are issues in translations, visit https://aka.ms/icxLocBug and log bugs for fixes. The OneLocBuild wiki is https://aka.ms/onelocbuild and the localization process in general is documented at https://aka.ms/AllAboutLoc. --------- Co-authored-by: Jan Provazník <janprovaznik@microsoft.com>
…ings (#13148) ### Context The team was discussing nodemodes in sync this morning and we were all getting confused. This PR gives names to the existing modes and has us use them consistently across the codebase for clarity. --------- Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com> Co-authored-by: Jan Provazník <janprovaznik@microsoft.com>
This partially reverts commit bc0dc67. Dependabot does now support `Directory.Packages.props`, but still seems to require a project file to be present: ``` updater | 2026/02/09 01:52:58 INFO Discovering build files in workspace [/home/dependabot/dependabot-updater/repo/eng/dependabot]. updater | 2026/02/09 01:52:58 INFO No dotnet-tools.json file found. updater | 2026/02/09 01:52:58 INFO Discovered [global.json] file. updater | 2026/02/09 01:52:58 INFO Discovering projects beneath [eng/dependabot]. updater | 2026/02/09 01:52:58 INFO No project files found. updater | 2026/02/09 01:52:58 INFO Discovery complete. ``` From https://github.com/dotnet/msbuild/actions/runs/21809650562/job/62919159519#step:3:174
In our environment, the hosted pools are swamped and we often queue for half an hour or more waiting for a machine. Switch to machines managed by the .NET team in the hopes that they'll have more capacity and thus lower PR turnaround time.
I got annoyed when reviewing some of the tests Copilot generated in a few recent PRs, so let's try to formalize some of the guidance it wasn't picking up on.
…ns (#13235) Add `MockLogger(_output)` to `TransactionalModePreservesAllData` so build diagnostics appear in test output. Update `tests.instructions.md` to require MockLogger for tests that build via ProjectCollection or BuildManager.
…: Build ID 13287778 (#13236) This is the pull request automatically created by the OneLocBuild task in the build process to check-in localized files generated based upon translation source files (.lcl files) handed-back from the downstream localization pipeline. If there are issues in translations, visit https://aka.ms/icxLocBug and log bugs for fixes. The OneLocBuild wiki is https://aka.ms/onelocbuild and the localization process in general is documented at https://aka.ms/AllAboutLoc. Co-authored-by: YuliiaKovalova <95473390+YuliiaKovalova@users.noreply.github.com>
Fixes # ### Context On traces we could see about 8 mb of allocations due to add nonempty strings. When taking it look at the code we could see alot of adding to a list which was constantly resizing.  ### Changes Made * Instead of defaulting a list of default size we use context to initialize the size to be something closer to what we would like. So that the capacity only needs to be set once throughout its use. * Also reuse the same list throughout instead of using a new one every time. ### Testing After the changes I took a trace and it shows that this path is has been reduced by 5mb.  ### Notes --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
These scripts are missing the executable bit compared to their counterparts in dotnet/arcade: - eng/common/native/install-dependencies.sh - eng/common/sdk-task.sh - eng/common/cross/tizen-build-rootfs.sh - eng/common/cross/install-debs.py - eng/common/cross/tizen-fetch.sh - eng/common/vmr-sync.ps1 - eng/common/dotnet.sh - eng/common/SetupNugetSources.sh - eng/common/vmr-sync.sh This matches the permissions set in arcade.
Coordinated change with Visual Studio. --------- Co-authored-by: Rainer Sigwald <raines@microsoft.com>
### Context `RemoveRelativeSegments` does the same as `AbsolutePath.GetCanonicalForm()`. There is no need for code duplication. ### Testing Unit tests
…iscovery Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
…for parsing Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
…al tests Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
…posal Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
- Rebased all commits on top of PR #13256 - Updated CountSystemWideActiveNodes to use improved node detection - Now filters by NodeMode.OutOfProcNode for accurate worker node counting - Automatically handles dotnet processes and MSBuild.dll filtering - Leverages cross-platform process command line retrieval Co-authored-by: baronfel <573979+baronfel@users.noreply.github.com>
Contributor
Author
Rebased on PR #13256 and integrated the improved node detection logic. Key improvements in commit f6e3056:
The over-provisioning detection now uses the more robust node discovery mechanism, providing better consistency across platforms and more accurate identification of worker nodes vs. main MSBuild processes. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Work item (Internal use):
Summary
When multiple MSBuild instances run concurrently, each spawns worker nodes up to
/mlimit (defaults to NUM_PROCS), causing over-provisioning. 25+ nodes can linger for ~15 minutes consuming resources.Modified
NodeProviderOutOfProcBase.ShutdownConnectedNodesto:NodeBuildComplete(PrepareForReuse=false)to excess nodes, keeping only those needed to meet thresholdImplementation:
DetermineNodesForReuse()- Core logic for selective reuse based on system-wide countGetNodeReuseThreshold()- Virtual method returning threshold (default:max(1, cores/2))CountSystemWideActiveNodes()- Virtual method using improved node detection from PR Improve cross-platform node discovery for reuse with NodeMode filtering #13256NodeMode.OutOfProcNodefor accurate worker node countingIntegration with PR #13256:
This PR is rebased on top of PR #13256 ("Improve cross-platform node discovery for reuse with NodeMode filtering") and leverages its improved node detection logic for more accurate and consistent worker node identification across platforms.
Server and RAR nodes use separate lifecycle management and are not affected by this change.
Customer Impact
Reduces resource consumption in scenarios with concurrent builds (DevKit, LLM agents, CI/CD pipelines). System will maintain reasonable node count instead of accumulating 25+ lingering nodes.
Improved accuracy in node counting through NodeMode-based filtering ensures only actual worker nodes are counted, preventing false positives from main MSBuild processes or task hosts.
Regression?
Behavior change only when
enableReuse=true(default). Conservative implementation with try/catch fallback if process enumeration fails.Testing
8 unit tests added covering:
All existing tests pass. Tests remain valid with improved node detection as they use mocked implementations.
Risk
Low. Changes isolated to node shutdown path when reuse is enabled. Failure to enumerate processes defaults to keeping all nodes (current behavior). Diagnostic traces added for troubleshooting.
The integration with PR #13256's improved node detection reduces risk by providing more accurate node identification across platforms and better handling of dotnet processes.
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.