Support a single-node, multithreaded Task execution mode #11801

@baronfel

Today, MSBuild operates as a multi-process system. Builds are submitted to a central coordinator node that evaluates projects and then farms out Target execution, Task invocation, and other work to worker nodes. These nodes communicate across processes over a custom Inter-Process Communication (IPC) channel that uses custom binary serialization of data.

This scheme results in a lot of redundant work. DLLs must be loaded in every process/node, Project Evaluation data must be sent back and forth across the IPC channel, and caches are built and maintained separately in every process/node. (These are only a few of the larger examples.) In addition, the multi-process nature of MSBuild makes build execution hard to debug and analyze.

Therefore, we propose to implement a new single-node, multi-threaded mode of MSBuild execution. In this mode, Tasks would be invoked on Thread-based worker nodes on the same single execution node. Because more of the worker nodes would be within the same process, we would be able to take advantage of structural sharing/shared memory to reuse more of the data that is stateful across a build. We would also be able to save on some of the JITting costs and reduce the cost of communicating across worker nodes due to no longer needing to serialize and transmit data across the IPC channel.

Task API Contract

Not all Tasks would immediately be able to opt into this mode of execution. Because of the process-isolation execution mode of Tasks today, many Task authors make use of APIs that are not inherently safe in a multi-threaded environment. Such APIs include but are not limited to environment variable access, file I/O, and more. To circumvent these problems, we would need to create a new Task API contract that signals that a Task is intended to operate in a thread-safe execution environment. To help authors obey the rules, we would ship helpers like Banned API Analyzer configuration that prevent authors from using APIs that are known to be in violation of the thread-based execution model. In addition, we would provide a context- or god-object that provides safe versions of anticipated thread-unsafe functionality. The engine would be responsible for preparing this object before Task execution and providing it to the Task. Finally, this new Task API Contract would be able to take advantage of modern .NET expectations for asynchronous operations, like CancellationTokens.
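The context object described above can be illustrated with a small, language-neutral sketch. It is written in Python purely for brevity and is not a proposed MSBuild API: `TaskContext`, `make_context`, and every member name are hypothetical. The idea it shows is that the engine snapshots process-wide state (environment variables, the working directory) once per Task, and the Task reads and writes only its own copy.

```python
import os
import threading
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class TaskContext:
    """Hypothetical per-task context; illustrative only, not an MSBuild type."""
    working_directory: Path
    environment: dict = field(default_factory=dict)
    # Stand-in for a cancellation token that the engine can signal.
    cancelled: threading.Event = field(default_factory=threading.Event)

    def get_env(self, name, default=None):
        # Reads the task's private snapshot, never the process-wide environment.
        return self.environment.get(name, default)

    def set_env(self, name, value):
        # Mutates only this task's snapshot, so other threads are unaffected.
        self.environment[name] = value

    def resolve_path(self, relative):
        # Resolves against the task's own directory instead of the process
        # current directory, which all threads in a process share.
        return self.working_directory / relative


def make_context(project_dir):
    # The "engine" prepares the context before running the task by copying
    # the process environment into a task-private dictionary.
    return TaskContext(Path(project_dir), dict(os.environ))
```

With this shape, two Tasks running on different threads can each change an environment variable or resolve a relative path without observing the other's state, which is the isolation that separate processes provide implicitly today.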

Engine changes and new Node types

To take advantage of the new API, the MSBuild engine would need to create a new Node type and potentially a new TaskHost type that is based on a Thread of execution instead of a process. This thread of execution would set up the execution environment similarly to how the execution environment is set up today, construct the thread-aware Task and its relevant context- or god-object instance, run the Task, and then handle returning the results of the Task (including cancellation, failure, success) to the worker Node.

Compatibility

out-of-proc execution

The new Task API doesn't necessarily require multi-threaded execution. As a checkpoint along the implementation journey, when the Task API contract is added we should also extend the current out-of-proc TaskHost implementation with the knowledge of how to spawn these Tasks. This will allow Task authors to target one implementation and run it in either mode, decoupling the distinct units of work of adding multi-threaded Node support to the engine and 'enlightening' Tasks.

out-of-proc fallback

After the new mode is implemented in the engine, we should provide a flag or trait that globally forces either the out-of-proc or the in-proc mechanism for the new Task API contract. This will act as a safety net in case of bugs in the newer node implementation: in the worst case, users can fall back to out-of-proc execution and get the same behavior as previous versions of MSBuild.

Behavior in the presence of unenlightened Tasks

After the engine work is done, there will still be Tasks that use the old API. When the engine encounters these it should use the out-of-proc node and taskhost mechanisms, to ensure that existing builds continue to function.
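Taken together, the fallback flag and the handling of unenlightened Tasks amount to a small routing decision. The sketch below is one possible reading of those two rules, in Python for brevity; `ExecutionMode`, `choose_mode`, and both parameter names are hypothetical and do not correspond to existing MSBuild switches or types.

```python
from enum import Enum


class ExecutionMode(Enum):
    IN_PROC_THREAD = "in-proc thread"
    OUT_OF_PROC = "out-of-proc"


def choose_mode(task_is_enlightened, force_out_of_proc=False):
    # The global override is the safety net: it wins over everything else.
    if force_out_of_proc:
        return ExecutionMode.OUT_OF_PROC
    # Tasks that only implement the old API keep today's process isolation.
    if not task_is_enlightened:
        return ExecutionMode.OUT_OF_PROC
    # Only Tasks that implement the new contract run on a thread-based node.
    return ExecutionMode.IN_PROC_THREAD
```

Under these assumptions, an existing build with no enlightened Tasks is routed exactly as it is today, and setting the override restores that behavior for every Task.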

Behavior of Tasks that have both API implementations

A Task author may choose to implement both the old and new Task API contracts, especially if they want to continue to ensure compatibility with older versions of MSBuild. In this case, the consumers of the Task must be very specific in their UsingTask declarations to ensure that the Task variant they expect is chosen. We will likely need to surface some knob on UsingTask to make it clear what execution mode a Task should operate in.

Usage scenarios

This mode should be usable both from the MSBuild API and from the MSBuild Command Line.

Risks

Loss of Isolation

Tasks today exist in a bubble because of their external-process, single-threaded nature. Moving to a multi-threaded model makes it more likely that state from other threads will pollute the executing Task thread. It's very important that we ensure isolation for the new Task variants.

Insufficient performance gains

We anticipate performance gains for most builds once Tasks that ship in MSBuild and the .NET SDK are enlightened and multi-thread capable. However, those gains may not materialize at the magnitude we expect, or other engine-level costs may still dominate. Even in that case, this work still represents a benefit: there are a number of ways that we can optimize the multi-threaded node that we simply cannot in multi-proc mode, and analysis of the single node will be much easier. This work is also a prerequisite for more architectural changes we'd like to make in the future that should unlock more performance.

Current Status

We have done a large swath of initial work on multithreaded mode across several different parts of the engine, and it points to the value of the effort. Some of these work items are independently useful to other .NET strategic objectives (.NET TaskHost, SDK decoupling/SDK Resolver). The remaining items carry significant risk and, while logically independent from the majority of the MSBuild Engine, could cause bad user experiences if rushed for GA. Specifically, these risks could present as:

  • reduced performance from a partial migration of the build to multithreading mode, resulting in more of the build happening out-of-proc and incurring IPC costs
  • stale/incorrect state on Tasks that were onboarded to multithreaded mode in the absence of the new, safe-by-contract Task APIs
  • and more

As a result, we are taking the following course of action:

  • For 10 GA
    • finalize the new Task API with feedback from .NET API boards and make it available for authors to onboard
    • implement the 'backwards compatibility' mode that makes new Tasks work out of process
    • implement the 'forwards compatibility' mode that makes old Tasks optionally work in-process if they explicitly allow it
    • allow users to trigger multithreaded mode via the -mt switch - NOT as a default pushed by the dotnet CLI
  • For 10.0.200
    • complete onboarding of MSBuild/NuGet/SDK Tasks to multithreaded-aware API
    • enable multithreaded by default in the .NET SDK
