Skip to content

Conversation

@JanProvaznik
Copy link
Member

Context

TaskHost (OutOfProcTaskHostNode) doesn't support IBuildEngine callbacks - tasks that call
IsRunningMultipleNodes, BuildProjectFile, RequestCores, etc. fail with MSB5022 or
NotImplementedException. This blocks multithreaded mode (-mt) where non-thread-safe tasks
are forced to run in TaskHost.

This PR is Stage 1 of implementing callback support. It establishes the infrastructure and
implements IsRunningMultipleNodes as a proof-of-concept for the pattern.

Partial fix for #12863

Changes Made

Packet Infrastructure:

- Add packet type enum values (0x20-0x27 range) for TaskHost callbacks
- Create ITaskHostCallbackPacket interface with RequestId for request/response

correlation
- Add TaskHostQueryRequest/Response packets for property queries

OutOfProcTaskHostNode (TaskHost side):

- Add _pendingCallbackRequests dictionary mapping RequestId → TaskCompletionSource
- Add SendCallbackRequestAndWaitForResponse<T>() - sends request, blocks task thread

until response
- Add HandleCallbackResponse() - routes incoming responses to waiting TCS by RequestId
- Implement IsRunningMultipleNodes to forward query to parent instead of logging error

TaskHostTask (Parent side):

- Register handler for TaskHostQueryRequest packets
- Forward IsRunningMultipleNodes query to real IBuildEngine and return result

Testing

- Added TaskHostQueryPacket_Tests.cs - packet serialization round-trip
- Added IsRunningMultipleNodesTask.cs + integration test in TaskHostFactory_Tests.cs
- All 100 TaskHost tests pass

Notes

Callback Flow:

sequenceDiagram
        participant Task as Task Thread<br/>(TaskHost)
        participant Main as Main Thread<br/>(TaskHost)
        participant Parent as TaskHostTask<br/>(Parent Process)

        Task->>Task: BuildEngine.IsRunningMultipleNodes
        Task->>Task: Create TaskHostQueryRequest(requestId)
        Task->>Task: Store TCS in _pendingCallbackRequests[requestId]
        Task->>Parent: SendData(request)
        Task-->>Task: Block on TCS.Task

        Parent->>Parent: HandleQueryRequest(request)
        Parent->>Parent: result = _buildEngine.IsRunningMultipleNodes
        Parent->>Main: SendData(TaskHostQueryResponse)

        Main->>Main: HandleCallbackResponse(response)
        Main->>Main: _pendingCallbackRequests[requestId].SetResult(response)

        Task-->>Task: TCS unblocks
        Task->>Task: return response.BoolResult
Loading

Thread Model:

- Main thread runs packet dispatch loop - receives responses, routes to waiting tasks
- Task thread(s) execute task code - block on TCS when calling callbacks

Error Handling:

- Connection loss cancels all pending callbacks with TaskCanceledException
- New error string MSB5025: TaskHostCallbackConnectionLost

Next Stages:

- Stage 2: RequestCores/ReleaseCores
- Stage 3: BuildProjectFile/BuildProjectFilesInParallel
- Stage 4: Yield/Reacquire

Implements callback infrastructure for TaskHost IBuildEngine callbacks:
- Add packet type enum values for TaskHost callbacks (0x20-0x27 range)
- Create ITaskHostCallbackPacket interface for request/response correlation
- Add TaskHostQueryRequest/Response packets for property queries
- Implement IsRunningMultipleNodes forwarding in OutOfProcTaskHostNode
- Handle query requests in TaskHostTask (parent side)
- Add integration tests for IsRunningMultipleNodes callback

This enables tasks running in TaskHost to correctly query whether
the build is running with multiple nodes, which is needed for
multithreaded mode (-mt) support.

Partial fix for dotnet#12863
@JanProvaznik JanProvaznik force-pushed the ibuildengine-callbacks-stage1 branch from 3c57f6f to 684a479 Compare January 29, 2026 18:07
@JanProvaznik JanProvaznik changed the title Stage 1: Packet infrastructure + IsRunningMultipleNodes callback [IBuildEngine callbacks] Stage 1: Packet infrastructure + IsRunningMultipleNodes Jan 29, 2026
@JanProvaznik JanProvaznik marked this pull request as ready for review January 29, 2026 18:11
@JanProvaznik
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Stage 1 of enabling IBuildEngine callback support for out-of-proc TaskHost execution by introducing request/response packet infrastructure and implementing IBuildEngine2.IsRunningMultipleNodes via a TaskHost→parent query/response roundtrip.

Changes:

  • Added TaskHost callback packet types and correlation interface (ITaskHostCallbackPacket) plus query request/response packets.
  • Implemented IsRunningMultipleNodes forwarding from TaskHost to parent (OutOfProcTaskHostNodeTaskHostTask).
  • Added unit + integration tests validating packet serialization and TaskHost callback behavior.

Reviewed changes

Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/Shared/TaskHostQueryResponse.cs Adds response packet for TaskHost query callbacks.
src/Shared/TaskHostQueryRequest.cs Adds request packet and query enum for simple callback queries.
src/Shared/ITaskHostCallbackPacket.cs Defines request/response correlation contract via RequestId.
src/Shared/INodePacket.cs Reserves and defines new NodePacketType values for TaskHost callbacks.
src/Samples/TaskHostCallback/TestIsRunningMultipleNodesTask.cs Sample task exercising IBuildEngine2.IsRunningMultipleNodes in TaskHost.
src/Samples/TaskHostCallback/TestIsRunningMultipleNodes.proj Sample project wiring task via TaskHostFactory.
src/Samples/TaskHostCallback/TaskHostCallback.csproj Builds the sample TaskHost callback task assembly.
src/MSBuild/Resources/xlf/Strings.zh-Hant.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.zh-Hans.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.tr.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.ru.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.pt-BR.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.pl.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.ko.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.ja.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.it.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.fr.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.es.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.de.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/xlf/Strings.cs.xlf Adds localized entry for new TaskHost connection-loss string.
src/MSBuild/Resources/Strings.resx Adds new TaskHost connection-loss resource string.
src/MSBuild/OutOfProcTaskHostNode.cs Implements callback request plumbing + forwards IsRunningMultipleNodes to parent.
src/MSBuild/MSBuild.csproj Includes new Shared callback packet source files in MSBuild build.
src/Build/Microsoft.Build.csproj Includes new Shared callback packet source files in Microsoft.Build build.
src/Build/Instance/TaskFactories/TaskHostTask.cs Handles TaskHost query requests and replies with forwarded results.
src/Build.UnitTests/BackEnd/TaskHostQueryPacket_Tests.cs Adds roundtrip serialization tests for query request/response packets.
src/Build.UnitTests/BackEnd/TaskHostFactory_Tests.cs Adds integration test verifying IsRunningMultipleNodes works from TaskHost.
src/Build.UnitTests/BackEnd/IsRunningMultipleNodesTask.cs Adds test task used by integration test to call IsRunningMultipleNodes.

Comment on lines +816 to +841
// Wait for either: response arrives, task cancelled, or connection lost
// No timeout - callbacks like BuildProjectFile can legitimately take hours
WaitHandle[] waitHandles = [responseEvent, _taskCancelledEvent];

while (true)
{
int signaledIndex = WaitHandle.WaitAny(waitHandles, millisecondsTimeout: 1000);

if (signaledIndex == 0)
{
// Response received
break;
}
else if (signaledIndex == 1)
{
// Task cancelled
throw new BuildAbortedException();
}

// Timeout - check connection status (no WaitHandle available for this)
if (_nodeEndpoint.LinkStatus != LinkStatus.Active)
{
throw new InvalidOperationException(
ResourceUtilities.FormatResourceStringStripCodeAndKeyword("TaskHostCallbackConnectionLost"));
}
}
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The 1-second WaitAny polling loop will wake up blocked task threads repeatedly even for long-running callbacks (potentially hours), which can add unnecessary CPU wakeups at scale. Consider introducing a wait handle/signaling path for link failure (e.g., set from OnLinkStatusChanged) so the wait can be indefinite without polling.

Suggested change
// Wait for either: response arrives, task cancelled, or connection lost
// No timeout - callbacks like BuildProjectFile can legitimately take hours
WaitHandle[] waitHandles = [responseEvent, _taskCancelledEvent];
while (true)
{
int signaledIndex = WaitHandle.WaitAny(waitHandles, millisecondsTimeout: 1000);
if (signaledIndex == 0)
{
// Response received
break;
}
else if (signaledIndex == 1)
{
// Task cancelled
throw new BuildAbortedException();
}
// Timeout - check connection status (no WaitHandle available for this)
if (_nodeEndpoint.LinkStatus != LinkStatus.Active)
{
throw new InvalidOperationException(
ResourceUtilities.FormatResourceStringStripCodeAndKeyword("TaskHostCallbackConnectionLost"));
}
}
// Wait for either: response arrives or task cancelled.
// Indefinite wait to avoid periodic wakeups for long-running callbacks.
WaitHandle[] waitHandles = [responseEvent, _taskCancelledEvent];
int signaledIndex = WaitHandle.WaitAny(waitHandles);
if (signaledIndex == 0)
{
// Response received
}
else if (signaledIndex == 1)
{
// Task cancelled
throw new BuildAbortedException();
}
else
{
// Defensive: unexpected result from WaitAny
throw new InvalidOperationException(
$"Unexpected wait result in {nameof(OutOfProcTaskHostNode)} callback: index={signaledIndex}");
}

Copilot uses AI. Check for mistakes.
Comment on lines +836 to +840
if (_nodeEndpoint.LinkStatus != LinkStatus.Active)
{
throw new InvalidOperationException(
ResourceUtilities.FormatResourceStringStripCodeAndKeyword("TaskHostCallbackConnectionLost"));
}
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description states that connection loss cancels pending callbacks with TaskCanceledException, but this implementation throws InvalidOperationException on link loss (and there’s no centralized cancellation of all pending requests). Either update the PR description or adjust the implementation to match the documented behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +1729 to +1731
<data name="TaskHostCallbackConnectionLost" xml:space="preserve">
<value>MSB5025: TaskHost lost connection to parent process during callback. The build may have been cancelled.</value>
<comment>{StrBegin="MSB5025: "}</comment>
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new TaskHostCallbackConnectionLost resource uses error code MSB5025, but MSB5025 is already used elsewhere (e.g., src/Shared/Resources/Strings.shared.resx for solution filter JSON errors). Please assign an unused MSB number to avoid duplicate error codes (and update corresponding localized xlf entries accordingly).

Copilot uses AI. Check for mistakes.
Comment on lines +805 to +809
// Use ManualResetEvent to bridge TaskCompletionSource to WaitHandle
using var responseEvent = new ManualResetEvent(false);
var tcs = new TaskCompletionSource<INodePacket>(TaskCreationOptions.RunContinuationsAsynchronously);
tcs.Task.ContinueWith(_ => responseEvent.Set(), TaskContinuationOptions.ExecuteSynchronously);
_pendingCallbackRequests[requestId] = tcs;
Copy link

Copilot AI Jan 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SendCallbackRequestAndWaitForResponse allocates a ManualResetEvent and a Task continuation just to get a WaitHandle. This adds kernel handle + continuation overhead per callback. Consider waiting on the Task’s built-in AsyncWaitHandle (via IAsyncResult) or restructuring to avoid the extra event/continuation.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant