JobMonitor: Replace 'operatingSystem'-property workaround once Helix details API exposes DockerTag/QueueAlias #16964

@mmitche

Background

When the JobMonitor resubmits failed Helix work items, it must recreate the original job's execution environment, specifically the DockerTag and QueueAlias that were used for the original submission.

The Helix job-details API (Job.DetailsAsync) currently returns only the resolved QueueId; the Docker tag and queue alias are stripped from the response. As a workaround, the JobMonitor reconstructs the original (queueId, dockerTag, queueAlias) tuple from the operatingSystem property that the Helix SDK stamps onto the job. That property holds the verbatim (alias)queueId@dockerTag target-queue string, which the JobMonitor re-parses with the same logic the fresh submission path uses (JobDefinition.ParseQueueId).
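
To illustrate the shape of that re-parsing, here is a minimal Python sketch. It is not the C# code in HelixService.cs or JobDefinition.ParseQueueId. It assumes the alias and the Docker tag are both optional, that the queue ID contains no parentheses or `@`, and that everything after the first `@` is the Docker tag. The names `TargetQueueSpec` and `parse_target_queue` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class TargetQueueSpec(NamedTuple):
    """Hypothetical container for the three parts of a target-queue string."""
    queue_id: str
    docker_tag: Optional[str]
    queue_alias: Optional[str]


# Assumed grammar: an optional "(alias)" prefix, a queue ID, and an
# optional "@dockerTag" suffix. The real parser may accept more or less.
_TARGET_QUEUE_RE = re.compile(
    r"^(?:\((?P<alias>[^)]+)\))?(?P<queue>[^@()]+)(?:@(?P<tag>.+))?$"
)


def parse_target_queue(value: str) -> TargetQueueSpec:
    """Split an "(alias)queueId@dockerTag" string into its parts."""
    match = _TARGET_QUEUE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognized target-queue string: {value!r}")
    # Groups that did not participate in the match come back as None.
    return TargetQueueSpec(
        queue_id=match["queue"],
        docker_tag=match["tag"],
        queue_alias=match["alias"],
    )
```

Under these assumptions, a string with neither alias nor Docker tag yields a spec whose `docker_tag` and `queue_alias` are `None`, and a string that lacks a queue ID is rejected rather than guessed at.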

This workaround is isolated in a single method, ResolveTargetQueueSpec, in src/Microsoft.DotNet.Helix/JobMonitor/Services/HelixService.cs.

Why this is a workaround

  • The operatingSystem property is only present because the SDK happens to stamp it (Microsoft.DotNet.Helix.Sdk/MonoQueue.targets). It is a display/diagnostic property, not a contract for the resubmission path.
  • For jobs submitted via the modern 2019-06-17 Helix API, the server does not inject operatingSystem/dockerTag into the properties bag — the alias/docker tag live only in dedicated SQL columns that the details API does not expose.

The proper fix

The Helix details API is being updated to return DockerTag and QueueAlias as first-class fields (dnceng-internal dotnet-helix-service PR 61770). Once that ships and the generated Microsoft.DotNet.Helix.Client is updated to surface JobDetails.DockerTag / JobDetails.QueueAlias:

  1. Rewrite the body of ResolveTargetQueueSpec in HelixService.cs so that it reads details.DockerTag / details.QueueAlias directly.
  2. Remove the operatingSystem-property parsing, the ParseQueueId helper, and the s_queueAliasRegex field if they are no longer used.
  3. Update HelixServiceTests accordingly.
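
The steps above can be sketched roughly as follows, again in Python rather than the actual C#. The `JobDetails` dataclass is a hypothetical stand-in for the regenerated client model, and its field names only assume that the details response will carry the Docker tag and queue alias next to the queue ID. `format_target_queue` is likewise hypothetical and only shows how the tuple could be turned back into an (alias)queueId@dockerTag string for resubmission.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class JobDetails:
    """Hypothetical stand-in for the details response after the API change."""
    queue_id: str
    docker_tag: Optional[str] = None
    queue_alias: Optional[str] = None


def resolve_target_queue_spec(
    details: JobDetails,
) -> Tuple[str, Optional[str], Optional[str]]:
    """Read (queue_id, docker_tag, queue_alias) straight from the response.

    No property-bag lookup or string parsing is involved. Empty strings are
    normalized to None so callers can treat "absent" uniformly.
    """
    return (
        details.queue_id,
        details.docker_tag or None,
        details.queue_alias or None,
    )


def format_target_queue(
    queue_id: str, docker_tag: Optional[str], queue_alias: Optional[str]
) -> str:
    """Rebuild an "(alias)queueId@dockerTag" string from its parts."""
    prefix = f"({queue_alias})" if queue_alias else ""
    suffix = f"@{docker_tag}" if docker_tag else ""
    return f"{prefix}{queue_id}{suffix}"
```

The point of the sketch is that, once the fields are first-class, resolution becomes a plain field read and the regex-based parsing has no remaining caller on the resubmission path.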

Acceptance criteria

  • JobMonitor resubmission derives DockerTag/QueueAlias from the details API response, not the operatingSystem property.
  • The operatingSystem workaround code is removed from HelixService.cs.
