Skip to content

Commit 8846fd5

Browse files
Antriksh JainCopilot
andcommitted
feat(azure.ai.agents): scaffold doctor remote-check pipeline (P5.1 C10)
Wire the Cobra/runner pipe for remote (network-dependent) doctor checks so commits C11-C17 can add individual checks (auth, foundry endpoint reachability, RBAC, agent status) without touching the doctor command's Cobra surface or runner internals. Most of C10's framework was already in place from Phase 4: - `Check.Remote bool` (runner.go:32-37) - `Runner.Run` skips remote checks under `Options.LocalOnly` (runner.go:74-82) - `report.Remote = true` flipped whenever an executed check is Remote (runner.go:128-130) - `--local-only` / `--unredacted` flags bound on the Cobra surface and threaded through `doctor.Options` What was missing was the *factory slot* — a `NewRemoteChecks` mirror of `NewLocalChecks` that the doctor command appends to its check list. Without that slot the only place to add a remote check was the command file itself, breaking the convention established by `NewLocalChecks` (every doctor check lives in the `doctor` package; the command file is plumbing only). Changes ------- * New `internal/cmd/doctor/checks_remote.go`: - `NewRemoteChecks(deps Dependencies) []Check` — empty today; documents the conventions C11+ remote checks must follow (Remote: true, ctx cancellation, skip-cascade against the local chain via `priorBlocked`, redaction discipline under `!Options.Unredacted`). - Names the four follow-up checks the slot is reserved for so a future reader can immediately see the scope without cross-referencing the plan: C11 auth, C12 foundry endpoint, C16 RBAC, C17 agent status. - Explicitly documents that local-checks-then-remote-checks ordering is load-bearing for `priorBlocked` skip-cascade. * `internal/cmd/doctor.go`: - `runDoctor` now builds the runner from `append(NewLocalChecks(deps), NewRemoteChecks(deps)...)` instead of `NewLocalChecks(deps)` alone. Comment on the call site pins the ordering contract. - `doctorFlags` doc comment rewritten to describe today's reality (the wire is fully exercised; remote-checks factory is empty but populated transparently when C11+ land) instead of the pre-C10 wording "no-op today, reserved for an upcoming pass." - `--local-only` and `--unredacted` user-visible help text trimmed of internal plan-tracking jargon (no more "subsequent commits" or "(P5 C11+)"). The first sentence now stands on its own and reads cleanly under `--help`. * `internal/cmd/doctor/types.go`: - `Options.LocalOnly` doc comment updated from "no-op in phase 4 — no remote checks are wired yet" to match the new post-C10 reality (the factory returns empty today; the wire is exercised). * New `internal/cmd/doctor/checks_remote_test.go` (5 tests): - `TestNewRemoteChecks_EmptyTodayButCallable` — pins the contract; when C11 lands the empty-slice assertion fires and forces the author to update the count. - `TestNewLocalAndRemoteChecks_ProductionCompositionLocalsFirst` — pins the load-bearing local-then-remote ordering by reading the actual production factories (not synthesized checks). Asserts (a) every check in NewLocalChecks has Remote=false, (b) every check in NewRemoteChecks has Remote=true, (c) no local check appears after any remote check in the combined slice. Catches a future contributor swapping the append order or forgetting the Remote flag on a remote check. - `TestRunner_LocalThenRemote_RemoteSeesLocalPriorResults` — asserts the runner preserves the slice order so a remote check's `priorBlocked` guard reads the local results. - `TestRunner_LocalOnly_AppendedRemoteCheck_NotInvoked` — exercises the production-shaped `append(local, remote...)` slice under `LocalOnly: true`. - `TestRunner_RemoteCheck_RanProducesReportRemoteFlag` — asserts `report.Remote = true` against the production-shaped slice when a remote check executes. User-visible behavior --------------------- None. The remote-checks factory is empty, so: - `azd ai agent doctor` produces the same 7-local-check report it did before this commit. - `azd ai agent doctor --local-only` produces the same 7-check report (no remote checks to skip). - `azd ai agent doctor --output json` envelope has `"remote": false` (no remote check executed). The change is exclusively in the *plumbing* for the follow-up commits. Preflight --------- * gofmt -s -w . — clean * go vet ./... — clean * go build ./... — clean * go test ./... -count=1 — green (cmd 15.0s, doctor 6.1s, nextstep 3.7s, etc.) * golangci-lint run ./internal/cmd/... — 0 issues * npx cspell lint "internal/cmd/doctor.go" "internal/cmd/doctor/**/*.go" --relative --config ../../.vscode/cspell.yaml --no-progress — 0 issues across 7 files Closes Phase 5 commit slot C10. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent a85dcfd commit 8846fd5

4 files changed

Lines changed: 288 additions & 19 deletions

File tree

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor.go

Lines changed: 23 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -19,20 +19,23 @@ import (
1919

2020
// doctorFlags are the Cobra-bound flags for `azd ai agent doctor`.
2121
//
22-
// localOnly is exposed today as a no-op: every shipped check is local
23-
// (Phase 4 covers checks 1–6). The Cobra surface is locked early so the
24-
// Phase 5 follow-up that adds remote checks does not need to introduce
25-
// the flag in the same commit as the new check implementations.
22+
// localOnly skips remote (network-dependent) checks. The runner gates
23+
// remote checks via the Check.Remote field (see runner.go); doctor
24+
// remains responsive when network is unreachable, behind a proxy, or
25+
// the user just wants a fast local triage. Today the remote-checks
26+
// factory returns an empty slice, so the flag has no observable
27+
// effect — but the wire is fully exercised so the remote checks land
28+
// transparently.
2629
//
2730
// output selects the rendering path: "text" (default, human-readable
2831
// with a trailing Next: block on success) or "json" (structured envelope
2932
// for scripted consumers).
3033
//
31-
// unredacted is reserved for Phase 5 — once remote checks surface
32-
// principal IDs, scope ARNs, and UPNs, this flag will toggle the
33-
// redaction layer. It is bound today and threaded into doctor.Options
34-
// so that callers (and tests) can already exercise the wire without
35-
// the future Phase 5 fix-up touching the Cobra surface.
34+
// unredacted toggles the redaction of principal IDs, scope ARNs, and
35+
// UPNs in the report. The flag is surfaced today and threaded into
36+
// doctor.Options so remote checks can read `opts.Unredacted` from
37+
// their CheckFunc signature; the redaction layer itself lands with
38+
// the first check that produces sensitive identifiers.
3639
type doctorFlags struct {
3740
localOnly bool
3841
output string
@@ -123,8 +126,8 @@ Exit codes:
123126

124127
cmd.Flags().BoolVar(
125128
&flags.localOnly, "local-only", false,
126-
"Run only local checks (no network calls). "+
127-
"All checks are local today; this flag is reserved for an upcoming remote-checks pass.",
129+
"Skip remote (network-dependent) checks. "+
130+
"Useful when offline, behind a proxy, or for a fast local triage.",
128131
)
129132
cmd.Flags().StringVarP(
130133
&flags.output, "output", "o", "text",
@@ -133,7 +136,7 @@ Exit codes:
133136
cmd.Flags().BoolVar(
134137
&flags.unredacted, "unredacted", false,
135138
"Show raw principal IDs, scope ARNs, and UPNs in the report. "+
136-
"Reserved for the upcoming remote-checks pass (no-op today).",
139+
"Has no effect today; takes effect when remote checks are added.",
137140
)
138141

139142
return cmd
@@ -171,7 +174,14 @@ func runDoctor(
171174
opts doctor.Options,
172175
azdClient *azdext.AzdClient,
173176
) (doctor.Report, []nextstep.Suggestion) {
174-
runner := doctor.Runner{Checks: doctor.NewLocalChecks(deps)}
177+
// Local checks run first so their Results are available to
178+
// remote checks' skip-cascade guards (each remote check inspects
179+
// `prior []Result` via `priorBlocked` to decide whether to skip
180+
// when an upstream local precondition failed). The slice order
181+
// here is the source of truth for that contract — do not
182+
// reorder.
183+
checks := append(doctor.NewLocalChecks(deps), doctor.NewRemoteChecks(deps)...)
184+
runner := doctor.Runner{Checks: checks}
175185
report := runner.Run(ctx, opts)
176186

177187
// Trailing Next: block is only meaningful when checks all pass
Lines changed: 64 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,64 @@
1+
// Copyright (c) Microsoft Corporation. All rights reserved.
2+
// Licensed under the MIT License.
3+
4+
package doctor
5+
6+
// NewRemoteChecks returns the canonical sequence of remote (network-
7+
// dependent) doctor checks in execution order. Today the slice is
8+
// empty — the framework is wired through `--local-only`, the runner's
9+
// `Remote: true` gating (runner.go:74-82), and `report.Remote` (set
10+
// when any executed check is Remote) so that downstream commits (P5
11+
// C11 / C12 / C16 / C17) can append individual checks without
12+
// touching the doctor command's Cobra wiring.
13+
//
14+
// # Conventions for remote checks added in C11+
15+
//
16+
// - Set Remote: true on the Check value. The runner uses this both
17+
// to skip the check under --local-only and to flip
18+
// report.Remote = true when the check runs (used by the JSON
19+
// envelope and the formatter's "remote checks were exercised"
20+
// decisions).
21+
// - Forward the `Dependencies` struct to each check's closure. C11+
22+
// checks that require auth credentials or a REST client should
23+
// add those fields to `Dependencies` (defined in checks_local.go)
24+
// and document them there. Tests inject fakes via the same fields
25+
// that production wiring populates from the Cobra surface.
26+
// - Skip-cascade against the local chain. Most remote checks
27+
// require at least:
28+
// - `local.grpc-extension` to have produced an AzdClient
29+
// - `local.azure-yaml` for the project root
30+
// - `local.environment-selected` for the active azd env name
31+
// - `local.project-endpoint-set` for AZURE_AI_PROJECT_ENDPOINT
32+
// Guard with one or more `priorBlocked(prior, "<id>")` calls and
33+
// return Result{Status: StatusSkip, Message: "..."}. Doing the
34+
// work inside the check (rather than in the runner) keeps the
35+
// skip-message specific to the inherited failure so users see a
36+
// pointed suggestion instead of a generic "upstream check failed".
37+
// - Honor ctx cancellation. Remote checks own a network round trip;
38+
// the runner only checks ctx.Err between checks, so a long-blocked
39+
// HTTP call would otherwise stall a Ctrl-C.
40+
// - When Unredacted is false (the default), elide raw principal IDs
41+
// / scope ARNs / UPNs from the Message. The full payload still
42+
// goes into Details for callers that opt in via --unredacted.
43+
//
44+
// # Ordering relative to local checks
45+
//
46+
// In `doctor.go:runDoctor`, remote checks are appended AFTER all
47+
// local checks. This is deliberate: every remote check's skip-cascade
48+
// reads `prior []Result`, and the local results must be available in
49+
// that slice when the remote check runs. The runner's loop preserves
50+
// the order of `Runner.Checks`, so appending remote-after-local is
51+
// sufficient.
52+
func NewRemoteChecks(deps Dependencies) []Check {
53+
// Phase 5 commits C11-C17 will append entries here:
54+
// - C11: auth probe (`remote.auth`)
55+
// - C12: foundry project endpoint reachability (`remote.foundry-endpoint`)
56+
// - C16: RBAC permissions (`remote.rbac`)
57+
// - C17: agent status on backend (`remote.agent-status`)
58+
// Until those land the slice is empty; the framework is fully
59+
// exercised by tests using injected fake remote checks. `deps` is
60+
// named (rather than `_`) so the production call site reads
61+
// naturally and future contributors see the param contract; Go
62+
// does not flag unused function parameters.
63+
return []Check{}
64+
}
Lines changed: 194 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,194 @@
1+
// Copyright (c) Microsoft Corporation. All rights reserved.
2+
// Licensed under the MIT License.
3+
4+
package doctor
5+
6+
import (
7+
"context"
8+
"testing"
9+
10+
"github.com/stretchr/testify/require"
11+
)
12+
13+
// ---- NewRemoteChecks contract ----
14+
15+
func TestNewRemoteChecks_EmptyTodayButCallable(t *testing.T) {
16+
// Today the function returns an empty slice — remote checks land
17+
// in P5 commits C11-C17. This test pins the contract so a future
18+
// reviewer can immediately see that an empty result is intentional
19+
// (not an accidental wipe) and so that the production wiring in
20+
// doctor.go can build the runner unconditionally without a nil
21+
// check. A panic in NewRemoteChecks would also fail this test (the
22+
// direct call below has no recover); no separate panic-guard test
23+
// is needed.
24+
t.Parallel()
25+
26+
got := NewRemoteChecks(Dependencies{})
27+
28+
require.NotNil(t, got, "NewRemoteChecks must return a non-nil slice "+
29+
"(empty is allowed) so doctor.go can append unconditionally")
30+
require.Empty(t, got, "NewRemoteChecks must return zero checks "+
31+
"until the first remote check lands in P5 C11+")
32+
}
33+
34+
// TestNewLocalAndRemoteChecks_ProductionCompositionLocalsFirst pins the
35+
// load-bearing local-then-remote ordering that doctor.go:runDoctor
36+
// composes via `append(NewLocalChecks, NewRemoteChecks...)`. Without
37+
// this test, a future contributor could accidentally swap the
38+
// composition order (or land a remote check inside NewLocalChecks /
39+
// vice versa) and every other existing test would still pass, while
40+
// remote checks' `priorBlocked(prior, "local.X")` skip-cascade guards
41+
// would silently always return false.
42+
//
43+
// We assert two invariants on the production composition:
44+
//
45+
// 1. No local check (Remote == false) appears AFTER any remote check.
46+
// Locals must run first so their results are in `prior` when remote
47+
// checks evaluate `priorBlocked`.
48+
// 2. Every check returned by NewRemoteChecks carries Remote == true
49+
// (the same convention bullet documented in checks_remote.go).
50+
// Forgetting the flag would cause the runner to (a) not skip the
51+
// check under --local-only and (b) not flip report.Remote.
52+
func TestNewLocalAndRemoteChecks_ProductionCompositionLocalsFirst(t *testing.T) {
53+
t.Parallel()
54+
55+
locals := NewLocalChecks(Dependencies{})
56+
remotes := NewRemoteChecks(Dependencies{})
57+
58+
for i, c := range locals {
59+
require.Falsef(t, c.Remote,
60+
"NewLocalChecks[%d] %q has Remote=true; locals must declare Remote=false",
61+
i, c.ID)
62+
}
63+
for i, c := range remotes {
64+
require.Truef(t, c.Remote,
65+
"NewRemoteChecks[%d] %q has Remote=false; remotes must declare Remote=true",
66+
i, c.ID)
67+
}
68+
69+
// Invariant 1: combined ordering must place every local before
70+
// every remote. Equivalent to the contract `runDoctor` relies on.
71+
combined := append(locals, remotes...)
72+
sawRemote := false
73+
for _, c := range combined {
74+
if c.Remote {
75+
sawRemote = true
76+
continue
77+
}
78+
require.Falsef(t, sawRemote,
79+
"local check %q appears after a remote check in the "+
80+
"combined doctor pipeline; runDoctor's skip-cascade "+
81+
"contract requires local-then-remote ordering",
82+
c.ID)
83+
}
84+
}
85+
86+
// ---- Framework integration: local + remote interaction ----
87+
88+
// TestRunner_LocalThenRemote_RemoteSeesLocalPriorResults proves the
89+
// runner preserves the order `NewLocalChecks ++ NewRemoteChecks` so a
90+
// remote check's skip-cascade can read the local check results. This
91+
// is the load-bearing contract C11+ remote checks depend on (each one
92+
// calls `priorBlocked(prior, "local.X")` to decide whether to skip).
93+
//
94+
// We don't use the real NewLocalChecks here because that would couple
95+
// this test to the live gRPC stack. Instead we synthesize a local +
96+
// remote pair using the same Check shape and assert the ordering.
97+
func TestRunner_LocalThenRemote_RemoteSeesLocalPriorResults(t *testing.T) {
98+
t.Parallel()
99+
100+
var observed []Result
101+
runner := &Runner{
102+
Checks: append(
103+
[]Check{
104+
{ID: "local.x", Name: "local x", Fn: func(_ context.Context, _ Options, _ []Result) Result {
105+
return Result{Status: StatusFail, Message: "local x failed"}
106+
}},
107+
},
108+
Check{
109+
ID: "remote.y",
110+
Name: "remote y",
111+
Remote: true,
112+
Fn: func(_ context.Context, _ Options, prior []Result) Result {
113+
observed = append([]Result(nil), prior...)
114+
// Mirror the convention C11+ checks will follow:
115+
// inspect prior, skip when a local precondition
116+
// failed.
117+
if priorBlocked(prior, "local.x") {
118+
return Result{Status: StatusSkip, Message: "skipped: upstream local.x"}
119+
}
120+
return Result{Status: StatusPass, Message: "remote y ran"}
121+
},
122+
},
123+
),
124+
}
125+
126+
report := runner.Run(t.Context(), Options{})
127+
128+
require.Len(t, observed, 1, "remote check must see exactly the one local prior result")
129+
require.Equal(t, "local.x", observed[0].ID)
130+
require.Equal(t, StatusFail, observed[0].Status)
131+
require.Equal(t, StatusSkip, report.Checks[1].Status, "remote check should have skipped via priorBlocked")
132+
require.Contains(t, report.Checks[1].Message, "upstream local.x")
133+
}
134+
135+
// TestRunner_LocalOnly_RemoteCheckNotInvoked complements the runner's
136+
// existing TestRunner_Run_LocalOnly_SkipsRemoteChecks by exercising the
137+
// combination used by the doctor command in production:
138+
// `append(NewLocalChecks, NewRemoteChecks...)`. We synthesize a remote
139+
// check that would Fail if invoked, then assert it produces a Skip
140+
// without running.
141+
func TestRunner_LocalOnly_AppendedRemoteCheck_NotInvoked(t *testing.T) {
142+
t.Parallel()
143+
144+
invoked := false
145+
checks := append(
146+
[]Check{
147+
{ID: "local.x", Name: "local x", Fn: func(_ context.Context, _ Options, _ []Result) Result {
148+
return Result{Status: StatusPass, Message: "ok"}
149+
}},
150+
},
151+
Check{
152+
ID: "remote.y", Name: "remote y", Remote: true,
153+
Fn: func(_ context.Context, _ Options, _ []Result) Result {
154+
invoked = true
155+
return Result{Status: StatusFail, Message: "remote check ran when it should not have"}
156+
},
157+
},
158+
)
159+
160+
runner := &Runner{Checks: checks}
161+
report := runner.Run(t.Context(), Options{LocalOnly: true})
162+
163+
require.False(t, invoked, "remote check function must not be invoked under --local-only")
164+
require.Len(t, report.Checks, 2)
165+
require.Equal(t, StatusPass, report.Checks[0].Status)
166+
require.Equal(t, StatusSkip, report.Checks[1].Status)
167+
require.Contains(t, report.Checks[1].Message, "local-only")
168+
require.False(t, report.Remote, "report.Remote must remain false when only local checks executed")
169+
}
170+
171+
// TestRunner_RemoteCheck_RanProducesReportRemoteFlag mirrors the
172+
// existing TestRunner_Run_RemoteCheck_FlipsReportRemoteFlag but
173+
// scoped to the combined local+remote shape used in production.
174+
func TestRunner_RemoteCheck_RanProducesReportRemoteFlag(t *testing.T) {
175+
t.Parallel()
176+
177+
checks := append(
178+
[]Check{
179+
{ID: "local.x", Name: "local x", Fn: func(_ context.Context, _ Options, _ []Result) Result {
180+
return Result{Status: StatusPass}
181+
}},
182+
},
183+
Check{
184+
ID: "remote.y", Name: "remote y", Remote: true,
185+
Fn: func(_ context.Context, _ Options, _ []Result) Result {
186+
return Result{Status: StatusPass}
187+
},
188+
},
189+
)
190+
191+
report := (&Runner{Checks: checks}).Run(t.Context(), Options{})
192+
193+
require.True(t, report.Remote, "any executed remote check must flip report.Remote")
194+
}

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor/types.go

Lines changed: 7 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -103,12 +103,13 @@ type Report struct {
103103
}
104104

105105
// Options are the runtime flags that influence the runner. LocalOnly
106-
// excludes any check whose Remote field is true (no-op in phase 4 — no
107-
// remote checks are wired yet; the field is exposed early so the Cobra
108-
// surface can be locked without churn when phase 5 lands). Unredacted
109-
// inverts Redacted on the produced Report; it is also surfaced to checks
110-
// that decide whether to include identifiers in their Message / Details
111-
// strings.
106+
// excludes any check whose Remote field is true. Today the remote-
107+
// checks factory (doctor.NewRemoteChecks) returns an empty slice, so
108+
// the flag has no observable effect; the wire is fully exercised in
109+
// the runner and tests so C11+ remote checks land transparently.
110+
// Unredacted inverts Redacted on the produced Report; it is also
111+
// surfaced to checks that decide whether to include identifiers in
112+
// their Message / Details strings.
112113
type Options struct {
113114
LocalOnly bool
114115
Unredacted bool

0 commit comments

Comments
 (0)