Skip to content

Commit a85dcfd

Browse files
Antriksh JainCopilot
andcommitted
feat(azure.ai.agents): add doctor check local.manual-env-vars (P5.1 C9)
Adds the 7th local doctor check, surfacing the same MissingManualVars signal that the post-init `Next:` block renders — but at any point in the flow (post-deploy, mid-debug, before reporting a bug). Closes the "manual config values missing" branch from issue Azure#7975 lines 117-127 on the doctor side. ## Why The doctor's job is to short-circuit user frustration: "I get the same guidance from `azd ai agent doctor` regardless of where I am in the flow." Today the manual-vars signal only surfaces at the tail of `azd ai agent init`, so a user who hits the issue mid-cycle (e.g. they cloned a friend's project, ran `azd env select`, and went straight to `azd ai agent run`) sees no clear pointer to the missing config. ## What - `internal/cmd/doctor/checks_manual_env.go` — new check produces ID `local.manual-env-vars`, name "manual env vars set". On Pass, message is "no manual env vars are missing"; on Fail, lists every missing var in the message and stages a paste-ready `azd env set <FIRST> <value>` suggestion (sorted alphabetically — the renderer paints Suggestion as a single line, multi-line breaks indentation). When 2+ vars are missing the suggestion adds a "Repeat for each of the other variables listed above." clause; when exactly 1 is missing the bare command is the suggestion (no misleading "and others" wording). - `internal/cmd/doctor/checks_local.go` — adds the check as the 7th entry in `NewLocalChecks` (after `local.agent-yaml-valid`); introduces a lowercase `assembleState` field on the exported `Dependencies` struct as a test seam (production code leaves it nil; tests inject directly). - Skip cascade: skips when AzdClient is nil OR when `local.agent-yaml-valid` failed/skipped OR when `local.environment-selected` failed/skipped. The first guard transitively covers azure-yaml → agent-service-detected → agent-yaml-valid; the second is required because env-selection is a sibling chain (not upstream of agent-yaml-valid) and `nextstep.AssembleState` silently early-exits its detectMissingVars block when no env is selected (state.go: `if project != nil && envName != ""`). Without the env guard the check would falsely Pass in a state where no env values were ever examined. ## Architecture: why reuse `nextstep.AssembleState` The "manual vs infra" classification logic lives in nextstep — the same place that drives every other `Next:` recommendation. Adding a separate classifier inside the doctor would split the source of truth, and future improvements (e.g. C1's Bicep-output discovery replacing the `AZURE_` prefix shortcut) would have to be ported twice. Forwarding to `AssembleState` keeps the doctor as a presentation layer. ## Tests 12 new sub-tests in `checks_manual_env_test.go`: - No client → Skip - Prior `local.agent-yaml-valid` failed → Skip - Prior `local.agent-yaml-valid` skipped → Skip (cascade propagation) - Prior `local.environment-selected` failed → Skip (regression for Opus xhigh review HIGH finding; asserts assembler is NOT called via t.Fatalf in the fake) - Prior `local.environment-selected` skipped → Skip (symmetric) - 0 missing → Pass with "no manual env vars are missing" - 1 missing → Fail with bare `azd env set <NAME> <value>` suggestion (asserts no "Repeat" / "likewise" wording — regression for Sonnet 4.6 review MEDIUM finding) - 4 missing → Fail with sorted list; suggestion has "Repeat for each of the other variables" clause - nil State from assembler → Fail with assembly error message - nil State + nil errs → Fail with fallback "unknown error" message - Non-fatal errs but State populated → Pass (state.MissingManualVars is the authoritative signal; ancillary errors like missing AI_AGENT_PENDING_PROVISION key don't dirty the result) - Existing `TestNewLocalChecks_OrderAndIDs` extended from 6 to 7 checks; new `TestNewLocalChecks_IncludesManualEnvVarsLast` pins the agent-yaml-valid → manual-env-vars ordering invariant so a future reorder can't silently break the skip-cascade. ## Out of scope - Bicep-output classifier (B1 fix / C1) — `assembleState` today routes non-`AZURE_*` Bicep outputs (e.g. `TOOLBOX_*`) into MissingManualVars rather than MissingInfraVars. This check will surface that bug verbatim until C1 lands; that's the correct phasing because (a) every Phase 5 consumer benefits from the same fix at once, and (b) showing the current behavior in the doctor makes the bug debuggable in the meantime. Refs: Azure#7975 lines 117-127, 308-312 (issue spec) Refs: Azure#8057 (PR being implemented) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 5bb883e commit a85dcfd

4 files changed

Lines changed: 482 additions & 3 deletions

File tree

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor/checks_local.go

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,8 @@ import (
99
"strconv"
1010
"strings"
1111

12+
"azureaiagent/internal/cmd/nextstep"
13+
1214
"github.com/azure/azure-dev/cli/azd/pkg/azdext"
1315
"google.golang.org/grpc/codes"
1416
"google.golang.org/grpc/status"
@@ -39,12 +41,23 @@ type Dependencies struct {
3941
AzdClient *azdext.AzdClient
4042
AzdClientErr error
4143
ExtensionVersion string
44+
45+
// assembleState is a test seam: when non-nil it replaces the
46+
// production `nextstep.AssembleState` call inside the
47+
// `local.manual-env-vars` check, letting unit tests inject a
48+
// pre-computed State without standing up a temp project on disk.
49+
// Lowercase so external packages cannot reach it. Production code
50+
// (NewLocalChecks via the Cobra wiring) leaves it nil.
51+
assembleState func(ctx context.Context, client *azdext.AzdClient) (*nextstep.State, []error)
4252
}
4353

4454
// NewLocalChecks returns the canonical sequence of local doctor checks
45-
// in execution order. Phase 4.2 covered checks 1-3; Phase 4.3 adds
55+
// in execution order. Phase 4.2 covered checks 1-3; Phase 4.3 added
4656
// checks 4-6 (agent service detected, project endpoint set, agent.yaml
47-
// valid).
57+
// valid). Phase 5 C9 appends check 7 (manual env vars set) — local
58+
// check #9 in the design's numbered table (renumbered here because
59+
// remote checks 7-8 are gated behind --local-only until the runner
60+
// refactor lands in C10).
4861
func NewLocalChecks(deps Dependencies) []Check {
4962
return []Check{
5063
newCheckGRPCAndVersion(deps),
@@ -53,6 +66,7 @@ func NewLocalChecks(deps Dependencies) []Check {
5366
newCheckAgentServiceDetected(deps),
5467
newCheckProjectEndpointSet(deps),
5568
newCheckAgentYAMLValid(deps),
69+
newCheckManualEnvVars(deps),
5670
}
5771
}
5872

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor/checks_local_test.go

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -443,7 +443,7 @@ func TestNewLocalChecks_OrderAndIDs(t *testing.T) {
443443
t.Parallel()
444444

445445
checks := NewLocalChecks(Dependencies{})
446-
require.Len(t, checks, 6)
446+
require.Len(t, checks, 7)
447447

448448
want := []struct {
449449
id string
@@ -456,6 +456,7 @@ func TestNewLocalChecks_OrderAndIDs(t *testing.T) {
456456
{"local.agent-service-detected", "agent service in azure.yaml", false},
457457
{"local.project-endpoint-set", "AZURE_AI_PROJECT_ENDPOINT set", false},
458458
{"local.agent-yaml-valid", "agent.yaml valid (per service)", false},
459+
{"local.manual-env-vars", "manual env vars set", false},
459460
}
460461
for i, w := range want {
461462
require.Equal(t, w.id, checks[i].ID, "index %d", i)
Lines changed: 144 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,144 @@
1+
// Copyright (c) Microsoft Corporation. All rights reserved.
2+
// Licensed under the MIT License.
3+
4+
package doctor
5+
6+
import (
7+
"context"
8+
"fmt"
9+
"slices"
10+
"strings"
11+
12+
"azureaiagent/internal/cmd/nextstep"
13+
14+
"github.com/azure/azure-dev/cli/azd/pkg/azdext"
15+
)
16+
17+
// newCheckManualEnvVars produces Check `local.manual-env-vars` — the
18+
// "manual config values not set" diagnostic.
19+
//
20+
// "Manual" env vars are values referenced by `${...}` syntax inside an
21+
// agent.yaml whose names are NOT declared as outputs of the project's
22+
// infrastructure (Bicep / Terraform). They are operator-supplied:
23+
// third-party API keys, model deployment names, hand-rolled connection
24+
// strings. They have to be set in the active azd environment before
25+
// `azd ai agent run` (local) or `azd deploy` (Azure) can resolve the
26+
// agent.yaml — otherwise the running agent sees the literal `${KEY}`
27+
// string and almost certainly fails on first use.
28+
//
29+
// The classification of "manual" vs "infra" lives in nextstep's
30+
// AssembleState (the same pipeline that drives the `Next:` renderer's
31+
// per-state guidance). This check forwards the result so the doctor
32+
// surfaces the same signal users see at the end of `azd ai agent init`
33+
// — no second source of truth, no drift.
34+
//
35+
// Source-of-truth: issue Azure/azure-dev#7975 "Example output (project
36+
// ready, but manual config values missing)" lines 117-127. The doctor
37+
// reports the gap; the post-init `Next:` block (resolver.go, manual-vars
38+
// branch) tells the user what to type.
39+
//
40+
// Skip cascade — this check skips when any of the following hold:
41+
//
42+
// - deps.AzdClient is nil (gRPC channel unavailable). Check
43+
// `local.grpc-extension` will already have failed with the actionable
44+
// error.
45+
// - `local.agent-yaml-valid` failed or was skipped. A broken agent.yaml
46+
// produces an empty MissingManualVars (the classifier can't extract
47+
// references it can't parse), which would mislead the user into
48+
// thinking nothing was missing. This guard transitively covers the
49+
// azure-yaml → agent-service-detected → agent-yaml-valid arm of the
50+
// local-check chain (each step's own skip-cascade propagates here).
51+
// - `local.environment-selected` failed or was skipped.
52+
// `nextstep.AssembleState` early-exits its `detectMissingVars` block
53+
// when no env is selected (state.go: `if project != nil && envName != ""`).
54+
// Without this guard the check would silently produce a Pass
55+
// ("no manual env vars are missing") in a state where it never even
56+
// looked at any env values — the exact false-Pass the doctor exists
57+
// to prevent. `environment-selected` is a sibling chain off
58+
// `azure-yaml` (not upstream of `agent-yaml-valid`), so the previous
59+
// guard does not cover it transitively.
60+
//
61+
// On Fail the check lists every missing var in the Message (callers can
62+
// also iterate `Details["missingManualVars"]` for the structured payload).
63+
// The Suggestion picks the first missing var as a paste-ready example
64+
// rather than concatenating one `azd env set` line per var: the formatter
65+
// renders Suggestion as a single line, and a paragraph of newlines would
66+
// break the indentation. Users see the full list in the Message and one
67+
// concrete command to copy-paste.
68+
func newCheckManualEnvVars(deps Dependencies) Check {
69+
return Check{
70+
ID: "local.manual-env-vars",
71+
Name: "manual env vars set",
72+
Fn: func(ctx context.Context, _ Options, prior []Result) Result {
73+
if deps.AzdClient == nil {
74+
return Result{Status: StatusSkip, Message: "skipped: azd extension not reachable"}
75+
}
76+
if priorBlocked(prior, "local.agent-yaml-valid") {
77+
return Result{Status: StatusSkip, Message: "skipped: agent.yaml check failed or skipped"}
78+
}
79+
if priorBlocked(prior, "local.environment-selected") {
80+
// Without an azd env, AssembleState's detectMissingVars
81+
// block is skipped (state.go:258), so MissingManualVars
82+
// would be empty and the check would falsely Pass.
83+
return Result{
84+
Status: StatusSkip,
85+
Message: "skipped: no azd environment selected (cannot resolve agent.yaml variables)",
86+
}
87+
}
88+
89+
assembler := deps.assembleState
90+
if assembler == nil {
91+
assembler = func(c context.Context, client *azdext.AzdClient) (*nextstep.State, []error) {
92+
return nextstep.AssembleState(c, client)
93+
}
94+
}
95+
state, errs := assembler(ctx, deps.AzdClient)
96+
if state == nil {
97+
// AssembleState always returns a non-nil State even when errs
98+
// is non-empty — but defend against a future contract change
99+
// so this check can't be the one to panic-dereference.
100+
cause := "unknown error"
101+
if len(errs) > 0 {
102+
cause = errs[0].Error()
103+
}
104+
return Result{
105+
Status: StatusFail,
106+
Message: fmt.Sprintf("failed to assemble agent state: %s", cause),
107+
Suggestion: "Re-run `azd ai agent doctor`; the state assembly returned nil unexpectedly.",
108+
}
109+
}
110+
111+
missing := slices.Clone(state.MissingManualVars)
112+
slices.Sort(missing)
113+
114+
if len(missing) == 0 {
115+
return Result{
116+
Status: StatusPass,
117+
Message: "no manual env vars are missing",
118+
}
119+
}
120+
121+
// Single-line Suggestion: pin a paste-ready command for the
122+
// first (sorted) missing var, plus a clause pointing at the
123+
// rest only when there ARE additional entries. When exactly
124+
// one var is missing the bare command is the right
125+
// instruction — adding "and likewise for the others" implies
126+
// the user missed something they didn't.
127+
suggestion := fmt.Sprintf("Run `azd env set %s <value>`.", missing[0])
128+
if len(missing) > 1 {
129+
suggestion += " Repeat for each of the other variables listed above."
130+
}
131+
132+
return Result{
133+
Status: StatusFail,
134+
Message: fmt.Sprintf(
135+
"%d manual env var(s) referenced by agent.yaml are not set in the azd environment: %s",
136+
len(missing), strings.Join(missing, ", ")),
137+
Suggestion: suggestion,
138+
Details: map[string]any{
139+
"missingManualVars": missing,
140+
},
141+
}
142+
},
143+
}
144+
}

0 commit comments

Comments
 (0)