Skip to content

Commit e73a001

Browse files
Antriksh JainCopilot
andcommitted
fix(azure.ai.agents): doctor next-step accuracy + comment correctness (review fix-ups)
Three fix-ups from the 3-reviewer consensus pass on commit 48de7e8 (range 4617dd9..48de7e8). All three reached 3/3 CONFIRM during cross-pollination. G1 (medium): `azd ai agent doctor` trailing block was ambiguous in mixed deployed/undeployed projects. resolveDoctorTrailing filtered state.Services to deployed-only BEFORE calling ResolveAfterDeploy. The resolver then computed `singleAgent := len(state.Services) == 1` against the filtered slice. In a project with 2 agent services where only 1 is deployed, the filtered slice has length 1 → singleAgent=true → resolver emits no-arg `azd ai agent show` and `azd ai agent invoke '<payload>'`. But at runtime, both `show` and `invoke` call resolveAgentService against the full azure.yaml (helpers.go:577-596), which still sees both services and either prompts interactively (TTY) or errors with "multiple azure.ai.agent services found" (--no-prompt). The doctor's copy-pasteable suggestion is therefore ambiguous — a foot-gun for the user. Fix: add a variadic AfterDeployOpts{ForceQualified bool} parameter to ResolveAfterDeploy. Doctor passes ForceQualified=true whenever the pre-filter azure.yaml has multiple agent services (totalServices > 1), regardless of how many are deployed. The resolver then emits service-qualified `show <name>` / `invoke <name> ...` commands unconditionally in that scenario. The existing post-deploy callsite (service_target_agent.go) and all 8 existing resolver tests continue to work with the unchanged 3-arg form because the new parameter is variadic. S1 (medium): Misleading comment at doctor.go:106-107 falsely claimed that `os.Exit(code)` runs the deferred logCleanup + azdClient.Close ("Defers above run via the explicit Close + flushed stdout writer"). Per Go semantics, os.Exit terminates immediately and deferred functions do not run. The practical impact today is zero (the OS reclaims the gRPC socket and log fd; neither defer has on-disk state to flush), but a future contributor reading this comment would be silently misled when adding cleanup-critical defers (e.g., flushing telemetry, releasing a lock, closing a temp file). Fix: rewrite the comment to acknowledge defers don't run, document why it's safe today, and warn against adding cleanup-critical defers without an explicit pre-os.Exit call. G2 (low): doctorCachedPayload looked up the cached OpenAPI spec keyed by serviceName (azure.yaml service name). But remote invoke rewrites `name` to info.AgentName (the deployed Foundry name from AGENT_<KEY>_NAME) BEFORE caching (invoke.go:694-758). When deploy appends a suffix (the divergence documented at show.go:40-46), the two strings differ and the cache lookup misses, causing the doctor's trailing block to fall back to the protocol-generic literal payload. Fix: doctorCachedPayload now first tries the deployed agent name resolved from AGENT_<SERVICE>_NAME, then falls back to the service name when the env var is absent or matches the service name (no divergence). Mirrors the pattern already used by show.go and avoids a silent cache miss in the suffix-appending case. Rejected: G3 (GPT-5.5: extend README casing check to include {readme.md, README.MD}). Sonnet and Opus both rejected during cross- pollination with the same rationale: the resolver hardcodes "README.md" in the rendered hint (`see <relPath>/README.md`), so emitting the hint when only a non-canonical casing exists on disk would create a broken pointer on case-sensitive filesystems. Suppressing the hint is the correct defensive behavior. Tests: - nextstep/resolver_test.go: 6 new subtests covering ForceQualified behavior (len==1+force, len==1+no-force, multi-agent force=no-op, cached payload composition, variadic-opts behavior). - Existing 8 ResolveAfterDeploy subtests unchanged — backward compat verified. Pre-flight: - gofmt -s: clean - go vet ./...: clean - go build ./...: clean - go test ./internal/cmd/...: pass (cmd 16.4s, doctor 6.7s, nextstep 6.2s) - golangci-lint run ./...: 0 issues - cspell: 0 issues Smoke (single-service `hello-world-python-invocations`): - `azd ai agent doctor`: 6 PASS, exit 0 - `azd ai agent doctor --output json`: valid envelope, schemaVersion 1.0 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 3760172 commit e73a001

3 files changed

Lines changed: 151 additions & 4 deletions

File tree

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor.go

Lines changed: 65 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -103,8 +103,15 @@ Exit codes:
103103
// and any non-nil return to exit 1, which collapses our
104104
// three-state contract into a two-state one. We call
105105
// os.Exit directly to preserve the 0/1/2 distinction.
106-
// Defers above run via the explicit Close + flushed
107-
// stdout writer; nothing else needs cleanup before exit.
106+
//
107+
// os.Exit does NOT run deferred functions. The deferred
108+
// logCleanup and azdClient.Close above will not execute on
109+
// the non-zero path. This is acceptable today because the
110+
// process exits immediately and the OS reclaims the gRPC
111+
// socket and (in --debug mode) the log fd; neither defer
112+
// has on-disk state to flush. Do NOT add cleanup-critical
113+
// defers to this RunE — call them explicitly before
114+
// os.Exit instead.
108115
code := doctor.ExitCode(report)
109116
if code == 0 {
110117
return nil
@@ -211,11 +218,25 @@ func resolveDoctorTrailing(ctx context.Context, azdClient *azdext.AzdClient) []n
211218
}
212219

213220
if anyServiceDeployed(state.Services) {
221+
// Capture the total agent-service count BEFORE filtering. The
222+
// resolver's `len(state.Services) == 1` heuristic ordinarily
223+
// keys "should I emit no-arg show/invoke commands?" off the
224+
// total count of agent services in azure.yaml. Once we filter
225+
// to deployed-only, that heuristic breaks: a 2-service project
226+
// with 1 deployed would emit `azd ai agent show` (no name),
227+
// but runtime `resolveAgentService` still sees both services
228+
// in azure.yaml and would either prompt or error. Forcing
229+
// qualified suggestions whenever azure.yaml has multiple
230+
// services preserves copy-paste correctness in the partial-
231+
// deploy case and is a no-op when all services are deployed
232+
// (the resolver naturally qualifies len > 1 anyway).
233+
totalServices := len(state.Services)
214234
filtered := filterDeployedServices(state)
215235
return nextstep.ResolveAfterDeploy(
216236
filtered,
217237
doctorCachedPayload(ctx, azdClient),
218238
doctorReadmeExists(ctx, azdClient),
239+
nextstep.AfterDeployOpts{ForceQualified: totalServices > 1},
219240
)
220241
}
221242

@@ -260,7 +281,28 @@ func filterDeployedServices(state *nextstep.State) *nextstep.State {
260281
// for the deployed agent (`azd ai agent invoke <agent>`); the local
261282
// cache (suffix "local") is from `azd ai agent invoke --local` and is
262283
// not appropriate here.
284+
//
285+
// Key resolution: the on-disk cache is keyed by the deployed Foundry
286+
// agent name (see invoke.go:694-758 — invoke rewrites `name` to
287+
// `info.AgentName` BEFORE caching). That can differ from the azure.yaml
288+
// service name when deploy appends a suffix (documented in
289+
// show.go:40-46). The closure first tries the deployed name via the
290+
// `AGENT_<SERVICE>_NAME` env var, then falls back to the service name
291+
// when the env value is absent (e.g., never-deployed service, or older
292+
// deploys that did not populate the var). The fallback also covers the
293+
// non-divergent case where the two names are identical.
263294
func doctorCachedPayload(ctx context.Context, azdClient *azdext.AzdClient) func(string) string {
295+
// Resolve the active env name once for the closure's lifetime.
296+
// A nil/error response leaves envName empty, which short-circuits
297+
// the deployed-name lookup path inside the closure.
298+
var envName string
299+
if azdClient != nil {
300+
if envResp, err := azdClient.Environment().GetCurrent(ctx, &azdext.EmptyRequest{}); err == nil &&
301+
envResp != nil && envResp.Environment != nil {
302+
envName = envResp.Environment.Name
303+
}
304+
}
305+
264306
return func(serviceName string) string {
265307
if azdClient == nil || serviceName == "" {
266308
return ""
@@ -269,7 +311,27 @@ func doctorCachedPayload(ctx context.Context, azdClient *azdext.AzdClient) func(
269311
if err != nil {
270312
return ""
271313
}
272-
spec, err := nextstep.ReadCachedOpenAPISpec(filepath.Dir(configPath), serviceName, "remote")
314+
configDir := filepath.Dir(configPath)
315+
316+
// Try the deployed agent name first.
317+
if envName != "" {
318+
nameKey := fmt.Sprintf("AGENT_%s_NAME", toServiceKey(serviceName))
319+
if v, err := azdClient.Environment().GetValue(ctx, &azdext.GetEnvRequest{
320+
EnvName: envName,
321+
Key: nameKey,
322+
}); err == nil && v != nil && v.Value != "" && v.Value != serviceName {
323+
if spec, err := nextstep.ReadCachedOpenAPISpec(configDir, v.Value, "remote"); err == nil {
324+
if payload := nextstep.ExtractInvokeExample(spec); payload != "" {
325+
return payload
326+
}
327+
}
328+
}
329+
}
330+
331+
// Fall back to service-name keyed cache for the non-divergent
332+
// case (and for projects whose AGENT_<SERVICE>_NAME var is
333+
// absent for any reason).
334+
spec, err := nextstep.ReadCachedOpenAPISpec(configDir, serviceName, "remote")
273335
if err != nil {
274336
return ""
275337
}

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/resolver.go

Lines changed: 26 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -288,6 +288,22 @@ func ResolveAfterShow(state *State, serviceName string) []Suggestion {
288288
}}
289289
}
290290

291+
// AfterDeployOpts configures ResolveAfterDeploy. Optional — the
292+
// zero-value matches the historical post-deploy call site behavior.
293+
type AfterDeployOpts struct {
294+
// ForceQualified, when true, makes ResolveAfterDeploy emit
295+
// service-qualified `azd ai agent show <name>` / `invoke <name> ...`
296+
// commands even when len(state.Services) == 1.
297+
//
298+
// Use this when the input State has been filtered down from a
299+
// larger multi-agent project (e.g., doctor showing only deployed
300+
// services). The default `len(state.Services) == 1` heuristic
301+
// would otherwise emit no-arg commands that ambiguity-prompt or
302+
// error at runtime because resolveAgentService sees ALL azure.yaml
303+
// services, not just the filtered subset.
304+
ForceQualified bool
305+
}
306+
291307
// ResolveAfterDeploy produces the Next: block embedded in the post-deploy
292308
// artifact note. The block is rendered per agent service: one
293309
// `azd ai agent show <name>` plus one `azd ai agent invoke '<payload>'`
@@ -301,17 +317,26 @@ func ResolveAfterShow(state *State, serviceName string) []Suggestion {
301317
// readmeExists, also injected, controls whether the "See <relPath>/README.md
302318
// for a sample payload" line is appended. The resolver does not touch the
303319
// filesystem directly.
320+
//
321+
// opts is variadic for backward compatibility. Only the first element is
322+
// consulted; additional elements are ignored.
304323
func ResolveAfterDeploy(
305324
state *State,
306325
cachedPayload func(serviceName string) string,
307326
readmeExists func(relativePath string) bool,
327+
opts ...AfterDeployOpts,
308328
) []Suggestion {
309329
if state == nil || len(state.Services) == 0 {
310330
return nil
311331
}
312332

333+
var forceQualified bool
334+
if len(opts) > 0 {
335+
forceQualified = opts[0].ForceQualified
336+
}
337+
313338
out := make([]Suggestion, 0, len(state.Services)*3)
314-
singleAgent := len(state.Services) == 1
339+
singleAgent := !forceQualified && len(state.Services) == 1
315340
priority := 10
316341

317342
for _, svc := range state.Services {

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/resolver_test.go

Lines changed: 60 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -479,6 +479,66 @@ func TestResolveAfterDeploy(t *testing.T) {
479479
require.Len(t, out, 2)
480480
assert.Equal(t, `azd ai agent invoke '{"q":"don'\''t"}'`, out[1].Command)
481481
})
482+
483+
t.Run("ForceQualified=true on len==1 → service-qualified commands", func(t *testing.T) {
484+
t.Parallel()
485+
state := &State{Services: []ServiceState{
486+
{Name: "echo", RelativePath: "./src/echo", Protocol: ProtocolInvocations},
487+
}}
488+
out := ResolveAfterDeploy(state, nil, nil, AfterDeployOpts{ForceQualified: true})
489+
require.Len(t, out, 2)
490+
assert.Equal(t, "azd ai agent show echo", out[0].Command)
491+
assert.Equal(t, `azd ai agent invoke echo '{"message": "Hello!"}'`, out[1].Command)
492+
})
493+
494+
t.Run("ForceQualified=false on len==1 → unqualified (matches default)", func(t *testing.T) {
495+
t.Parallel()
496+
state := &State{Services: []ServiceState{
497+
{Name: "echo", RelativePath: "./src/echo", Protocol: ProtocolInvocations},
498+
}}
499+
out := ResolveAfterDeploy(state, nil, nil, AfterDeployOpts{ForceQualified: false})
500+
require.Len(t, out, 2)
501+
assert.Equal(t, "azd ai agent show", out[0].Command)
502+
assert.Equal(t, `azd ai agent invoke '{"message": "Hello!"}'`, out[1].Command)
503+
})
504+
505+
t.Run("ForceQualified=true with cached payload → qualified invoke uses payload", func(t *testing.T) {
506+
t.Parallel()
507+
state := &State{Services: []ServiceState{{Name: "echo", RelativePath: "./src/echo"}}}
508+
cached := func(_ string) string { return `{"q":"x"}` }
509+
out := ResolveAfterDeploy(state, cached, nil, AfterDeployOpts{ForceQualified: true})
510+
require.Len(t, out, 2)
511+
assert.Equal(t, "azd ai agent show echo", out[0].Command)
512+
assert.Equal(t, `azd ai agent invoke echo '{"q":"x"}'`, out[1].Command)
513+
})
514+
515+
t.Run("ForceQualified=true on multi-agent → qualified (already-qualified case unaffected)", func(t *testing.T) {
516+
t.Parallel()
517+
state := &State{Services: []ServiceState{
518+
{Name: "alpha", Protocol: ProtocolInvocations},
519+
{Name: "beta", Protocol: ProtocolResponses},
520+
}}
521+
out := ResolveAfterDeploy(state, nil, nil, AfterDeployOpts{ForceQualified: true})
522+
require.Len(t, out, 4)
523+
assert.Equal(t, "azd ai agent show alpha", out[0].Command)
524+
assert.Equal(t, `azd ai agent invoke alpha '{"message": "Hello!"}'`, out[1].Command)
525+
assert.Equal(t, "azd ai agent show beta", out[2].Command)
526+
assert.Equal(t, `azd ai agent invoke beta "Hello!"`, out[3].Command)
527+
})
528+
529+
t.Run("extra opts elements beyond [0] are ignored", func(t *testing.T) {
530+
t.Parallel()
531+
state := &State{Services: []ServiceState{
532+
{Name: "echo", RelativePath: "./src/echo", Protocol: ProtocolInvocations},
533+
}}
534+
out := ResolveAfterDeploy(
535+
state, nil, nil,
536+
AfterDeployOpts{ForceQualified: true},
537+
AfterDeployOpts{ForceQualified: false}, // should be ignored
538+
)
539+
require.Len(t, out, 2)
540+
assert.Equal(t, "azd ai agent show echo", out[0].Command)
541+
})
482542
}
483543

484544
func TestFindService(t *testing.T) {

0 commit comments

Comments
 (0)