Skip to content

Commit 493e5fa

Browse files
Antriksh JainCopilot
andcommitted
fix(azure.ai.agents): doctor local checks — 3-of-3 review fix-ups (transport-error suggestions, unparseable-version message, version-agnostic Suggestion)
Three fix-ups from the 3-reviewer pass on ba365ab. All three are 3/3 consensus on the mechanic; the implementation chosen for each is the reviewer-preferred shape (and minimally invasive). G1 — transport-aware suggestion in checks 2 & 3 (was: GPT/Sonnet "blocker", Opus "do not block, fix in checks 2/3 instead of with a probe") `azdext.NewAzdClient` constructs a *lazy* gRPC channel via `grpc.NewClient`, so `deps.AzdClient != nil` cannot detect a stale/unreachable `AZD_SERVER`. The transport failure first surfaces in the next RPC — `Project().Get` in check 2, or `Environment().GetCurrent` in check 3 — where the existing suggestions ("Run from a directory containing azure.yaml, or `azd init`" / "Create one with `azd env new`") are then actively wrong: they tell the user to fix project / env state when the actual root cause is a broken channel. Fix: a new `isTransportFailure(err)` helper inspects the gRPC status and returns true for `codes.Unavailable` and `codes.DeadlineExceeded`. Checks 2 and 3 swap the suggestion to "Re-run via `azd ai agent doctor`; the extension cannot reach azd's gRPC channel." for transport-class errors only. Server-side errors (`codes.NotFound`, `codes.Internal`, etc.) keep the domain-specific suggestion. `codes.Canceled` is user-initiated, not a transport failure. Rejected: adding a dedicated probe RPC in check 1. Opus pointed out that the probe adds latency to every doctor invocation, while the fix in checks 2/3 achieves the same UX with zero extra RPCs. S1 — distinguish "above floor" from "couldn't verify floor" (was: Sonnet "blocker → Warn", GPT "Low → Warn", Opus "Low → Pass with distinguished message") When the extension version string is non-empty and non-"dev" but still unparseable (e.g. "canary", "preview-beta-1", "1.2"), the previous code fell through to the floor-pass branch with message `"azd extension reachable (version canary)."` — indistinguishable from a genuinely- above-floor pass. Fix: call `parseMainVersion` explicitly in check 1 before the floor compare. On parse failure, return `StatusPass` with a distinguished message ("floor check skipped: version string not parseable") and `Details["floorChecked"] = false`. Preserves the fail-open philosophy (no nagging Warn on a build-label drift) while killing the false-green: JSON consumers can detect the inconclusive case via the Details bit. Rejected: Sonnet's StatusWarn proposal. Build labels that don't match strict semver are not user errors — surfacing them as Warnings would nag a long tail of legitimate non-standard build strings. O1 — drop hard-coded "1.24.0" from the nil-client Suggestion (was: Opus "Low", Sonnet "confirm Low", GPT "confirm Low") The old Suggestion told users to "ensure azd is at least 1.24.0", but the extension declares its actual floor in extension.yaml (`requiredAzdVersion: ">1.23.13"`) and `go.mod` pins azd v1.23.14. A user on 1.23.14 would have been told to perform an unneeded upgrade. Fix: drop the version claim entirely. The Suggestion is now version-agnostic ("Run the extension via `azd ai agent doctor` rather than launching the extension binary directly.") so it cannot drift from the extension's declared floor again. Test pins the version-agnostic contract via `require.NotContains(t, got.Suggestion, "1.24.0")`. Rejected (S2 — hardcoded `"local.azure-yaml"` ID in cascade-skip): 1/3 votes. GPT and Opus both rejected it as a maintainability preference, not a defect; deferred indefinitely. Net: +3 tests (transport-error swap in check 2 with 2 subcases, transport-error swap in check 3, unparseable-version pass with 3 ver subcases) + 1 helper test (`TestIsTransportFailure` with 7 subcases, including the Canceled-is-not-transport boundary case). Existing `TestCheckGRPCAndVersion_NoClient_Fails` pins the version-agnostic contract for O1. Pre-flight clean: gofmt, vet, build, doctor 8.8s (was 5.5s), full extension suite green (cmd 16.0s, doctor 5.2s, nextstep 6.2s, others unchanged), golangci-lint 0 issues, cspell 0 issues. Per workflow precedent (2.4.1, 2.5.1, 2.6.5, 4.1.1, 4.1.2), trivial 3/3-consensus fix-ups skip a second review pass. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 2c08fc7 commit 493e5fa

2 files changed

Lines changed: 173 additions & 3 deletions

File tree

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor/checks_local.go

Lines changed: 55 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,8 @@ import (
1010
"strings"
1111

1212
"github.com/azure/azure-dev/cli/azd/pkg/azdext"
13+
"google.golang.org/grpc/codes"
14+
"google.golang.org/grpc/status"
1315
)
1416

1517
// MinNewBackendVersion is the floor extension version required to talk to
@@ -73,7 +75,7 @@ func newCheckGRPCAndVersion(deps Dependencies) Check {
7375
return Result{
7476
Status: StatusFail,
7577
Message: msg,
76-
Suggestion: "Run the extension via `azd ai agent doctor` (not the extension binary directly) and ensure azd is at least 1.24.0.",
78+
Suggestion: "Run the extension via `azd ai agent doctor` rather than launching the extension binary directly.",
7779
}
7880
}
7981

@@ -85,6 +87,23 @@ func newCheckGRPCAndVersion(deps Dependencies) Check {
8587
}
8688
}
8789

90+
// If the version string is non-empty/non-"dev" but still can't be parsed
91+
// (e.g. a build label like "canary" or an unexpected future format),
92+
// surface Pass but mark the floor check as skipped rather than silently
93+
// claiming the floor was verified.
94+
if _, ok := parseMainVersion(ver); !ok {
95+
return Result{
96+
Status: StatusPass,
97+
Message: fmt.Sprintf(
98+
"azd extension reachable (version %s; floor check skipped: version string not parseable).",
99+
ver),
100+
Details: map[string]any{
101+
"extensionVersion": ver,
102+
"floorChecked": false,
103+
},
104+
}
105+
}
106+
88107
if compareVersions(ver, MinNewBackendVersion) < 0 {
89108
return Result{
90109
Status: StatusWarn,
@@ -132,10 +151,19 @@ func newCheckProjectConfig(deps Dependencies) Check {
132151

133152
resp, err := deps.AzdClient.Project().Get(ctx, &azdext.EmptyRequest{})
134153
if err != nil {
154+
suggestion := "Run from a directory containing `azure.yaml`, or initialize one with `azd init`."
155+
if isTransportFailure(err) {
156+
// `azdext.NewAzdClient` constructs a lazy gRPC channel, so the
157+
// nil-client check above cannot detect a stale/unreachable
158+
// `AZD_SERVER` endpoint. The transport failure surfaces here on
159+
// the first RPC — swap the suggestion so the user looks at the
160+
// channel, not at `azure.yaml`.
161+
suggestion = "Re-run via `azd ai agent doctor`; the extension cannot reach azd's gRPC channel."
162+
}
135163
return Result{
136164
Status: StatusFail,
137165
Message: fmt.Sprintf("failed to get project config: %v", err),
138-
Suggestion: "Run from a directory containing `azure.yaml`, or initialize one with `azd init`.",
166+
Suggestion: suggestion,
139167
}
140168
}
141169
if resp == nil || resp.Project == nil {
@@ -188,10 +216,14 @@ func newCheckEnvironmentSelected(deps Dependencies) Check {
188216

189217
resp, err := deps.AzdClient.Environment().GetCurrent(ctx, &azdext.EmptyRequest{})
190218
if err != nil {
219+
suggestion := "Create one with `azd env new <name>` or select an existing one with `azd env select <name>`."
220+
if isTransportFailure(err) {
221+
suggestion = "Re-run via `azd ai agent doctor`; the extension cannot reach azd's gRPC channel."
222+
}
191223
return Result{
192224
Status: StatusFail,
193225
Message: fmt.Sprintf("failed to get current environment: %v", err),
194-
Suggestion: "Create one with `azd env new <name>` or select an existing one with `azd env select <name>`.",
226+
Suggestion: suggestion,
195227
}
196228
}
197229
if resp == nil || resp.Environment == nil || resp.Environment.Name == "" {
@@ -213,6 +245,26 @@ func newCheckEnvironmentSelected(deps Dependencies) Check {
213245
}
214246
}
215247

248+
// isTransportFailure reports whether err is a gRPC transport-class failure
249+
// (channel unreachable, deadline exceeded) as opposed to a server-side
250+
// application error. Used by downstream checks to swap the user-facing
251+
// suggestion when an RPC fails because the channel itself is broken,
252+
// rather than because the project/environment is misconfigured.
253+
func isTransportFailure(err error) bool {
254+
if err == nil {
255+
return false
256+
}
257+
st, ok := status.FromError(err)
258+
if !ok {
259+
return false
260+
}
261+
switch st.Code() {
262+
case codes.Unavailable, codes.DeadlineExceeded:
263+
return true
264+
}
265+
return false
266+
}
267+
216268
// coalesce returns the first non-empty string in values, or "" if all
217269
// are empty. Used to keep the version-floor check's Pass message
218270
// readable when the version string is blank.

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor/checks_local_test.go

Lines changed: 118 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,9 @@ func TestCheckGRPCAndVersion_NoClient_Fails(t *testing.T) {
107107
require.Contains(t, got.Message, "gRPC channel to azd unavailable")
108108
require.Contains(t, got.Message, "connection refused")
109109
require.NotEmpty(t, got.Suggestion)
110+
// Suggestion should be version-agnostic — the extension declares its
111+
// own required azd floor in extension.yaml; doctor must not duplicate it.
112+
require.NotContains(t, got.Suggestion, "1.24.0")
110113
}
111114

112115
func TestCheckGRPCAndVersion_NoClient_NilErr_StillFails(t *testing.T) {
@@ -184,6 +187,34 @@ func TestCheckGRPCAndVersion_AboveFloor_Passes(t *testing.T) {
184187
require.Equal(t, StatusPass, got.Status)
185188
}
186189

190+
// TestCheckGRPCAndVersion_UnparseableVersion_PassesButFlagsFloorSkipped
191+
// pins the contract for non-empty/non-"dev" version strings that fail to
192+
// parse (e.g. "canary", "preview-beta-1"). The check must still Pass — the
193+
// gRPC channel is healthy — but the message must distinguish "above floor"
194+
// from "couldn't verify floor", and Details["floorChecked"] must be false
195+
// so downstream consumers (e.g. JSON output) can tell the difference.
196+
func TestCheckGRPCAndVersion_UnparseableVersion_PassesButFlagsFloorSkipped(t *testing.T) {
197+
t.Parallel()
198+
199+
client := newTestAzdClient(t, &fakeProjectServer{}, &fakeEnvironmentServer{})
200+
201+
for _, ver := range []string{"canary", "preview-beta-1", "1.2"} {
202+
check := newCheckGRPCAndVersion(Dependencies{
203+
AzdClient: client,
204+
ExtensionVersion: ver,
205+
})
206+
got := check.Fn(t.Context(), Options{}, nil)
207+
208+
require.Equalf(t, StatusPass, got.Status, "ver=%q", ver)
209+
require.Containsf(t, got.Message, ver, "ver=%q: message should echo the version", ver)
210+
require.Containsf(t, got.Message, "floor check skipped", "ver=%q", ver)
211+
require.NotContainsf(t, got.Message, "older than", "ver=%q must not claim below-floor", ver)
212+
require.Emptyf(t, got.Suggestion, "ver=%q: unparseable version should not nag", ver)
213+
require.Equalf(t, false, got.Details["floorChecked"], "ver=%q", ver)
214+
require.Equalf(t, ver, got.Details["extensionVersion"], "ver=%q", ver)
215+
}
216+
}
217+
187218
// ---- Check `local.azure-yaml` ----
188219

189220
func TestCheckProjectConfig_NoClient_Skips(t *testing.T) {
@@ -246,6 +277,42 @@ func TestCheckProjectConfig_Pass(t *testing.T) {
246277
require.Equal(t, "my-agent", got.Details["projectName"])
247278
}
248279

280+
// TestCheckProjectConfig_TransportError_SwapsSuggestion locks the
281+
// transport-aware suggestion swap. `azdext.NewAzdClient` constructs the
282+
// gRPC channel lazily, so a non-nil client can still fail on the first
283+
// RPC if AZD_SERVER is stale or unreachable. When the resulting error
284+
// carries a transport-class gRPC code, the suggestion must point the
285+
// user at the channel rather than at `azure.yaml`.
286+
func TestCheckProjectConfig_TransportError_SwapsSuggestion(t *testing.T) {
287+
t.Parallel()
288+
289+
cases := []struct {
290+
name string
291+
code codes.Code
292+
}{
293+
{"Unavailable", codes.Unavailable},
294+
{"DeadlineExceeded", codes.DeadlineExceeded},
295+
}
296+
for _, tc := range cases {
297+
t.Run(tc.name, func(t *testing.T) {
298+
t.Parallel()
299+
client := newTestAzdClient(t,
300+
&fakeProjectServer{err: status.Error(tc.code, "transport boom")},
301+
&fakeEnvironmentServer{},
302+
)
303+
check := newCheckProjectConfig(Dependencies{AzdClient: client})
304+
got := check.Fn(t.Context(), Options{}, nil)
305+
306+
require.Equal(t, StatusFail, got.Status)
307+
require.Contains(t, got.Message, "failed to get project config")
308+
require.Contains(t, got.Suggestion, "azd ai agent doctor")
309+
require.Contains(t, got.Suggestion, "gRPC channel")
310+
// And explicitly *not* the misleading "azd init" path.
311+
require.NotContains(t, got.Suggestion, "azd init")
312+
})
313+
}
314+
}
315+
249316
// ---- Check `local.environment-selected` ----
250317

251318
func TestCheckEnvironmentSelected_NoClient_Skips(t *testing.T) {
@@ -336,6 +403,27 @@ func TestCheckEnvironmentSelected_Pass(t *testing.T) {
336403
require.Equal(t, "staging", got.Details["environmentName"])
337404
}
338405

406+
// TestCheckEnvironmentSelected_TransportError_SwapsSuggestion is the
407+
// `local.environment-selected` sibling of the project-config transport
408+
// test. Same rationale: a transport-class gRPC code means the channel is
409+
// the root cause, not the absence of an environment.
410+
func TestCheckEnvironmentSelected_TransportError_SwapsSuggestion(t *testing.T) {
411+
t.Parallel()
412+
413+
client := newTestAzdClient(t,
414+
&fakeProjectServer{},
415+
&fakeEnvironmentServer{err: status.Error(codes.Unavailable, "transport boom")},
416+
)
417+
check := newCheckEnvironmentSelected(Dependencies{AzdClient: client})
418+
got := check.Fn(t.Context(), Options{}, nil)
419+
420+
require.Equal(t, StatusFail, got.Status)
421+
require.Contains(t, got.Message, "failed to get current environment")
422+
require.Contains(t, got.Suggestion, "azd ai agent doctor")
423+
require.Contains(t, got.Suggestion, "gRPC channel")
424+
require.NotContains(t, got.Suggestion, "azd env new")
425+
}
426+
339427
// ---- NewLocalChecks ordering / IDs ----
340428

341429
func TestNewLocalChecks_OrderAndIDs(t *testing.T) {
@@ -424,3 +512,33 @@ func TestCoalesce(t *testing.T) {
424512
require.Equal(t, "", coalesce("", "", ""))
425513
require.Equal(t, "", coalesce())
426514
}
515+
516+
// ---- transport-failure helper ----
517+
518+
func TestIsTransportFailure(t *testing.T) {
519+
t.Parallel()
520+
521+
cases := []struct {
522+
name string
523+
err error
524+
want bool
525+
}{
526+
{"nil error", nil, false},
527+
{"plain error (not a status)", errors.New("boom"), false},
528+
{"Unavailable", status.Error(codes.Unavailable, "x"), true},
529+
{"DeadlineExceeded", status.Error(codes.DeadlineExceeded, "x"), true},
530+
// Server-side errors must NOT swap the suggestion: the project / env
531+
// check then reports the real domain failure with its own wording.
532+
{"NotFound", status.Error(codes.NotFound, "x"), false},
533+
{"Internal", status.Error(codes.Internal, "x"), false},
534+
{"InvalidArgument", status.Error(codes.InvalidArgument, "x"), false},
535+
// Canceled is user-initiated, not a transport issue.
536+
{"Canceled", status.Error(codes.Canceled, "x"), false},
537+
}
538+
for _, tc := range cases {
539+
t.Run(tc.name, func(t *testing.T) {
540+
t.Parallel()
541+
require.Equal(t, tc.want, isTransportFailure(tc.err))
542+
})
543+
}
544+
}

0 commit comments

Comments
 (0)