Commit ec6aa9e

Antriksh Jain and Copilot committed
feat(azure.ai.agents): add doctor check remote.foundry-endpoint (P5.1 C12)
Implements Check 8 from the doctor remote-checks design as the second populated entry in the remote chain. The check proves that the configured Foundry project endpoint is reachable AND that the bearer token minted by C11 actually authorizes against it: a combined "can we talk to the project at all" signal that gates every other remote check (models / agent status / RBAC) from the design's dependency matrix.

The probe issues a single `GET <endpoint>/agents?api-version=<v>&limit=1` with a 10s timeout, no retries, using the same credential + scope as the production runtime path (`agent_context.go:newAgentCredential` + `agent_api/operations.go`). The `limit=1` parameter matches the production agent_api client exactly, so a Pass here proves the same query shape the runtime invoke flow uses (the earlier `$top=1` choice was a divergence flagged by reviewers).

The status-code response is mapped 1:1 to user-actionable outcomes:

- 200 → Pass: "endpoint reachable (HTTP 200)"
- 401 → Fail: "token expired or scope mismatch" + suggest `azd auth login` (link: auth login docs)
- 403 → Fail: "wrong tenant or insufficient RBAC" + suggest re-auth in correct tenant and let `remote.rbac` flag the role-assignment gap (deliberately NO `azd auth login` here, but DO carry a docs Link to the Foundry RBAC quickstart so the suggestion is actionable; this matches the C11 "every actionable Fail carries Links" convention)
- 404 → Fail: "endpoint is wrong or project is gone" + suggest `azd provision` / `azd env set AZURE_AI_PROJECT_ENDPOINT`
- 5xx → Fail: "service-side error" + suggest retry
- other → Fail: "unexpected HTTP <N>" + verbose hint
- transport err → Fail: "could not reach <host>: <first line>" + network / VPN / firewall guidance
- ctx canceled → Skip (user aborted)
- 10s elapsed → Fail: "did not respond within 10s" + retry hint

Skip-cascade: `local.project-endpoint-set` AND `remote.auth`. The former gives us the endpoint to probe; the latter gives us the token to authenticate with.
Skipping when either failed prevents double-reporting the same root cause. `local.environment-selected` is transitively cascaded via `local.project-endpoint-set`.

Implementation notes:

- The api-version is read from the package-level `DefaultAgentAPIVersion` constant by the Cobra wiring in `cmd/doctor.go` and passed in through `Dependencies.AgentAPIVersion`. This honors the design's "single source of truth" requirement while keeping the doctor package self-contained (no import cycle against `internal/cmd`).
- `makeRealProbeFoundryEndpoint(apiVersion string) func(...)` is a closure factory rather than a top-level function so the api-version is captured at construction time without becoming a global.
- `buildFoundryProbeURL` parses the user-supplied endpoint FIRST, then mutates `u.Path` (trim-right + "/agents"), clears `u.Fragment` / `u.RawFragment`, and only then sets RawQuery. This prevents a stray `?api-version=evil` or `#fragment` in the endpoint from displacing the `/agents` path segment, a silent-misdiagnosis bug the prior `endpoint + "/agents"` string concatenation could trigger. Regression tests now positively assert `/agents?` is present in every successful build output. The builder also returns an error for any endpoint that is not an absolute HTTPS URL with a non-empty host, so a malformed env value cannot leak a bearer token to the wrong scheme/host.
- A new `validateFoundryEndpoint` helper runs at the check level BEFORE the probe is invoked and BEFORE any token is acquired. A non-HTTPS, relative, or otherwise-malformed `AZURE_AI_PROJECT_ENDPOINT` surfaces a precise Fail with an `azd env set AZURE_AI_PROJECT_ENDPOINT <https://...>` suggestion instead of either a generic transport error (with the token leak that would have come with it) or the builder's defensive error wrapped in a less-helpful network-VPN-firewall message.
- Cancellation classification mirrors C11's pattern: `errors.Is(ctx.Err(), context.Canceled)` → StatusSkip (user aborted); `errors.Is(probeCtx.Err(), context.DeadlineExceeded)` → StatusFail (we hit our own 10s bound, not the parent ctx).
- Multi-line transport errors are reduced to their first line via the shared `firstLine` helper from C11 so the resulting Message stays readable.
- The `Details` map carries the endpoint, request URL, and HTTP status code (when available) for `--output json` consumers and `--unredacted` debugging. No raw tokens, no response body excerpts.
- 24 tests cover skip-cascade (env-not-selected, endpoint-not-set, auth-failed, AgentAPIVersion-missing), every status-code branch, cancellation vs timeout disambiguation, URL builder safety (junk query in endpoint, trailing slashes, fragment, blank api-version, non-HTTPS / relative / malformed endpoint), endpoint validation (HTTPS-only, non-empty host, well-formed), helper functions (`endpointHost`, `readProjectEndpoint`, `firstLine` reuse), a TLS httptest server smoke test asserting the built URL lands on `/agents` on the wire, and a token-leak sanity check on Details / Message / Suggestion strings.

Behavior: with this commit, `azd ai agent doctor` now produces two remote checks (`remote.auth` from C11 + `remote.foundry-endpoint` from C12) instead of just one. The full remote chain still requires C13+ to be useful end-to-end, but every subsequent check can now take a Pass on this one as proof that the project URL works.

Refs: Azure#7975, PR Azure#8057 design-spec

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
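The parse-first URL construction described in the implementation notes can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the name `buildProbeURL`, the error messages, the `example.test` host, and the `2025-01-01` api-version are placeholders, and the sketch assumes the builder discards any query string carried by the endpoint.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// buildProbeURL is a hypothetical stand-in for the commit's
// buildFoundryProbeURL. It parses the endpoint first, then edits the parsed
// parts, so junk in the endpoint cannot displace the /agents segment.
func buildProbeURL(endpoint, apiVersion string) (string, error) {
	if strings.TrimSpace(apiVersion) == "" {
		return "", errors.New("api-version must not be blank")
	}
	u, err := url.Parse(strings.TrimSpace(endpoint))
	if err != nil {
		return "", fmt.Errorf("parse endpoint: %w", err)
	}
	// Refuse anything that is not an absolute HTTPS URL with a host, so a
	// bearer token is never sent to an unexpected scheme or host.
	if u.Scheme != "https" || u.Host == "" {
		return "", errors.New("endpoint must be an absolute https URL with a host")
	}
	// Trim trailing slashes, then append the path segment.
	u.Path = strings.TrimRight(u.Path, "/") + "/agents"
	u.RawPath = ""
	// Drop any fragment supplied with the endpoint.
	u.Fragment = ""
	u.RawFragment = ""
	// Set the query last, replacing whatever the endpoint carried.
	q := url.Values{}
	q.Set("api-version", apiVersion)
	q.Set("limit", "1")
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	endpoints := []string{
		"https://example.test/api/projects/p/?api-version=evil#frag",
		"http://example.test/api/projects/p",
		"/relative/path",
	}
	for _, ep := range endpoints {
		got, err := buildProbeURL(ep, "2025-01-01")
		if err != nil {
			fmt.Printf("%-60s -> error: %v\n", ep, err)
			continue
		}
		fmt.Printf("%-60s -> %s\n", ep, got)
	}
}
```

With plain `endpoint + "/agents"` concatenation, the first input would place `/agents` after the fragment marker, where it never reaches the server; editing the parsed `url.URL` avoids that.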
1 parent 10f40f8 commit ec6aa9e

6 files changed

Lines changed: 1181 additions & 15 deletions

cli/azd/extensions/azure.ai.agents/internal/cmd/doctor.go

Lines changed: 1 addition & 0 deletions
@@ -88,6 +88,7 @@ Exit codes:
 		AzdClient:        azdClient,
 		AzdClientErr:     clientErr,
 		ExtensionVersion: version.Version,
+		AgentAPIVersion:  DefaultAgentAPIVersion,
 	}

 	opts := doctor.Options{
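
The one-line wiring above hands the api-version to the doctor package through `Dependencies.AgentAPIVersion`. A closure factory can then capture that value once, as in the rough sketch below. The `Dependencies` struct here mirrors only the field visible in the diff, and `probeFunc`, `makeProbe`, and the `2025-01-01` value are hypothetical; the real signature of `makeRealProbeFoundryEndpoint` is elided in the commit message and is not reproduced here.

```go
package main

import "fmt"

// Dependencies mirrors only the field added in the diff; the real struct
// carries more fields.
type Dependencies struct {
	AgentAPIVersion string
}

// probeFunc is a hypothetical signature used only for this sketch.
type probeFunc func(endpoint string) string

// makeProbe captures apiVersion at construction time, so the returned
// function needs no package-level variable.
func makeProbe(apiVersion string) probeFunc {
	return func(endpoint string) string {
		return fmt.Sprintf("probe %s with api-version %s", endpoint, apiVersion)
	}
}

func main() {
	deps := Dependencies{AgentAPIVersion: "2025-01-01"} // placeholder value
	probe := makeProbe(deps.AgentAPIVersion)

	// Later changes to deps do not affect the already-built probe.
	deps.AgentAPIVersion = "changed-later"
	fmt.Println(probe("https://example.test/p"))
}
```

Because the string is copied into the closure when `makeProbe` runs, tests can build probes with different api-versions side by side without touching shared state.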
