Commit d478803
Azure AI Agents: add next-step guidance and doctor diagnostics (#8198)
* feat(azure.ai.agents): scaffold nextstep package and isTerminal helper
Add the foundation for context-aware `Next:` guidance described in PR #8057:
- New `internal/cmd/nextstep` package with `Suggestion`, `State`, `ServiceState`, `AuthState` types and a format-agnostic `PrintNext` writer that aligns commands on the longest entry and caps output at one primary + one secondary line.
- Add an `isTerminal(fd uintptr) bool` helper in `internal/cmd/helpers.go` wrapping `golang.org/x/term`; promote that module from indirect to direct in `go.mod`.
- Register `nextstep` in the repo cspell dictionary.
No callers yet; resolvers, state assembly, and command wiring land in subsequent commits.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): add AssembleState to nextstep package
Introduces nextstep.AssembleState, a single best-effort probe of the current azd world that the resolver (next commit) will read from. It captures three things the design relies on:
1. Whether AZURE_AI_PROJECT_ENDPOINT is set in the active environment (HasProjectEndpoint).
2. The agent services declared in azure.yaml, in alphabetical order (Services).
3. For each service, whether azd recorded a successful deploy. The signal is AGENT_<KEY>_VERSION non-empty in the env, matching the convention written by registerAgentEnvironmentVariables in service_target_agent.go. KEY is derived via the same spaces+hyphens-to-underscore upper-case transform getServiceKey uses (lines 222-226 of service_target_agent.go).
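The deploy signal in item 3 can be sketched as follows. The function names are hypothetical; only the `AGENT_<KEY>_VERSION` shape and the spaces/hyphens-to-underscore, upper-case transform come from the description above.

```go
package main

import (
	"fmt"
	"strings"
)

// versionEnvKey builds the AGENT_<KEY>_VERSION variable name for a service:
// spaces and hyphens become underscores, then the result is upper-cased.
func versionEnvKey(serviceName string) string {
	key := strings.NewReplacer(" ", "_", "-", "_").Replace(serviceName)
	return "AGENT_" + strings.ToUpper(key) + "_VERSION"
}

// isDeployed reports whether the environment holds a non-empty version value.
// A plain map stands in for the azd environment in this sketch.
func isDeployed(env map[string]string, serviceName string) bool {
	return env[versionEnvKey(serviceName)] != ""
}

func main() {
	env := map[string]string{"AGENT_HELLO_WORLD_VERSION": "3"}
	fmt.Println(versionEnvKey("hello-world"), isDeployed(env, "hello-world"))
	fmt.Println(versionEnvKey("my agent"), isDeployed(env, "my agent"))
}
```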
Probes are best-effort: transport errors are collected and returned alongside a partial State so resolvers can still degrade gracefully (e.g., suggest azd init when project load fails).
A small Source interface decouples the assembler from *azdext.AzdClient so tests can use hand-rolled fakes; production wraps the real client via NewSource. WithAuthProbe / WithOpenAPIProbe options are plumbed but inert until commits 1.3 / 1.4 land; this keeps the public API stable from day one so callers and tests don't need rewriting later.
Plan refs closed: D4 (IsDeployed rule). Closes the data-gathering half of Phase 1 commit 1.2.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): scope nextstep state to azure.ai.agent services
AssembleState's collectServices iterated every azure.yaml service, so a project with mixed hosts (e.g. one agent + one containerapp web tier) would have leaked the web service into nextstep's view and triggered spurious AGENT_<KEY>_VERSION env lookups for it.
Filter at the boundary on Host == agentHost (mirrors the cmd.AiAgentHost literal; intentional duplication to keep nextstep importable from cmd without a cycle).
Tests: existing fixtures updated to use the canonical host; new 'non-agent services are filtered out' case pins the behavior; TestAgentHostConstant pins the literal to guard against drift.
Resolves the F1 finding from the cross-pollinated review of 5ab18b7d1 (3/3 reviewer consensus).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): add nextstep resolver, OpenAPI extractor, and error vocabulary
Phase 1 commit 1.3. Three pure-Go source files plus tests, all under
`internal/cmd/nextstep/`. No callers yet; nothing prints. Wires the
remaining "decide what to print" machinery so Phase 2 commits can swap
out the hardcoded hint blocks in init/run/invoke/show/deploy.
resolver.go
Pure decision functions over *State, one per command outcome:
- ResolveAfterInit
- ResolveAfterRun
- ResolveAfterInvoke (success + typed failure)
- ResolveAfterShow
- ResolveAfterDeploy
Filesystem and OpenAPI-cache access flow through caller-injected
closures (cachedPayload, readmeExists) so the resolver stays pure
and unit-testable. No I/O, no globals.
openapi.go
- ExtractInvokeExample(spec []byte) string: walks
paths./invocations.post.requestBody.content.application/json with
explicit $ref short-circuit at both requestBody and schema levels.
Resolution order: content.example -> schema.example -> generated
from required+properties[*].example -> "". Silent on any miss.
- ReadCachedOpenAPISpec(configDir, agentName, suffix): mirrors the
writer-side path shape from helpers.go (fetchOpenAPISpec) so the
two stay in lockstep. Returns (nil, nil) on os.ErrNotExist.
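A rough sketch of the resolution order described for ExtractInvokeExample, using only encoding/json. It is not the shipped implementation. In particular, the "generated" step here assumes that every required property must carry its own example, and that assumption is not stated above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type obj = map[string]any

// child returns m[key] as an object, or nil when absent or not an object.
func child(m obj, key string) obj {
	c, _ := m[key].(obj)
	return c
}

func marshal(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		return ""
	}
	return string(b)
}

// extractInvokeExample returns a sample request body or "" on any miss.
func extractInvokeExample(spec []byte) string {
	var root obj
	if err := json.Unmarshal(spec, &root); err != nil {
		return ""
	}
	body := child(child(child(child(root, "paths"), "/invocations"), "post"), "requestBody")
	if _, isRef := body["$ref"]; body == nil || isRef {
		return "" // $ref short-circuit at the requestBody level
	}
	media := child(child(body, "content"), "application/json")
	if ex, ok := media["example"]; ok {
		return marshal(ex) // 1. content.example
	}
	schema := child(media, "schema")
	if _, isRef := schema["$ref"]; schema == nil || isRef {
		return "" // $ref short-circuit at the schema level
	}
	if ex, ok := schema["example"]; ok {
		return marshal(ex) // 2. schema.example
	}
	// 3. Generate from required + properties[*].example (assumed rule:
	// give up if any required property has no example).
	required, _ := schema["required"].([]any)
	props := child(schema, "properties")
	out := obj{}
	for _, r := range required {
		name, _ := r.(string)
		ex, ok := child(props, name)["example"]
		if !ok {
			return ""
		}
		out[name] = ex
	}
	if len(out) == 0 {
		return ""
	}
	return marshal(out)
}

func main() {
	spec := `{"paths":{"/invocations":{"post":{"requestBody":{"content":{"application/json":
	  {"schema":{"required":["message"],"properties":{"message":{"type":"string","example":"Hello!"}}}}}}}}}}`
	fmt.Println(extractInvokeExample([]byte(spec)))
}
```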
error_codes.go
Typed wire-level vocabulary, sourced verbatim from the vienna
platform's authoritative enums:
- UserErrorCode (HostedAgentVersionManager.cs)
- SessionErrorCode (Session/Exceptions/SessionErrorCode.cs)
- AgentVersionStatus (Contracts/V2/Generated/Agents/.../...Status.cs)
Plus RemediationForUserErrorCode / RemediationForSessionErrorCode
helpers returning the platform's troubleshooting URL + suggestion
text. Surfaces codes verbatim; no re-classification. The platform
appends its own aka.ms TSG link via WithTroubleshootingInfo, so the
extension just passes Code + Message through.
Strategy delta D5 (will be recorded in STRATEGY-DELTA.md): the plan
assumed cache path .azd/agent-cache/<env>-<agent>-openapi.json; the
actual writer in helpers.go:317-374 uses
<configDir>/openapi-<safeName>-<suffix>.json where safeName runs
strings.ReplaceAll on "..", "/", and "\\". The reader mirrors that
shape byte-for-byte so the two halves never drift.
Tests cover every branch in every resolver, every $ref-short-circuit
path in the extractor, the writer/reader sanitization contract, every
remediation arm in the error_codes mapping, and pin every wire-level
string against the platform contract (so a typo in a Go const can't
silently diverge from what the service emits).
Closes plan items C5, C11 (foundation). Sets up the Phase 2 callers.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): apply consensus fixes to nextstep resolver
Four findings emerged from the 3-model code review of commit 0b395756f
(Opus 4.7 xhigh, Sonnet 4.6, GPT-5.5) and were re-examined via
cross-pollination across the reviewers. Three were adopted; one was
dropped after the author empirically tested the affected shell.
F-A: shell-escape single quotes in OpenAPI-derived payloads.
resolver.go lines 104 and 313 wrapped state.OpenAPIPayload / cached
payload in single quotes via raw concatenation. The payload comes
from json.Marshal in ExtractInvokeExample, which does not escape
apostrophes, so an OpenAPI example such as {"q":"don't"} terminated
the surrounding single-quoted shell argument and broke the suggested
invoke command. Introduce shellEscapeSingleQuoted using the POSIX
'\'' idiom and route both sites through it. Cross-pollinated: 3 of
3 reviewers concurred.
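The POSIX idiom behind F-A, as a standalone sketch. The function name is hypothetical, and whether the real shellEscapeSingleQuoted also adds the outer quotes is not stated above; this version does.

```go
package main

import (
	"fmt"
	"strings"
)

// posixSingleQuote wraps s in single quotes for a POSIX shell. Each embedded
// apostrophe is rewritten as: close quote, backslash-escaped apostrophe,
// reopen quote. A Go raw string keeps that byte sequence unambiguous.
func posixSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	payload := `{"q":"don't"}`
	fmt.Println("azd ai agent invoke --local " + posixSingleQuote(payload))
}
```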
F-B: honor ServiceState.Protocol in ResolveAfterShow Active branch.
The Active case unconditionally passed ProtocolResponses to
invokeCommandFor, so an invocations-protocol agent was suggested the
responses-style "Hello!" payload (which the agent rejects). Look
up the service via findService and default to ProtocolResponses
only on miss. The existing test asserted only a substring containing
"azd ai agent invoke echo", which passed for either payload; that
is why the bug slipped past code review on 1.3. Replace the
substring assertion with exact matches and add explicit subtests
for invocations vs responses. Cross-pollinated: 3 of 3 concurred.
F-D: populate ServiceState.Protocol from agent.yaml in collectServices.
The Protocol field was declared in types.go but never written by
the production code path, so F-B's lookup would have silently fallen
back to ProtocolResponses for every agent in real use. Add
loadServiceProtocol(projectPath, relativePath) that reads
<root>/<rel>/agent.yaml, parses agent_yaml.ContainerAgent, and
picks ProtocolResponses when declared (broadest compatibility),
ProtocolInvocations when only invocations is declared, or "" on
any error. All failure modes are silent: the resolver degrades
to responses-default rather than surfacing transient I/O errors
through the next-step hint. Cross-pollinated: Opus, Sonnet, and
GPT-5.5 all confirmed the field was production-dead.
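The selection rule in F-D reduces to a small pure function. In this sketch the constant values and the `pickProtocol` name are assumptions, and manifest parsing is left out.

```go
package main

import "fmt"

// Hypothetical string values for the two protocol constants.
const (
	protocolResponses   = "responses"
	protocolInvocations = "invocations"
)

// pickProtocol prefers responses whenever it is declared, falls back to
// invocations when that is declared without responses, and returns ""
// when neither is declared.
func pickProtocol(declared []string) string {
	hasInvocations := false
	for _, p := range declared {
		switch p {
		case protocolResponses:
			return protocolResponses
		case protocolInvocations:
			hasInvocations = true
		}
	}
	if hasInvocations {
		return protocolInvocations
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", pickProtocol([]string{"invocations", "responses"}))
	fmt.Printf("%q\n", pickProtocol([]string{"invocations"}))
	fmt.Printf("%q\n", pickProtocol(nil))
}
```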
F-C dropped: bash !" history expansion.
Sonnet flagged that "Hello!" would trigger bash history expansion.
Opus empirically refuted by running bash 5.1.16: !" is not a
history designator and bash leaves it literal. GPT-5.5 confirmed
on cross-pollination. No change.
Tests:
TestResolveAfterRun gains an apostrophe-in-payload case.
TestResolveAfterDeploy gains an apostrophe-in-payload case.
TestResolveAfterShow Active row split into an explicit exact-match
assertion plus three subtests asserting protocol-driven payload
selection.
TestLoadServiceProtocol covers single/multi/empty/malformed
manifests and missing files.
TestAssembleState_PopulatesProtocolFromAgentYaml exercises the
end-to-end path on a temp dir.
No user-visible change yet; resolvers remain wired only to themselves.
Phase 2 will surface the corrected suggestions to real users when
init.go is the first caller.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): clarify shellEscapeSingleQuoted doc comment
The previous doc comment named the POSIX escape idiom literally using
backtick-delimited examples that included backslash-apostrophe
sequences. Those byte sequences proved fragile through PowerShell
heredoc / editor format-on-save round-trips, and ended up showing
U+201D smart-quotes in the committed file instead of the intended
ASCII characters. A user reading the comment would also have been
misled: the names given (after the smart-quote substitution) did not
match what the function actually emits on line 397.
Rewrites the comment in prose, anchoring the byte-pattern reference
to the implementation line (which uses a Go raw string so the literal
cannot be mangled). Also restates the PowerShell adaptation guidance
in terms of PowerShell's own two-consecutive-apostrophes convention
instead of referencing the POSIX byte pattern.
3-of-3 reviewer consensus on the underlying finding (Sonnet flagged
the original; Opus and GPT-5.5 cross-pollinated confirmation).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): drop stale line reference in shellEscapeSingleQuoted doc
The previous doc rewrite pointed at "line 397" for the byte pattern,
but in the committed file line 397 is mid-paragraph prose about
json.Marshal. The actual implementation line moved to 404 once the
prose rewrite expanded the comment by six lines. A reader following
the cross-reference would land in the wrong place.
Drops the line-number reference in favor of "the implementation
below uses a Go raw string for that sequence so its byte pattern is
stable across edits." Hard-coded line numbers inside the same file
are inherently fragile and should be avoided.
3-of-3 reviewer consensus on the stale-reference finding: GPT-5.5,
Opus, and Sonnet independently flagged it on the 787145acc
review pass. The fix mirrors what all three reviewers suggested.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): silent fetchOpenAPISpec + wire cache-only OpenAPI probe
Refactors fetchOpenAPISpec so callers control the "OpenAPI spec saved to %s"
output, and wires the previously-placeholder WithOpenAPIProbe option in the
nextstep package to actually populate State.HasOpenAPI / OpenAPIPayload from
the on-disk cache the invoke flow writes.
Closes critique items C5 (silent fetch) and C6 (probe wiring) from the
implementation plan. No user-visible behavior change in this commit; the
"OpenAPI spec saved to ..." line still surfaces on fresh writes from invoke,
and stays silent on cache hits and errors.
helpers.go
- fetchOpenAPISpec now returns (specFile string, fresh bool).
fresh==true means this call wrote a new spec to disk; fresh==false
means cache hit OR any failure. Callers print the "saved to" line
gated on fresh; future callers (doctor, run-time probe) that want
silence simply ignore the bool. The print is no longer inside the
helper.
invoke.go
- Both call sites (local fresh fetch, remote conditional fetch) now
emit the "OpenAPI spec saved to %s" line themselves via the (path,
fresh) return. Behavior is byte-identical to before; only the
ownership of the print moved.
nextstep/state.go
- WithOpenAPIProbe(enabled bool) becomes WithOpenAPIProbe(agentName,
suffix string). Empty agentName or suffix disables the probe (the
zero value).
- assembleState now runs a strictly cache-only OpenAPI lookup when
the probe is enabled and the project + env name are both known.
configDir is computed as filepath.Join(project.Path, ".azure",
envName), the same directory fetchOpenAPISpec writes into, so
reader and writer stay in lockstep without an extra round-trip to
the gRPC source. Cache miss, malformed spec, and no extractable payload
all silently leave HasOpenAPI=false, and the resolver falls back to
the protocol-generic <payload> literal.
nextstep/state_test.go
- TestOptionsApplyCleanly updated for the new WithOpenAPIProbe shape.
- TestWithOpenAPIProbe_EmptyArgsDisableProbe pins the disabled-default
semantics (empty agentName / suffix means probe is off).
- TestAssembleState_WithOpenAPIProbe_PopulatesPayloadFromCache exercises
the happy path: a real on-disk spec under .azure/<env>/ produces a
populated State.OpenAPIPayload via ExtractInvokeExample.
- TestAssembleState_WithOpenAPIProbe_MissingCacheLeavesPayloadUnset
pins the cache-miss fallback.
- TestAssembleState_WithOpenAPIProbe_DisabledWhenAgentEmpty proves an
on-disk cache is ignored when the option is called with an empty
agentName, so callers can centrally disable the probe.
Records strategy delta D9 (fetchOpenAPISpec silencing shape) and D10
(WithOpenAPIProbe shape) in .tmp/pr-8057/STRATEGY-DELTA.md.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): detect missing env vars during nextstep state assembly
`State.MissingInfraVars` / `State.MissingManualVars` were declared in
commit 1.2 but never populated; the resolver branches in commit 1.3
that consume them only ever saw nil slices. This commit adds the
detection step inside `assembleState` so the resolver can suggest the
right next action when the user has unprovisioned `${VAR}` references
in any agent.yaml.
What the helper does
- For every azure.ai.agent service in `azure.yaml`, opens the matching
`<projectPath>/<svc.RelativePath>/agent.yaml` and walks the
`environment_variables` block.
- Extracts unique `${VAR}` references via a small package-level regex
(`envVarRefPattern`). The optional `(?::-[^}]*)?` non-capturing tail
tolerates POSIX-style defaults like `${VAR:-fallback}` without
pulling them into the captured name.
- Looks each name up against the current azd environment. Names whose
value is set are skipped. Names whose value is unset get partitioned:
- leading `AZURE_` -> `MissingInfraVars` (`azd provision` outputs
in the AI Foundry templates uniformly start with this prefix:
`AZURE_AI_*`, `AZURE_OPENAI_*`, `AZURE_SUBSCRIPTION_*`, etc.)
- everything else -> `MissingManualVars` (`azd env set` candidates)
- Results are deduplicated cross-service (so two services referencing
`${AZURE_AI_PROJECT_ENDPOINT}` collapse to one entry) and returned
sorted ascending, matching the existing `slices.Sorted` style.
Error / partial-state behavior
- agent.yaml read or parse errors are silent (return nil refs). The
resolver falls back to its default branch rather than emitting
guidance about variables we cannot prove are needed.
- `src.EnvValue` transport errors append to `*errs` so the snapshot
caller can surface them in --debug output, but never abort. This
mirrors the existing `isDeployed` contract.
- `detectMissingVars` is only invoked when both `project != nil` and
`envName != ""`; otherwise both lists stay nil and the existing
resolver code paths are unaffected.
Why classification is `AZURE_` prefix only
The heuristic is intentionally coarse. Documented in the helper godoc:
misclassifying a manual var as infra at worst points the user at
`azd provision` instead of `azd env set`; the inverse still yields an
actionable hint. A future commit can swap in a richer rule (consult
`main.bicep` outputs, project-level allow-list) without touching the
public API of `AssembleState`.
Why split this from the init.go wiring (commit 2.2)
The resolver's "no MissingVars" branch suggests `azd ai agent run`,
which fails for an unprovisioned env. Wiring init.go without first
populating MissingVars would be a behavior regression versus the old
hardcoded `azd up` hint. Splitting also keeps each commit reviewable
in isolation: 2.1 is pure state-assembly logic with no command
wiring, 2.2 is a small swap-in at the call site.
Tests added in state_test.go
- TestExtractAgentYamlEnvRefs: table with 7 cases covering bare refs,
defaulted refs, multiple-refs-per-value, cross-value dedupe, no env
block, literal-only values, malformed YAML.
- TestExtractAgentYamlEnvRefs_MissingFileOrArgs: empty args + missing
manifest all return nil.
- TestAssembleState_PopulatesMissingVars: end-to-end via assembleState
with a real agent.yaml fixture mixing set + unset infra + manual vars.
- TestAssembleState_MissingVarsDedupedAcrossServices: two services
with overlapping refs collapse to one entry each list.
- TestAssembleState_AllVarsSetLeavesMissingEmpty: regression guard for
the "everything provisioned" path.
- TestAssembleState_MissingVarTransportErrorSurfaced: EnvValue errors
propagate to errs slice without crashing or mis-populating.
No production caller of `AssembleState` exists yet, so runtime
behavior is unchanged. Commit 2.2 swaps init.go to call the resolver,
at which point the populated state takes effect.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): exclude defaulted env refs from missing-vars detection
Before this change, an agent.yaml ref written as `${VAR:-fallback}` would
classify VAR as missing whenever it was unset in the azd environment, and
the resolver would prompt the user to `azd provision` or `azd env set` it.
That hint is misleading: the deploy-time expander (drone/envsubst, used by
service_target_agent.go) honors the `:-` default, so the deploy succeeds
with the fallback value and the user has no real action to take.
Fix: make the regex's default-tail group capturing (`(:-[^}]*)?`) and skip
matches where group 2 is non-empty. Bare `${VAR}` still surfaces as missing
when unset, matching the runtime requirement. Bare-dash `${VAR-fallback}`
(POSIX "if unset, use fallback") continues to be silently dropped — its
deploy-time semantics also carry a fallback, so the same user-visible
result holds.
Tests:
* `TestExtractAgentYamlEnvRefs` table: rename + flip "reference with
default tail captured as bare name" -> "reference with default tail
is skipped" (want: nil). Add "bare ref alongside defaulted ref
returns only the bare one".
* New `TestAssembleState_DefaultedRefsAreExcludedFromMissingVars` end-
to-end: agent.yaml mixes one bare unset ref (must surface) with two
defaulted unset refs (must NOT surface, including the manual-vars
bucket). Confirms the partition stays correct when only AZURE_AI_
refs would have surfaced through the infra heuristic.
Reviewer consensus (2/3): Sonnet's option (b) — drop the regex-broadening
half of its companion finding and keep this change minimal. GPT-5.5
originated the misleading-hint observation; Sonnet cross-pollinated and
recommended this exact path. Opus REJECTed with the position that the
deploy-time hint is "wrong but right" (template intent), which holds for
template-supplied AZURE_ refs but breaks for manual vars such as
`${MY_API_KEY:-dev-fallback}`. Tie-breaker: the manual-vars case.
Verified clean against gofmt, go vet, go build, go test
./internal/cmd/nextstep/..., golangci-lint, cspell, copyright-check.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): wire init success path to nextstep resolver
Replace the hardcoded `azd up` / `azd deploy <svc>` conditional at
init.go:1592-1607 with a call to nextstep.AssembleState +
ResolveAfterInit + PrintNext. The resolver inspects the active azd
environment plus each azure.ai.agent service's agent.yaml to emit
context-aware guidance:
- MissingInfraVars -> `azd provision` + trailing `azd deploy`
- MissingManualVars -> up to 3 `azd env set <KEY> <value>` lines
- clean -> `azd ai agent run` + trailing `azd deploy`
First user-visible behavior change in this PR. The legacy
AZURE_AI_PROJECT_ID dichotomy is replaced by the more informative
missing-vars partition; the new trailing line is the generic
`azd deploy` (no service-name suffix) per the design spec.
State-assembly errors are intentionally ignored: the resolver
degrades gracefully on partial state per the design spec.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): reserve trailing slot in nextstep renderer for follow-up nudges
3-of-3 reviewer consensus on commit 077c550ba surfaced that PrintNext
silently truncates ResolveAfterInit's trailing `azd deploy` line when
there are 2+ missing manual vars. The resolver assigns the trailing
nudge Priority 90 but the renderer sorts ascending and caps at
maxRendered=2; once the manual-vars branch emits 2 or 3 `azd env set`
lines (priorities 20-22) the deploy nudge is the first thing dropped.
Fix: add a Trailing flag to Suggestion. renderBlock now partitions on
the flag and reserves one of its maxRendered slots for the lowest-
priority trailing entry. Primary suggestions fill the remaining slots
in ascending Priority order, as before. ResolveAfterInit marks its
`azd deploy` footer Trailing:true; other resolvers are unchanged
(none of them currently emit a structural footer).
Net effect for end users finishing `azd ai agent init` with N missing
manual variables:
N=1 -> `azd env set X` + `azd deploy` (unchanged)
N=2 -> `azd env set A` + `azd deploy` (was: A + B, deploy lost)
N=3 -> `azd env set A` + `azd deploy` (was: A + B, deploy lost)
The user is named one missing variable plus the deploy nudge. The
previous behavior was equally lossy -- it just dropped the wrong
thing. Naming every missing var would need a higher maxRendered, which
trades the design's two-line UX cap for completeness; the design spec
chose the cap, so the fix preserves it.
Coverage:
- TestPrintNext gains "trailing suggestion survives truncation",
"trailing-only block renders as the single line", and "multiple
Trailing entries collapse to the lowest-priority one".
- TestResolveAfterInit (table) + TestResolveAfterInit_ManualVarsCapAtThree
now assert `out[len(out)-1].Trailing == true`.
Reviewer provenance: Opus 4.7 xhigh (High, empirically reproduced),
Sonnet 4.6 (Medium), GPT-5.5 (Medium) all independently surfaced the
same truncation bug on the b39188643..077c550ba diff; 3/3 consensus,
no cross-pollination needed. Opus's Option B (sticky tail) is the
approach implemented here -- the alternatives (cap manual-var lines
to 1; introduce a renderer-limit parameter) either lose more user
info or pollute the API.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents/nextstep): trailing collision now keeps the most-deferred footer
Cross-pollinated 3-model review on 8fa72db2a flipped the Trailing-tiebreaker
policy. Previous implementation used "first Trailing wins" (lowest Priority on
ascending sort). That defeats the regression-prevention purpose of the sticky-
tail fix: if a future resolver accidentally flags a Priority < 90 entry as
Trailing, current code silently drops the intended `azd deploy` footer
(Priority 90), which is the exact regression 2.2.1 was meant to prevent.
Switch to "last Trailing wins" (highest Priority on ascending sort = most-
deferred footer). Mistake-likelihood is asymmetric: copy-pasting Trailing onto
a low-priority hint is plausible; inventing a higher-than-deploy Priority is
not.
Reviewers ratifying last-wins: Sonnet 4.6 (original finder, Medium), GPT-5.5
(swung after cross-pollination), Opus 4.7 xhigh (reversed his initial Q5
ratification after cross-pollination). 3/3 consensus.
Changes:
- format.go: remove `if trailing == nil` guard so every Trailing entry
overwrites; the loop terminates with a pointer to the highest-Priority
Trailing entry.
- types.go: docstring now spells out "highest Priority wins on collision".
- format_test.go: rename "multiple Trailing entries collapse" case and pin
`tail-b` (Priority 90) as the survivor instead of `tail-a` (Priority 80).
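The tie-break change amounts to dropping a nil guard in the selection loop. A sketch with hypothetical names follows; the input is assumed to be already sorted by ascending Priority.

```go
package main

import "fmt"

type suggestion struct {
	Command  string
	Priority int
	Trailing bool
}

// pickTrailing returns the last Trailing entry of an ascending-sorted slice,
// i.e. the one with the highest Priority, or nil when there is none.
func pickTrailing(sorted []suggestion) *suggestion {
	var trailing *suggestion
	for i := range sorted {
		if sorted[i].Trailing {
			trailing = &sorted[i] // no nil guard: later entries overwrite
		}
	}
	return trailing
}

func main() {
	sorted := []suggestion{
		{"primary", 10, false},
		{"tail-a", 80, true},
		{"tail-b", 90, true},
	}
	fmt.Println(pickTrailing(sorted).Command)
}
```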
Preflight: gofmt clean, go vet clean, go build clean, nextstep tests pass
(10.9s), cmd tests pass (15.5s), golangci-lint 0 issues, cspell 0 issues.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents/run): resolver-driven Next: block on local run
Replace the hardcoded `azd ai agent invoke --local "Hello!"` follow-up
hint with the new nextstep package. The resolver picks a protocol-
appropriate sample payload (`{"message": "Hello!"}` for invocations
protocol agents; `"Hello!"` literal for responses protocol agents) and,
when a cached OpenAPI spec from a prior `invoke` is available, replaces
the default with the exact request body the agent expects. A tip line
pointing at the OpenAPI doc is appended when the cache is empty.
Smoke-tested against the hello-world-python-invocations sample (Foundry
bring-your-own template). Output:
Next: azd ai agent invoke --local '{"message": "Hello!"}' -- send a sample request to the running agent
curl http://localhost:<port>/invocations/docs/openapi.json -- tip: inspect the spec to learn the agent's exact payload
Starting agent on http://localhost:18347 (Ctrl+C to stop)
The `After startup, in another terminal, try:` preamble is dropped in
favor of consistency with the `init` success path (`init.go:1607`):
the `Next:` header + the `Starting agent on <url> (Ctrl+C to stop)`
line directly below convey the temporal ordering. If user testing shows
confusion, a follow-up commit can wire `After startup...` text back in
via the resolver's Description column.
Out of scope: the `<port>` placeholder in the curl tip is a known
documentation-grade hole. Substituting the live port requires plumbing
`flags.port` through `State.Port` and the resolver; that is deferred to
keep this commit small.
Preflight: gofmt clean, vet clean, build clean, cmd tests pass (14.9s),
golangci-lint 0 issues, cspell 0 issues. Smoke-tested end-to-end.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): align local OpenAPI cache key + restore "After startup" preamble
Two 3/3-consensus findings from the multi-model review of commit 2.3
(12aa2bb2d).
F1 (HIGH, 3/3 consensus) - local OpenAPI cache filename mismatch:
invoke.go:520 wrote the on-disk cache using the composite agentKey
(e.g. openapi-localhost:8088_<projectHash>_agents_hello-world_versions_
latest_local-local.json), while run.go reads it via
nextstep.WithOpenAPIProbe(serviceName, "local") which expands to the
plain name (openapi-hello-world-local.json). The two filenames could
never match, so state.HasOpenAPI was permanently false and the
"subsequent runs surface the cached OpenAPI sample" path in
ResolveAfterRun was dead code.
Fix: extract resolveLocalAgentName from resolveLocalAgentKeyWithPort
and use the plain name at the cache write site. The session/conversation
store at invoke.go:504 keeps the composite key (it needs the port +
projectHash to avoid cross-project collisions in the shared config
store); only the cache file was buggy. The split matches the existing
remote-write pattern at invoke.go:629 (remote already passes the plain
name) and adds an explanatory comment block at the asymmetry site.
F2 (Medium, 3/3 consensus) - "In another terminal" signaling restored:
commit 2.3 dropped the explicit "After startup, in another terminal,
try:" preamble in favor of init.go-style uniformity. But init exits
and hands the prompt back, while run holds the foreground TTY for the
agent. Without the preamble, top-down readers see the Next: block
before the "Starting agent on http://..." line and have no clue the
current terminal is about to be busy. Common failure mode: paste the
suggested invoke into the same terminal, Ctrl+C the agent, and ask
what just happened. Restore the preamble (6 words, pure revert,
proven UX).
Reviewer trace:
- Sonnet 4.6 surfaced both findings on the first pass.
- GPT-5.5 independently surfaced F1, ratified F2 fix-shape (Option C:
restore preamble) on cross-pollination.
- Opus 4.7 xhigh missed both on first pass (checked URL endpoint
symmetry but not filename-key symmetry on F1; anchored on commit
message intent on F2) and reversed to AGREE on cross-pollination.
Verification:
- Preflight clean: gofmt, go vet, go build, full cmd tests (14.7s,
nextstep 3.0s), golangci-lint 0 issues, cspell 0 issues.
- Live smoke test against hello-world-python-invocations sample:
`azd ai agent run --port 18348` now prints "After startup, in
another terminal, try:" followed by the Next: block + the
"Starting agent on http://..." line in the expected order.
- Code inspection confirms cache writer (after fix) and reader both
use the plain service name, so filenames align.
Files: 3 changed, +36 / -11.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): resolve local agent service once in invocationsLocal (no double prompt)
3/3-consensus regression from the multi-model review of commit 2.3.1
(f4a7f68aa).
Severity: Medium. The 2.3.1 refactor split the previous single
`resolveLocalAgentKey` call at `invoke.go:498` into two paired calls:
the existing one PLUS a new `resolveLocalAgentName` at line 504. Both
funnel through `resolveLocalAgentName` (helpers.go:161), which
unconditionally calls `resolveAgentServiceFromProject` even when the
result is only needed for the `name == ""` branch.
In the interactive multi-agent case (project with >=2 azure.ai.agent
services in azure.yaml + no `--no-prompt`), this fires
`azdClient.Prompt().Select` TWICE. The CLI validation at
`invoke.go:125-131` rejects `--local` + a positional name, so every
invoke that reaches `invocationsLocal` enters with `name=""`, so the
double prompt is reliably hit, not a corner case.
Worse, the two prompts are independent. If a user picks different
services on the two prompts (alphabetic list, not anchored to the
previous choice), `agentKey` (used by `resolveStoredID` for the
session/conversation store) refers to service A while `agentName`
(used by `fetchOpenAPISpec` for the OpenAPI cache filename) refers
to service B. The session ID resolved against A is then used to
invoke service B's `/invocations` endpoint: silent cross-service
state corruption.
In `--no-prompt` + multi-service, the second `resolveLocalAgentName`
fails inside `resolveAgentServiceFromProject`, the error is swallowed
at helpers.go:165, and `agentName` falls back to `"local"`. The
session store still gets the correctly resolved composite key but
the cache filename mismatches, re-introducing a flavor of the
original F1 bug. (Note: this `--no-prompt` failure mode predates
2.3.1 as well; it's a known gap, not a new regression.)
Fix: resolve the agent service ONCE via `resolveLocalAgentName` and
derive the composite key locally via `buildLocalAgentKey`. Net
change at the call site is two lines (collapse from two paired calls
to one resolve + one local derive). The 5-line comment block is
expanded to explain why both values are needed and why we resolve
once. `DefaultPort` is retained (not switched to `a.flags.port`) to
preserve pre-2.3.1 session-store semantics; a port switch would
change cross-invocation session compatibility and is out of scope
here.
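The resolve-once shape can be sketched as follows. Every signature here is hypothetical: the composite key layout is inferred from the session prefix quoted in the smoke test of this commit, and the default port value of 8088 is an assumption.

```go
package main

import "fmt"

// defaultPort is assumed for this sketch.
const defaultPort = 8088

// buildLocalAgentKey derives the composite session-store key locally, with no
// second service resolution. The layout is inferred, not confirmed.
func buildLocalAgentKey(port int, projectHash, agentName string) string {
	return fmt.Sprintf("localhost:%d/%s/agents/%s/versions/latest/local", port, projectHash, agentName)
}

// resolver counts how often the (possibly interactive) resolution runs.
type resolver struct{ prompts int }

func (r *resolver) resolveLocalAgentName() string {
	r.prompts++ // in the real flow this may show a Select prompt
	return "hello-world"
}

// resolveOnce returns the plain name (cache filename) and the composite key
// (session store), both derived from a single resolution.
func resolveOnce(r *resolver, projectHash string) (agentName, agentKey string) {
	agentName = r.resolveLocalAgentName()
	agentKey = buildLocalAgentKey(defaultPort, projectHash, agentName)
	return agentName, agentKey
}

func main() {
	r := &resolver{}
	name, key := resolveOnce(r, "9c184b88be3f8efb")
	fmt.Println(name, key, r.prompts)
}
```

Because both values come from the same resolution, the cache filename and the session key cannot point at different services.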
Reviewer trace:
- Opus 4.7 xhigh surfaced the finding on the ratification review of
f4a7f68aa.
- Sonnet 4.6 AGREE on cross-pollination; argued HIGH severity over
a `--no-prompt` regression that turned out to pre-exist 2.3.1
(so we land at Medium).
- GPT-5.5 AGREE on cross-pollination; suggested using `a.flags.port`
instead of `DefaultPort`; deferred because that would change
session-store semantics (per-port isolation) and needs its own
consensus.
Verification:
- Preflight clean: gofmt, vet, build, full cmd tests (14.6s),
nextstep tests (3.8s), golangci-lint 0, cspell 0.
- Live smoke test against hello-world-python-invocations sample:
session prefix
`[localhost:8088/9c184b88be3f8efb/agents/hello-world-python-invocations/versions/latest/local]`
is byte-identical to the pre-fix output, and the resurfaced session ID
(882e2e6a-...) matches the prior invocation, confirming that the
composite key is unchanged and session-store backward compat
is preserved.
1 file, +12 / -6.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): wire invoke.go success paths to nextstep.ResolveAfterInvoke
Phase 2 commit 2.4. Replaces the four `invoke` success-path returns
with calls to `nextstep.ResolveAfterInvoke` → `PrintNext`, so the
`Next:` block at the end of a successful invoke is policy-driven
(InvokeLocal → `azd deploy`; InvokeRemote → `azd ai agent show
<name>` + `azd ai agent monitor --follow`) instead of silent.
Adds a small file-local helper `(a *InvokeAction).emitInvokeSuccessNextStep`
so all four success paths funnel through one place, which keeps the call
sites symmetric and makes the future failure-path commit a single
edit point.
State is intentionally nil at every success call site:
`ResolveAfterInvoke`'s success branches (`resolveInvokeSuccess` at
resolver.go:160) don't read State, and `AssembleState` is not free
(Project + CurrentEnvName + per-service EnvValue gRPC roundtrips
for `nextstep.WithOpenAPIProbe`). The companion follow-up commit
that wires invoke-failure paths will assemble state at the failure
site, where it actually feeds `RemediationForSessionErrorCode`.
Touch points (4 success returns rewritten from `return foo(...)` to
`if err := foo(...); err != nil { return err }; emit(); return nil`):
- `responsesLocal`: both the JSON and non-JSON success branches.
- `responsesRemote`: post-`readSSEStream` success.
- `invocationsLocal`: post-`handleInvocationResponse` success.
- `invocationsRemote`: post-`handleInvocationResponse` success.
For the InvokeLocal branches, the resolver's success path ignores
agentName (returns the same `azd deploy` line regardless), so the
helper is called with empty agentName at the two local sites. This
also dodges the resolve-once-derive-both concern: no new
`resolveLocalAgentName` calls are added in `responsesLocal`, so
there's no risk of re-introducing the double-prompt regression that
commit 2.3.2 fixed for `invocationsLocal`.
Out of scope (deferred):
- Failure-path wiring (`SessionErrorCode` parsing from
`x-adc-response-details` header + body, then
`RemediationForSessionErrorCode` mapping). Tracked as commit 2.5.
- `nextstep.AssembleState` calls at the failure sites. Same commit.
Verification:
- Preflight clean: gofmt, vet, build, full cmd tests (14.3s),
nextstep tests (cached), golangci-lint 0, cspell 0.
- Smoke-tested against hello-world-python-invocations sample:
1. Local invocations protocol emits `Next: azd deploy --
the local invoke worked ship it to Azure`.
2. Remote invocations protocol emits two-line block with
`azd ai agent show hello-world-python-invocations` (the
resolved agent name) + `azd ai agent monitor --follow`.
- Session ID `882e2e6a-...` resurfaced under the byte-identical
composite key
`[localhost:8088/9c184b88be3f8efb/agents/hello-world-python-invocations/versions/latest/local]`,
which confirms backward compat across 2.3.1 → 2.3.2 → 2.4.
- `responsesLocal` / `responsesRemote` not exercised on the wire
(sample uses invocations protocol, not responses); verified by
code inspection that the two return points were rewritten
correctly and the helper call is reached on success.
1 file, +38 / -4.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): TTY-gate Next: emission in invoke success helper
emitInvokeSuccessNextStep wrote nextstep.PrintNext to os.Stdout
unconditionally, violating the call-site-gating contract documented in
three places in the nextstep package:
- nextstep/types.go:19-22 -- "Output discipline lives at the call
sites: the package never writes to os.Stdout directly and never
inspects --output flags. Callers gate on the isTerminal helper..."
- nextstep/format.go:30-33 -- "PrintNext does not inspect TTY state
or output-format flags -- those decisions live at the call site..."
- helpers.go:810-811 -- isTerminal's own doc: "Used to gate human-only
output such as the next-step guidance block."
Symptom: `azd ai agent invoke ... > file` / `... | tee log` /
CI-captured stdout received the trailing "Next:" block mixed in with
the agent's reply, corrupting files and logs the user reasonably
expected to contain only the model output. printAgentResponse's
fallback path (invoke.go raw-body branch + json.MarshalIndent dump)
is particularly affected: it emits structured-ish data that the Next:
block then invalidates.
Fix: one-line `if !isTerminal(os.Stdout.Fd()) { return }` at the top
of the helper. All four success paths funnel through this one
helper (the original 2.4 design), so a single gate covers every
invocation mode (responses local/remote, invocations local/remote).
No behavior change on TTY stdout.
Smoke-tested against hello-world-python-invocations sample:
- direct TTY: Next: block emits as before (2-line "show + monitor")
- file redirect (`invoke > out.txt`): Next: block suppressed; only
the agent's streamed reply lands in the file
- pipe (`invoke ... | jq`): same -- block suppressed
Scope deliberately limited to invoke. Two other call sites have the
same pre-existing omission:
- init.go:1608 -- theoretical only; init is interactive and writes
no machine-readable output. Can be folded into a separate
cleanup commit.
- run.go:180 -- coupled to a "After startup, in another terminal,
try:" preamble at run.go:179. TTY-gating just the PrintNext call
would leave a dangling sentence. Needs a small design pass and
its own commit -- not folded here.
Code-review consensus on commit eb01d184b:
- Sonnet 4.6 raised the finding at HIGH severity.
- GPT-5.5 independently raised it at Medium.
- Opus 4.7 (xhigh, cross-pollination pass) confirmed the bug,
agreed Medium is the right severity (no `invoke --output json`
today; existing output already not jq-clean), and explicitly
recommended invoke-only scope -- echoed the rationale above
for deferring init.go and run.go.
Pre-flight: gofmt clean, vet, build, golangci-lint 0, cspell 0,
cmd-package tests (14.5s) + nextstep tests pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents): wire invoke failure paths to ResolveAfterInvoke
Phase 2 commit 2.5 surfaces the platform's recommended remediation
when an invoke fails. Mirrors the 2.4 success-helper pattern with one
new file-local helper and four wire-up sites.
New helper: emitInvokeFailureNextStep(mode, agentName, sessionCode)
funnels all four failure paths through one place. It builds an
InvokeFailure{SessionCode: SessionErrorCode(sessionCode)} and passes
it to ResolveAfterInvoke; the resolver's failure branch turns each
known SessionErrorCode into the canonical remediation line (with
optional secondary action) via RemediationForSessionErrorCode, and
falls back to `azd ai agent monitor --tail 100` for empty or
unknown codes. Local-invoke failures pass empty agentName + empty
sessionCode and get a single `see local server output` line per
the resolver's InvokeLocal branch.
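The lookup-with-fallback behavior described above can be sketched as follows. This is an illustration only: `failureNext` and the `known` table are hypothetical, and the single table entry is a placeholder rather than a real SessionErrorCode. Only the fallback command and the local hint text come from this commit message:

```go
package main

import "fmt"

const (
	// Both strings are quoted from the commit description.
	remoteFallback = "azd ai agent monitor --tail 100"
	localHint      = "see local server output"
)

// failureNext picks the guidance line for a failed invoke. Local failures
// always get the local hint; remote failures look up the session code and
// fall back to the log-tail command for empty or unknown codes.
func failureNext(local bool, code string, known map[string]string) string {
	if local {
		return localHint
	}
	if line, ok := known[code]; ok && code != "" {
		return line
	}
	return remoteFallback
}

func main() {
	// Placeholder entry; real codes and remediations are not listed here.
	known := map[string]string{"hypothetical_code": "hypothetical remediation"}

	fmt.Println(failureNext(true, "", known))
	fmt.Println(failureNext(false, "", known))
	fmt.Println(failureNext(false, "hypothetical_code", known))
}
```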
Wire-up sites in invoke.go:
* responsesLocal (HTTP 4xx/5xx branch): emit before fmt.Errorf
* responsesRemote (HTTP 4xx/5xx branch): extract x-adc-response-details
from resp.Header before reading body, then emit
* invocationsLocal (handleInvocationResponse err): local server doesn't
set the header; pass ""
* invocationsRemote (handleInvocationResponse err): capture
x-adc-response-details from resp.Header BEFORE calling
handleInvocationResponse (handler reads the body, header survives)
Decisions baked in:
* State is nil at every failure site. resolveInvokeFailure's signature
(_ *State, ...) reflects that it doesn't read State today. Avoids
the gRPC AssembleState roundtrip at the exact moment the user is
staring at an error. If a future failure branch grows state-aware
behavior, switch to AssembleState at that one site.
* Output order: Next: BEFORE the error message (host renders error
last, via SilenceErrors=true + ReportError). Mirrors git's
`hint: ... error: ...` pattern. Smaller diff than the alternative
(sentinel-error + silent-stderr + bespoke printing) and acceptable
on an interactive terminal: Trace ID -> response body -> Next:
block -> Error: line.
* Separate helper from emitInvokeSuccessNextStep. Keeps the 2.4
success call sites byte-for-byte unchanged (already 3/3
reviewer-clean) and saves reviewers from re-verifying them.
* TTY-gate inherited at the helper boundary (same isTerminal check
that 2.4.1 added on the success helper). Pipe/redirect/CI capture
suppresses the human-only block; the error itself still flows
through stderr.
NOT in scope (deliberate):
* Connect-failure paths (responsesLocal:335-340,
responsesRemote:495-497, invocationsLocal:586-591,
invocationsRemote:699-701). Existing error messages already
include actionable guidance like `Start it with: azd ai agent
run`; Next: would be redundant.
* Agent-error envelopes (200 OK with error-shaped JSON or SSE error
event in handleInvocationSync / handleInvocationSSE). These are
agent-level errors, not platform errors; the platform's
SessionErrorCode vocabulary doesn't apply. Separate follow-up if
user feedback indicates value.
Smoke-tested against the deployed hello-world-python-invocations
sample:
* remote 4xx (invalid agent name) -> Trace ID -> blank line ->
`Next: azd ai agent monitor --tail 100 -- inspect recent
container logs for the failure` -> ERROR (host stderr).
Pipe/redirect to file: Next: line correctly suppressed; ERROR
still flows through stderr. Confirmed via Select-String against
redirected stdout file.
* success path unchanged: `azd ai agent invoke 'Hello!'` still
streams tokens and emits the 2.4 `Next: azd ai agent show ... +
monitor --follow` block at the end.
1 file, +46/-3.
Pre-flight clean (gofmt, go vet, go build, cmd-package tests 14.2s,
golangci-lint 0, cspell 0).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): gate invocations-protocol failure Next: by HTTP status
Follow-up to b2e58f8fc (Phase 2 commit 2.5). All three reviewers
(Opus 4.7 xhigh, Sonnet 4.6, GPT-5.5) independently flagged the
same Medium-severity bug: the new wire-up at invocationsLocal:640
and invocationsRemote:754 fires emitInvokeFailureNextStep on every
non-nil return from handleInvocationResponse, but that function
returns errors for THREE distinct cases:
1. HTTP 4xx/5xx platform failures (invoke.go:782-785)
2. Agent error envelope in 200 OK JSON (invoke.go:819-821, via handleInvocationSync)
3. Agent error in SSE error event (invoke.go:868-870, via handleInvocationSSE)
Cases 2 and 3 are agent-level errors carrying no
x-adc-response-details header, so sessionCode is "" and the
resolver falls to the empty-code branch:
`azd ai agent monitor --tail 100 -- inspect recent container
logs for the failure`. The agent process is healthy in those
cases, so its logs likely contain nothing useful; the issue is in
the request payload or the agent code. Per Opus's cross-protocol
review, the responses protocol's analogous agent-level errors
(printAgentResponse failed status at invoke.go:1175-1177,
readSSEStream failed / error events at invoke.go:1127-1129 and
:1148-1150) are correctly NOT wired, so the invocations protocol
was inconsistent with itself and with the commit's stated
architectural decision 5.
Fix: gate the emit on resp.StatusCode >= 400 at both invocations
sites. Adds a 5-line doc comment at the remote site explaining the
rationale and a one-line cross-reference at the local site. The
responses-protocol wire-ups (responsesLocal:391, responsesRemote:548)
already short-circuit on resp.StatusCode >= 400 before reaching
handleInvocationResponse, so they don't need the guard.
Consensus pipeline:
- GPT-5.5: proposed call-site guard exactly as applied here.
- Sonnet 4.6: proposed call-site guard or accept-and-document.
- Opus 4.7 (xhigh): proposed moving the emit INSIDE
handleInvocationResponse's 4xx branch or accept-and-document.
The "move inside" option would require restructuring
handleInvocationResponse (it's a free function, not a method
on *InvokeAction; can't reference a.emitInvokeFailureNextStep
without adding a callback or making it a method). Call-site
guard wins on minimal-diff grounds and matches 2/3 explicit
endorsement.
1 file, +14/-2. Pre-flight clean (gofmt, go vet, go build,
cmd-package tests 14.6s, golangci-lint 0, cspell 0). Live smoke:
4xx (invalid agent name) still emits Next: above the error;
success path unchanged.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents/nextstep): match AgentVersionStatus wire values to API
The Foundry Hosted Agents API returns AgentVersionObject.Status as lowercase (verified empirically: 'azd ai agent show' returns 'active'). The resolver's AgentVersionStatus constants were title-case, so ResolveAfterShow's typed switch never matched live data and every successful show would have hit the 'unknown / transitional' fallback branch.
Lowercase the five constants (creating/active/failed/deleting/deleted) and the matching keys in the wire-drift test. Doc comment on the type now points at agent_api/models.go and the empirical evidence. No resolver logic change; no other callers.
Found during commit 2.6 (show.go wire-up) implementation, before any user-visible exposure.
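To illustrate why the casing mattered, here is a small sketch of a typed switch over the five lowercase wire values named above. The constant names and the `isKnownStatus` helper are assumptions for illustration; only the five string values come from this commit:

```go
package main

import "fmt"

// AgentVersionStatus mirrors the wire value returned by the API.
type AgentVersionStatus string

// Constant names are illustrative; the lowercase values are the ones the
// commit says the API returns.
const (
	StatusCreating AgentVersionStatus = "creating"
	StatusActive   AgentVersionStatus = "active"
	StatusFailed   AgentVersionStatus = "failed"
	StatusDeleting AgentVersionStatus = "deleting"
	StatusDeleted  AgentVersionStatus = "deleted"
)

// isKnownStatus reports whether s matches one of the typed constants.
// String comparison is case-sensitive, so "Active" does not match.
func isKnownStatus(s AgentVersionStatus) bool {
	switch s {
	case StatusCreating, StatusActive, StatusFailed, StatusDeleting, StatusDeleted:
		return true
	default:
		return false
	}
}

func main() {
	fmt.Println(isKnownStatus("active"), isKnownStatus("Active"))
}
```

With title-case constants the same switch would send every live response to the default branch, which is the failure mode this commit removes.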
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* feat(azure.ai.agents/cmd): wire show.go to nextstep resolver
Adds context-aware `Next:` guidance to `azd ai agent show`.
**Table output (--output table)**: render the existing field table, then
on TTY emit a blank line + `nextstep.PrintNext`. Pipes and file
redirects suppress the block (consistent with invoke's TTY gate from
commits 2.4.1 and 2.5.1).
**JSON output (--output json, the default)**: surface the same guidance
under a new optional `next_step` envelope field. JSON is for machines:
emitted unconditionally regardless of TTY. The envelope wrapper type
omits the resolver's internal `Priority` and `Trailing` renderer hints;
consumers only need `{command, description}`.
**State assembly**: `a.resolveNextStep` calls `nextstep.AssembleState`
once (best-effort), overrides `state.AgentStatus` with the live
`version.Status` returned by the API, then calls `ResolveAfterShow(
state, a.serviceName)`. Passes `info.ServiceName` (azure.yaml service
name) rather than `info.AgentName` so `findService` matches
`state.Services[].Name` for protocol lookup; the CLI's invoke command
re-resolves either name, so the suggested command works in both
common and divergent-name configurations.
**Backward compat**: `next_step` is `omitempty`; existing JSON parsers
continue to work. The existing `TestPrintAgentVersionJSON_*` test
keeps passing. `TestPrintAgentVersionJSON_NoLinks` gains one assertion
for the omit-when-nil contract. New `TestShowResultJSON_NextStepEnvelope`
locks the envelope shape (single suggestion, exact keys, no
priority/trailing leak).
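A minimal sketch of the envelope shape and its omit-when-nil behavior. The type names and the `name`/`status` fields are hypothetical; only the `next_step` key, the `suggestions` array, and the `{command, description}` keys are taken from this commit:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// suggestion exposes only the two keys consumers need.
type suggestion struct {
	Command     string `json:"command"`
	Description string `json:"description"`
}

type nextStep struct {
	Suggestions []suggestion `json:"suggestions"`
}

// showResult is an illustrative stand-in for the show output; a nil
// NextStep pointer is dropped from the JSON by omitempty.
type showResult struct {
	Name     string    `json:"name"`
	Status   string    `json:"status"`
	NextStep *nextStep `json:"next_step,omitempty"`
}

// render marshals r, returning an empty string on failure.
func render(r showResult) string {
	b, err := json.Marshal(r)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	fmt.Println(render(showResult{Name: "echo", Status: "active"}))
	fmt.Println(render(showResult{
		Name:   "echo",
		Status: "active",
		NextStep: &nextStep{Suggestions: []suggestion{
			{Command: "azd ai agent invoke echo", Description: "send a test message"},
		}},
	}))
}
```

Using a pointer for the optional field is what lets `omitempty` drop it entirely, so parsers written before the field existed see the same document as before.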
Updated `printShowResultTable` signature to accept
`[]nextstep.Suggestion` (one new arg). The two test call sites pass
`nil` to assert the no-suggestions path renders cleanly.
Smoke verified against the deployed `hello-world-python-invocations`
sample (status=active). JSON: `next_step.suggestions[0].command` is
`azd ai agent invoke hello-world-python-invocations '{"message":
"Hello!"}'` - the protocol-aware payload pulled from the OpenAPI
cache, confirming the lowercased AgentVersionStatus constants from
commit 2.5.2 are necessary and working end-to-end. Non-TTY table
path suppresses the block (the PowerShell tool is not a TTY).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents/cmd/show): drop dead nil guard + doubled blank line
Consensus fix-up on commit 2.6 (be72cdba2), addressing two findings
that reached 3/3 reviewer agreement (both Low severity).
S1 — dead nil guard in `resolveNextStep`:
`AssembleState` (nextstep/state.go:174-215) unconditionally initializes
`state := &State{}` and has a single return path. The `if state == nil`
guard at the old line 206 was therefore unreachable. Removing it
prevents future contributors from inferring a non-existent contract
branch on `AssembleState`. The package's godoc already promises a
non-nil partial state on every call. Updated the function's doc comment
to cite this explicitly.
Opus-1 — doubled blank line between table and Next: block on TTY:
`printShowResultTable` was emitting `fmt.Println()` immediately before
`nextstep.PrintNext`. But `PrintNext` → `renderBlock` already prepends
its own leading `\n` (`nextstep/format.go:106-108`, "Leading blank
line separates the block from preceding output."). Combined with the
tabwriter's trailing `\n` on the last row, the result on TTY was
three line terminators before "Next:" — two visible blank lines, not
one. Verified by Opus xhigh against Format-Hex output. Sibling sites
(`init.go:1607-1608`, `invoke.go`'s `emitInvokeSuccessNextStep`) call
`PrintNext` directly without a preceding `Println` and produce a
single blank-line separator.
Both fixes are minimal — total diff is 5 inserted / 7 deleted, comment
text adjusted to capture the contract going forward.
Preflight: gofmt clean, vet clean, build clean, show tests pass
(4.89s), golangci-lint 0 issues.
Skipping the 3-reviewer pass on this commit per the precedent from
2.2.2 / 2.3.2 (trivial fix-ups don't need another full review pass).
The remaining 3 findings from the 2.6 review (G1 ServiceName/AgentName
divergence; G2 WithOpenAPIProbe wire-up; S2 untested status override)
land in commit 2.6.2, which is more substantive and does get a review
pass.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): correctness fixes from 2.6 cross-pollinated review
Lands the 3/3-consensus correctness findings from the cross-pollinated
review of `2930395cf..be72cdba2`. Splits the trivial cleanup (commit
2.6.1) from substantive behavior changes (this commit) so the diffs are
independently reviewable.
G1 (Medium, 3/3 with Opus's prior dismissal reversed) — ServiceName vs
AgentName divergence:
ResolveAfterShow previously took a single `agentName` parameter and used
it for both `findService` (which keys on the azure.yaml service name)
and `invokeCommandFor` (which embeds the deployed Foundry agent name in
the suggested URL). When the two diverge (typical when deploy appends a
suffix — `<service>-<suffix>` is a common Foundry naming pattern), the
emitted suggestion `azd ai agent invoke <serviceName> ...` produces a
URL path `/agents/<serviceName>/...` that 404s on the Foundry API.
Fix: split the signature to `ResolveAfterShow(state, serviceName,
agentName)`. Active branch uses serviceName for protocol lookup and
agentName for command emission. Unknown-status fallback uses serviceName
(matches show.go's lookup contract — `resolveAgentService` matches by
service name). show.go now passes both via a new `agentName` field on
ShowAction populated from `info.AgentName` alongside the existing
`serviceName`. Option B chosen over modifying invoke.go's gated
`if name == "" && info.AgentName != ""` translation because the latter
would change CLI semantics for users passing Foundry names positionally.
G2 (Medium, 3/3) — OpenAPI cache wiring:
show.go's `resolveNextStep` previously called `AssembleState` without
`WithOpenAPIProbe`, so `state.HasOpenAPI` was always false and the
Active-branch invoke suggestion always used the protocol-generic
literal. Fix: pass `nextstep.WithOpenAPIProbe(agentName, "remote")`
(matches `invoke.go:725`'s `fetchOpenAPISpec(... name, "remote", ...)`
cache write convention). `invokeCommandFor` now accepts `*State` and
prefers `shellEscapeSingleQuoted(state.OpenAPIPayload)` over the
protocol literal when `state.HasOpenAPI && state.OpenAPIPayload != ""`.
Mirrors the pattern at `resolver.go:104-106` in `ResolveAfterRun`.
Best-effort silent fallback: when the cache is empty (no prior `invoke`
populated it, or cache lookup errored), the protocol-generic literal is
emitted unchanged. Same UX contract as `ResolveAfterRun`.
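The apostrophe handling mentioned above can be sketched with the usual POSIX single-quote idiom. This is an assumption about the approach, not the body of `shellEscapeSingleQuoted`; `quoteForShell` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// quoteForShell wraps s in single quotes for a POSIX shell. An embedded
// apostrophe is handled by closing the quote, emitting an escaped
// apostrophe, and reopening the quote: ' becomes '\''.
func quoteForShell(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	fmt.Println(quoteForShell(`{"message": "Hello!"}`))
	fmt.Println(quoteForShell(`{"message": "it's fine"}`))
}
```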
S2 (Low, 2/3 after Sonnet's Medium downgraded) — resolveNextStep
end-to-end wiring test:
Extracts `resolveNextStepFromSource` as the testable core of show.go's
`resolveNextStep` method, taking a `nextstep.Source` directly instead
of building one from `*azdext.AzdClient`. Production path
(`(*ShowAction).resolveNextStep`) calls it with `NewSource(a.azdClient)`;
tests inject a `fakeShowSource`. New public `AssembleStateFromSource`
in the nextstep package wraps the existing private `assembleState`
function.
Three wiring tests added in show_test.go:
- ActiveBranch_InvocationsProtocol: writes a real agent.yaml under a
`t.TempDir` project root and verifies that AssembleState reads it and
detects the invocations protocol, and that ResolveAfterShow emits the
protocol-aware payload with the Foundry agent name (locks both G1
and G2's no-cache fallback path).
- UnknownStatusFallsBackToServiceName: locks G1's fallback choice.
- NonActiveBranches: sanity-checks the remaining status branches don't
depend on either name.
Plus three more cases added to TestResolveAfterShow_* in
resolver_test.go: DivergentNames (locks G1 directly), and
ActiveConsumesOpenAPICache subcases for plain payload, apostrophe
escaping, and empty-payload fallback (locks G2 directly).
Scope discipline (not in this commit):
- The OpenAPI cache for the deployed `hello-world-python-invocations`
sample is empty in this session — the smoke run after this commit
shows the protocol-generic literal `'{"message": "Hello!"}'` falling
through unchanged, confirming graceful degradation. Cache hit path
is covered by unit test.
- Pre-existing api-version bug on remote OpenAPI URL is tracked
separately ("Next up" #1 in plan.md).
Pre-flight: gofmt clean, vet clean, full extension test suite green
(cmd 14.2s, nextstep cached, all other packages green), golangci-lint
0 issues, cspell 0 issues. Live smoke against `hello-world-python-
invocations`: `azd ai agent show --output json` still returns the
expected `next_step` envelope with the protocol-aware command.
5 files changed, +173/-37.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): correct divergent-name invoke suggestion (G3)
Background
----------
Commit 84bfc741f (2.6.2) split ResolveAfterShow's signature into
(state, serviceName, agentName) and emitted the deployed Foundry
agent name as the positional of the suggested invoke command. The
stated rationale was: invoke's remote URL path embeds the agent
name verbatim, so passing the azure.yaml service name there would
yield a 404 from Foundry.
That rationale was right about the URL path but missed the upstream
failure point. All three 2.6.2 reviewers re-traced the consumer end
to end and reached consensus that the previously rejected fix
(unconditionally translate inside invoke.go) was the correct one.
G3 — Medium, 3/3 consensus (GPT-5.5, Sonnet 4.6, Opus xhigh)
------------------------------------------------------------
Trace of the broken case (azure.yaml services: { echo }, env
AGENT_ECHO_NAME=echo-deployed-x7q9):
1. Resolver emits: azd ai agent invoke echo-deployed-x7q9 '<payload>'
2. InvokeAction.Run calls a.resolveProtocol(ctx) FIRST, before any
URL is constructed (invoke.go:167).
3. resolveProtocol falls through to:
resolveAgentProtocol(ctx, azdClient, "echo-deployed-x7q9", ...)
(invoke.go:273).
4. resolveAgentProtocol delegates to resolveAgentService
(helpers.go:728).
5. resolveAgentService loops projectResponse.Project.Services and
matches by s.Name == name (helpers.go:562). No svc.Name equals
"echo-deployed-x7q9", so svc stays nil and the function returns:
"no azure.ai.agent service named 'echo-deployed-x7q9' found
in azure.yaml"
6. Error propagates back to Run. invocationsRemote / responsesRemote
are never called. The URL-path correctness 2.6.2 paid for never
fires.
The 2.6.2 G1 fix was a half-measure: the signature split gave the
resolver access to serviceName, but line 253 still passed agentName
to invokeCommandFor — handing the resolver a knife and choosing to
stab itself. The new TestResolveAfterShow_DivergentNames test in
2.6.2 only asserted the emitted *string*, not what happens when
that string is fed back into Run.
Fix (Option A — both halves are load-bearing together)
------------------------------------------------------
1. resolver.go — emit serviceName as the positional. The signature
reverts to ResolveAfterShow(state *State, serviceName string).
agentName is no longer needed by the resolver because:
- protocol lookup keys on serviceName via findService (already)
- the positional is now serviceName (this commit)
- the OpenAPI probe runs in show.go BEFORE the resolver and
populates state.OpenAPIPayload from the cache; the cache key
still uses agentName, but that's an internal contract owned
by show.go and invoke.go, not the resolver's API surface.
2. invoke.go — flip the translation gate in BOTH protocol-specific
remote functions:
invocationsRemote (invoke.go:663-665):
before: if name == "" && info.AgentName != ""
after: if info.AgentName != ""
responsesRemote (invoke.go:425-427):
before: if name == "" && info.AgentName != ""
after: if info.AgentName != ""
The flip is safe by construction:
- When user passes the SERVICE name positionally, the lookup at
helpers.go:560 succeeds, info.AgentName is populated, the
gate fires, name is translated to the deployed Foundry name,
and the URL is correct.
- When user passes the DEPLOYED Foundry name positionally
(legacy behavior; never reaches this path in practice today
because resolveProtocol fails first), the lookup at line 560
fails, err != nil, the entire if-block is skipped, the gate
is never reached, and behavior is unchanged.
- When names match (no divergence), translation is a no-op.
Opus's retrospective: the rejection reason cited in 2.6.2 ("would
change CLI semantics for users passing Foundry names positionally")
does not hold — those users hit the err != nil branch and never
reach the gate.
Why not the alternatives the reviewers also analyzed:
- Option B: add `--protocol <protocol>` to the suggestion. Skips
protocol resolution, but resolveAgentServiceFromProject still
fails silently inside the protocol-specific remote, leaving
agentEndpoint = "" — silent loss of session persistence. A
regression in disguise.
- Option C: fall back to AgentName search in resolveAgentService.
Wider blast radius. Affects `run` / `init` / `monitor` / files /
session — none of which this PR has any business touching.
End-to-end trace with the fix
-----------------------------
azure.yaml services: { echo }, AGENT_ECHO_NAME=echo-deployed-x7q9
Resolver emits: azd ai agent invoke echo '<payload>'
1. resolveProtocol → resolveAgentService("echo") matches svc.Name
→ protocol resolved ✓
2. invocationsRemote called with name = "echo"
3. resolveAgentServiceFromProject("echo") succeeds →
info.AgentName = "echo-deployed-x7q9", info.AgentEndpoint set
4. New gate fires → name = "echo-deployed-x7q9"
5. URL: …/agents/echo-deployed-x7q9/endpoint/protocols/… ✓
6. agentEndpoint != "" → session persistence active ✓
7. Cache key for fetchOpenAPISpec uses the post-translation name
(Foundry name), aligned with show.go's
WithOpenAPIProbe(agentName, "remote") read key ✓
Files changed (5, +55/-43)
--------------------------
M cli/azd/extensions/azure.ai.agents/internal/cmd/invoke.go
— flip gate at lines 425 and 664 (2 char-level changes)
M cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/resolver.go
— ResolveAfterShow signature: drop agentName parameter
— Active branch: invokeCommandFor receives serviceName
— invokeCommandFor: rename agentName→name in signature/doc;
behavior unchanged
— doc comment rewritten to explain that the resolver emits
service name end-to-end and invoke translates internally
M cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/resolver_test.go
— 11 call sites updated to drop second arg
— TestResolveAfterShow_DivergentNames Active subcase flipped:
was "echo-suffix-abc123", is now "svc-echo"
— test doc-comment rewritten
M cli/azd/extensions/azure.ai.agents/internal/cmd/show.go
— ResolveAfterShow call site drops agentName
— agentName field on ShowAction is retained: still used by
resolveNextStepFromSource for the OpenAPI probe
M cli/azd/extensions/azure.ai.agents/internal/cmd/show_test.go
— TestResolveNextStepFromSource_ActiveBranch_InvocationsProtocol
assertion flipped to service name
Pre-flight
----------
✓ gofmt -s -d clean
✓ go vet ./... clean
✓ go build ./... clean
✓ go test ./... full suite green (cmd 11.1s, nextstep 1.7s,
all other packages green)
✓ golangci-lint 0 issues
✓ cspell 0 issues
✓ live invoke smoke: azd ai agent invoke hello-world-python-invocations
'{"message": "Hello!"}' against the deployed sample; full SSE
stream, agent responded "Hello! How can I assist you today?"
(same-name path; divergent-name path locked by unit tests)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* fix(azure.ai.agents): close symmetric G3 in invoke-success suggestion (G4)
Background
----------
Commit 211d1f334 (2.6.3) fixed the divergent-name path in one
direction: `azd ai agent show` now emits `azd ai agent invoke
<serviceName> ...` and the gate flip in invoke.go translates to
the deployed Foundry name internally. Three reviewers (Opus xhigh,
Sonnet 4.6, GPT-5.5) ran on 211d1f334.
GPT and Sonnet each flagged a LOW doc-comment issue (S4 — stale
`ShowAction.agentName` doc; deferred to a doc-cleanup commit).
Opus xhigh surfaced an unrelated MEDIUM finding (G4) that the
other two missed. On cross-pollination, Sonnet and GPT both
independently traced it and endorsed it at MEDIUM with the same
proposed fix. 3/3 consensus.
G4 — Medium, 3/3 consensus
--------------------------
2.6.3's gate flip translates the local variable `name` *in place*
from the azure.yaml service name to the deployed Foundry agent
name (the correct value for the URL path). But that post-translation
`name` is then passed to `emitInvokeSuccessNextStep(mode, name)`
in both protocol-specific remote functions, which feeds
`nextstep.ResolveAfterInvoke` → `resolveInvokeSuccess`. The
resolver embeds the value verbatim:
primary = fmt.Sprintf("azd ai agent show %s", agentName)
Trace of the broken case (azure.yaml: { echo }, env
AGENT_ECHO_NAME=echo-deployed-x7q9):
1. User runs the resolver's recommended
azd ai agent invoke echo '<payload>'
(correct after 2.6.3).
2. invocationsRemote: gate fires → name = "echo-deployed-x7q9".
3. HTTP call → URL is correct → SSE stream returns.
4. Post-success block: emitInvokeSuccessNextStep(mode,
"echo-deployed-x7q9") → resolver emits:
Next:
azd ai agent show echo-deployed-x7q9 (confirm health)
azd ai agent monitor --follow (stream logs)
5. User follows that first suggestion.
6. show.go:85 → resolveAgentServiceFromProject(
ctx, azdClient, "echo-deployed-x7q9", ...)
→ resolveAgentService → helpers.go:560-569 matches
s.Name == "echo-deployed-x7q9" against azure.yaml
→ no match → error:
"no azure.ai.agent service na…
1 parent d6cf42b commit d478803
77 files changed
Lines changed: 22858 additions & 152 deletions
File tree
- cli/azd/extensions/azure.ai.agents
- internal
- cmd
- doctor
- nextstep
- pkg
- agents/agent_yaml
- envkey
- paths
- project
Lines changed: 279 additions & 0 deletions