
Commit 4569c55

Antriksh Jain and Copilot committed
feat(azure.ai.agents): pending-provision reasons foundation (4.11)
Introduces an extension-owned signal, `AI_AGENT_PENDING_PROVISION`, that lists resource-class tags `azd ai agent init` configured but Azure has not yet materialized. This is the architectural foundation for fixing a class of "trailer suggests azd deploy but provision is actually needed" bugs that surface when init flows create new resources inside an otherwise-existing Foundry project.

This commit lands the plumbing only; there is no behavior change yet. Init code paths that mark resources for provisioning, and the resolver's consumption of the list, follow in subsequent commits (B = model deployment, C = project tag, D = ACR + AppInsights).

Architecture

AI_AGENT_PENDING_PROVISION is a comma-separated, sorted, deduplicated tag list. Empty/unset = nothing pending. The tag taxonomy is open: readers (the resolver, doctor) only check for non-emptiness, so new init sites can introduce new tags without coordinating with this package.

Known tags today:
- project (will replace 4.10 NeedsAIProjectProvision)
- model_deployment (fixes the user-reported bug in commit B)
- acr (covered in commit D)
- app_insights (covered in commit D)

Lifecycle is explicit, unlike 4.10's USE_EXISTING_AI_PROJECT signal derivation: init sites append tags as they decide; postprovision clears the list. The resolver and doctor read the snapshot.

Components

internal/cmd/pending_provision.go (NEW, ~170 LoC)
Helpers for parsing/formatting/reading/writing the env var:
- parsePendingProvisionReasons(value)
- formatPendingProvisionReasons(reasons)
- addPendingProvisionReason(ctx, client, env, reason)
- removePendingProvisionReason(ctx, client, env, reason)
- clearPendingProvisionReasons(ctx, client, env)
- readPendingProvisionEnv(ctx, client, env): NotFound-tolerant
- mutatePendingProvisionReasons(...): shared read-modify-write core
All write helpers are idempotent (they skip the SetValue call when the formatted value equals the prior on-disk value). Best-effort parse normalization keeps the signal robust against hand edits.

internal/cmd/pending_provision_test.go (NEW)
Full coverage: parse edge cases, format normalization, add to empty / append / duplicate-noop, remove existing / non-existent / from-unset, clear, and a round-trip sequence verifying parse/format consistency end-to-end. Uses the existing testEnvironmentServiceServer fixture pattern.

internal/cmd/listen.go (postprovisionHandler)
After the toolbox-provision loop completes successfully, if any azure.ai.agent service was processed, fetch the current env name via Environment().GetCurrent and call clearPendingProvisionReasons. Best-effort: a transport failure is logged but not returned, since the user's provision DID succeed and surfacing a clear-time error would be confusing. New helper currentEnvName() factors out the GetCurrent dance.

internal/cmd/nextstep/state.go (assembleState)
Reads AI_AGENT_PENDING_PROVISION alongside the existing USE_EXISTING_AI_PROJECT read and parses it into the new State.PendingProvisionReasons field via a local parsePendingProvisionReasons copy (nextstep is a leaf package; it cannot import cmd). Transport errors increment errCount but do not abort assembly: the field is best-effort and the resolver tolerates an empty list.

internal/cmd/nextstep/types.go
New field State.PendingProvisionReasons with a doc comment describing the signal contract and pointing back at pending_provision.go for the canonical helpers.

internal/cmd/nextstep/resolver.go (ResolveAfterInit case 1)
OR-in `len(state.PendingProvisionReasons) > 0` so any non-empty list fires the `azd provision` primary, regardless of whether NeedsAIProjectProvision is set or AZURE_AI_PROJECT_ENDPOINT is populated. Decision-tree doc comment updated. The legacy NeedsAIProjectProvision branch is retained for backwards compatibility; commit C will migrate init.go to write tags directly and the field will be removed at that point.

Tests

Existing 4.10 sub-cases for NeedsAIProjectProvision stay green. Four new state_test.go sub-cases lock the new field's parse contract end-to-end (unset/single/multiple/malformed); the transport-error tests bump errCount 3→4 and 4→5 to account for the extra env read. Two new resolver_test.go sub-cases pin the override behavior: HasProjectEndpoint=true + PendingProvisionReasons non-empty must still suggest `azd provision`, with both single-tag and multi-tag inputs.

Pre-flight

gofmt -s -w .      clean
go vet ./...       clean
go build ./...     clean
go test ./...      green (cmd 13.9s, doctor 5.3s, nextstep 4.9s, agent_api 9.2s, etc.)
golangci-lint run  0 issues
cspell             0 issues on changed production files

No behavior change yet: init.go, init_models.go, and init_foundry_resources_helpers.go still drive their existing signals. The resolver's case-1 condition is widened, but no producer writes PendingProvisionReasons in this commit, so the list is always empty, the new OR term is always false, and behavior is identical to 4.10.

Refs Azure#7975 (PR Azure#8057 design spec)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
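
pending_provision.go itself is not shown in this view, so the following is only a minimal sketch of the idempotent read-modify-write idea described under Components. The names `envStore`, `parseReasons`, `formatReasons`, and `addReason` are hypothetical stand-ins, and the in-memory map replaces the real azd environment client.

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

const pendingVar = "AI_AGENT_PENDING_PROVISION"

// envStore is a hypothetical in-memory stand-in for the azd environment
// service. writes counts set calls so the idempotency is observable.
type envStore struct {
	values map[string]string
	writes int
}

func (s *envStore) set(key, value string) {
	s.values[key] = value
	s.writes++
}

// parseReasons trims each tag, drops empties and duplicates, then sorts.
func parseReasons(value string) []string {
	var out []string
	for _, raw := range strings.Split(value, ",") {
		if tag := strings.TrimSpace(raw); tag != "" && !slices.Contains(out, tag) {
			out = append(out, tag)
		}
	}
	slices.Sort(out)
	return out
}

// formatReasons normalizes the tags and joins them with commas.
func formatReasons(reasons []string) string {
	return strings.Join(parseReasons(strings.Join(reasons, ",")), ",")
}

// addReason reads the stored value, appends the tag, and writes back only
// when the formatted result differs from what is already stored.
func addReason(s *envStore, reason string) {
	prior := s.values[pendingVar]
	next := formatReasons(append(parseReasons(prior), reason))
	if next == prior {
		return // nothing changed, so skip the write
	}
	s.set(pendingVar, next)
}

func main() {
	s := &envStore{values: map[string]string{}}
	addReason(s, "model_deployment")
	addReason(s, "acr")
	addReason(s, "acr") // duplicate: no write happens
	fmt.Println(s.values[pendingVar], s.writes) // prints "acr,model_deployment 2"
}
```

Comparing the formatted value against the stored one, rather than checking membership first, also means a hand-edited value such as `acr, acr` gets rewritten into normalized form on the next add.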
1 parent 874a516 commit 4569c55

8 files changed

Lines changed: 693 additions & 8 deletions

File tree

cli/azd/extensions/azure.ai.agents/internal/cmd/listen.go

Lines changed: 57 additions & 0 deletions
@@ -81,10 +81,12 @@ func postprovisionHandler(
 	azdClient *azdext.AzdClient,
 	args *azdext.ProjectEventArgs,
 ) error {
+	hasAgent := false
 	for _, svc := range args.Project.Services {
 		if svc.Host != AiAgentHost {
 			continue
 		}
+		hasAgent = true
 
 		if err := provisionToolboxes(ctx, azdClient, svc); err != nil {
 			return fmt.Errorf(
@@ -94,9 +96,64 @@ func postprovisionHandler(
 		}
 	}
 
+	// Clear the AI_AGENT_PENDING_PROVISION signal now that provision has
+	// finished successfully. Init writes resource-class tags into this
+	// variable when it configures non-existent infra (a new model
+	// deployment, a new Foundry project, a blank ACR/AppInsights input)
+	// so the post-init trailer and `azd ai agent doctor` can recommend
+	// `azd provision`. Once provision returns success the signal is
+	// stale: subsequent runs of doctor/init/run/show/deploy should rely
+	// on the canonical post-provision env vars (AZURE_AI_PROJECT_ENDPOINT
+	// and friends) and the agent.yaml-vs-env diff. The clear is gated on
+	// the presence of at least one azure.ai.agent service so toolbox-only
+	// or non-agent provisions don't write to a variable they don't own.
+	// Best-effort: a transport failure here is logged but not returned —
+	// the user's provision DID succeed and surfacing a clear-time error
+	// would be confusing. The next init/doctor run will simply re-emit
+	// the suggestion until the variable is cleared by a future
+	// successful provision (or by the user via `azd env set ... ""`).
+	if hasAgent {
+		envName, err := currentEnvName(ctx, azdClient)
+		switch {
+		case err != nil:
+			log.Printf(
+				"warning: failed to look up current environment to clear %s: %v",
+				pendingProvisionEnvVar, err,
+			)
+		case envName == "":
+			log.Printf(
+				"warning: no current environment selected; skipping clear of %s",
+				pendingProvisionEnvVar,
+			)
+		default:
+			if clearErr := clearPendingProvisionReasons(ctx, azdClient, envName); clearErr != nil {
+				log.Printf(
+					"warning: failed to clear %s after provision: %v",
+					pendingProvisionEnvVar, clearErr,
+				)
+			}
+		}
+	}
+
 	return nil
 }
 
+// currentEnvName returns the name of the currently selected azd
+// environment, or empty string + error when no environment is
+// selected. Wraps Environment().GetCurrent so callers (notably
+// postprovisionHandler) can read the current env name without
+// duplicating the request shape.
+func currentEnvName(ctx context.Context, azdClient *azdext.AzdClient) (string, error) {
+	resp, err := azdClient.Environment().GetCurrent(ctx, &azdext.EmptyRequest{})
+	if err != nil {
+		return "", err
+	}
+	if resp == nil || resp.Environment == nil {
+		return "", nil
+	}
+	return resp.Environment.Name, nil
+}
+
 func predeployHandler(ctx context.Context, azdClient *azdext.AzdClient, args *azdext.ProjectEventArgs) error {
 	hasHostedAgent := false
 	for _, svc := range args.Project.Services {

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/resolver.go

Lines changed: 11 additions & 3 deletions
@@ -40,8 +40,8 @@ const (
 // are deploy-time landmines: the literal `{{NAME}}` would otherwise
 // land in the container. They never reach `azd env set` because the
 // value lives in agent.yaml itself, not the azd environment.
-// - NeedsAIProjectProvision OR !HasProjectEndpoint OR MissingInfraVars
-// → `azd provision`
+// - NeedsAIProjectProvision OR len(PendingProvisionReasons) > 0 OR
+// !HasProjectEndpoint OR MissingInfraVars → `azd provision`
 // The project endpoint is the canonical "provision finished"
 // marker — it is set by `azd provision` as a Bicep output, or by
 // `azd ai agent init` when the user selects an existing Foundry
@@ -59,6 +59,11 @@ const (
 // prior init or environment is stale and must not let the resolver
 // mistake the state for "ready to run or deploy". See
 // state.NeedsAIProjectProvision for the env-var contract.
+// PendingProvisionReasons generalizes the same idea to any
+// resource class — model deployments, ACR, App Insights, etc. —
+// so this branch fires whenever init recorded *any* tag the
+// postprovision handler has not yet cleared. See
+// state.PendingProvisionReasons for the env-var contract.
 // - MissingManualVars → one `azd env set <KEY> <value>` per missing var
 // (up to maxFixupLines)
 // - Otherwise → `azd ai agent run`
@@ -94,7 +99,10 @@ func ResolveAfterInit(state *State) []Suggestion {
 	}
 
 	switch {
-	case state.NeedsAIProjectProvision || !state.HasProjectEndpoint || len(state.MissingInfraVars) > 0:
+	case state.NeedsAIProjectProvision ||
+		len(state.PendingProvisionReasons) > 0 ||
+		!state.HasProjectEndpoint ||
+		len(state.MissingInfraVars) > 0:
 		out = append(out, Suggestion{
 			Command:     "azd provision",
 			Description: "set up your Foundry project, models, and connections",
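
To make the widened case-1 condition easier to reason about, here is a small standalone sketch. The `State` struct below carries only the four fields the condition reads, and `needsProvision` is a hypothetical helper; the real resolver evaluates the expression inline in its switch.

```go
package main

import "fmt"

// State mirrors only the fields the case-1 condition reads; the real
// struct in types.go has more.
type State struct {
	NeedsAIProjectProvision bool
	PendingProvisionReasons []string
	HasProjectEndpoint      bool
	MissingInfraVars        []string
}

// needsProvision is a hypothetical extraction of the widened condition.
func needsProvision(s *State) bool {
	return s.NeedsAIProjectProvision ||
		len(s.PendingProvisionReasons) > 0 ||
		!s.HasProjectEndpoint ||
		len(s.MissingInfraVars) > 0
}

func main() {
	// Existing project, nothing pending: all four terms are false.
	ready := &State{HasProjectEndpoint: true}
	// Existing project plus a new model deployment: the new term fires.
	pending := &State{
		HasProjectEndpoint:      true,
		PendingProvisionReasons: []string{"model_deployment"},
	}
	fmt.Println(needsProvision(ready), needsProvision(pending)) // prints "false true"
}
```

With no producer writing tags yet, `PendingProvisionReasons` is always empty and the result matches the three-term condition it replaces.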

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/resolver_test.go

Lines changed: 29 additions & 0 deletions
@@ -91,6 +91,35 @@ func TestResolveAfterInit(t *testing.T) {
 			wantPrimaryHas: "azd ai agent run",
 			wantTrailing:   "azd deploy",
 		},
+		{
+			// Init configured a new model deployment in an existing
+			// Foundry project: HasProjectEndpoint=true (existing
+			// project), NeedsAIProjectProvision=false (existing
+			// project), but PendingProvisionReasons contains
+			// "model_deployment". The resolver must still suggest
+			// `azd provision` so Bicep creates the new deployment.
+			name: "new model deployment in existing project → provision (PendingProvisionReasons override)",
+			state: &State{
+				HasProjectEndpoint:      true,
+				NeedsAIProjectProvision: false,
+				PendingProvisionReasons: []string{"model_deployment"},
+			},
+			wantPrimaryHas: "azd provision",
+			wantTrailing:   "azd deploy",
+		},
+		{
+			// Multiple pending reasons collected during init —
+			// e.g. user left ACR blank and configured a new model.
+			// Still single `azd provision` suggestion (resolver
+			// treats the list as opaque non-emptiness).
+			name: "multiple pending reasons → single provision suggestion",
+			state: &State{
+				HasProjectEndpoint:      true,
+				PendingProvisionReasons: []string{"acr", "model_deployment"},
+			},
+			wantPrimaryHas: "azd provision",
+			wantTrailing:   "azd deploy",
+		},
 	}
 
 	for _, tt := range tests {

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/state.go

Lines changed: 57 additions & 0 deletions
@@ -46,6 +46,16 @@ const (
 	// USE_EXISTING_AI_PROJECT in CHANGELOG.md entry for PR #7843.
 	useExistingAIProjectVar = "USE_EXISTING_AI_PROJECT"
 
+	// pendingProvisionVar names the extension-owned env var that
+	// lists resource-class tags init configured but provision has
+	// not yet materialized. See State.PendingProvisionReasons for
+	// the full semantics and pending_provision.go in the cmd package
+	// for the read/write helpers and the reason-tag taxonomy. The
+	// constant is duplicated here (rather than imported from cmd)
+	// because nextstep is a leaf package with no dependency on cmd
+	// — both packages share the same string literal contract.
+	pendingProvisionVar = "AI_AGENT_PENDING_PROVISION"
+
 	// azureInfraPrefix tags an env-var name as an azd-infra output rather
 	// than a user-supplied manual variable. Outputs of `azd provision`
 	// in the AI Foundry templates uniformly start with this prefix
@@ -243,6 +253,23 @@ func assembleState(ctx context.Context, src Source, opts ...Option) (*State, []e
 			errs = append(errs, fmt.Errorf("read %s: %w", useExistingAIProjectVar, err))
 		}
 		state.NeedsAIProjectProvision = useExisting == "false"
+
+		// PendingProvisionReasons is the generalized "init configured
+		// something provision still has to materialize" signal that
+		// the model-deployment / ACR / App-Insights blank-input
+		// branches write into. Read here so the resolver and doctor
+		// share one snapshot. Unknown tags are kept verbatim — the
+		// resolver only checks for non-emptiness, and downstream
+		// readers may interpret tags they recognize. Transport
+		// errors are surfaced into errs but do not abort assembly;
+		// the field is best-effort and the resolver tolerates an
+		// empty list (it falls back to legacy heuristics in that
+		// case).
+		pending, err := src.EnvValue(ctx, envName, pendingProvisionVar)
+		if err != nil {
+			errs = append(errs, fmt.Errorf("read %s: %w", pendingProvisionVar, err))
+		}
+		state.PendingProvisionReasons = parsePendingProvisionReasons(pending)
 	}
 
 	project, err := src.Project(ctx)
@@ -526,3 +553,33 @@ func serviceKey(name string) string {
 	k = strings.ReplaceAll(k, "-", "_")
 	return strings.ToUpper(k)
 }
+
+// parsePendingProvisionReasons splits the AI_AGENT_PENDING_PROVISION
+// env-var value into a sorted, deduplicated, whitespace-trimmed list of
+// reason tags. Empty input or input containing only separators returns
+// nil. Malformed input is best-effort normalized — the env var is a
+// hint signal and parse trouble should not abort state assembly. This
+// helper mirrors cmd.parsePendingProvisionReasons; the duplication is
+// intentional to keep nextstep a leaf package with no dependency on cmd.
+func parsePendingProvisionReasons(value string) []string {
+	if strings.TrimSpace(value) == "" {
+		return nil
+	}
+	seen := make(map[string]struct{})
+	for _, raw := range strings.Split(value, ",") {
+		tag := strings.TrimSpace(raw)
+		if tag == "" {
+			continue
+		}
+		seen[tag] = struct{}{}
+	}
+	if len(seen) == 0 {
+		return nil
+	}
+	out := make([]string, 0, len(seen))
+	for tag := range seen {
+		out = append(out, tag)
+	}
+	slices.Sort(out)
+	return out
+}

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/state_test.go

Lines changed: 59 additions & 5 deletions
@@ -156,10 +156,11 @@ func TestAssembleState(t *testing.T) {
 				assert.False(t, state.Services[0].IsDeployed)
 				assert.False(t, state.HasProjectEndpoint)
 				assert.False(t, state.NeedsAIProjectProvision)
+				assert.Empty(t, state.PendingProvisionReasons)
 			},
-			// One error for AZURE_AI_PROJECT_ENDPOINT + one for USE_EXISTING_AI_PROJECT
-			// + one per service lookup (AGENT_ECHO_VERSION) = 3.
-			errCount: 3,
+			// One error each for AZURE_AI_PROJECT_ENDPOINT, USE_EXISTING_AI_PROJECT,
+			// AI_AGENT_PENDING_PROVISION + one per service lookup (AGENT_ECHO_VERSION) = 4.
+			errCount: 4,
 		},
 		{
 			name: "USE_EXISTING_AI_PROJECT unset: NeedsAIProjectProvision stays false",
@@ -223,6 +224,59 @@ func TestAssembleState(t *testing.T) {
 				assert.False(t, state.NeedsAIProjectProvision)
 			},
 		},
+		{
+			name: "AI_AGENT_PENDING_PROVISION unset: PendingProvisionReasons stays empty",
+			src: &fakeSource{
+				envName: "dev",
+				project: &azdext.ProjectConfig{Name: "demo"},
+				values:  map[string]string{"dev/AZURE_AI_PROJECT_ENDPOINT": "https://x.services.ai.azure.com"},
+			},
+			assert: func(t *testing.T, state *State, _ []error) {
+				assert.Empty(t, state.PendingProvisionReasons)
+			},
+		},
+		{
+			name: "AI_AGENT_PENDING_PROVISION single tag: PendingProvisionReasons populated",
+			src: &fakeSource{
+				envName: "dev",
+				project: &azdext.ProjectConfig{Name: "demo"},
+				values: map[string]string{
+					"dev/AZURE_AI_PROJECT_ENDPOINT":  "https://x.services.ai.azure.com",
+					"dev/AI_AGENT_PENDING_PROVISION": "model_deployment",
+				},
+			},
+			assert: func(t *testing.T, state *State, _ []error) {
+				assert.Equal(t, []string{"model_deployment"}, state.PendingProvisionReasons)
+			},
+		},
+		{
+			name: "AI_AGENT_PENDING_PROVISION multiple tags: parsed sorted dedup",
+			src: &fakeSource{
+				envName: "dev",
+				project: &azdext.ProjectConfig{Name: "demo"},
+				values: map[string]string{
+					"dev/AZURE_AI_PROJECT_ENDPOINT":  "https://x.services.ai.azure.com",
+					"dev/AI_AGENT_PENDING_PROVISION": "project,acr,project,model_deployment",
+				},
+			},
+			assert: func(t *testing.T, state *State, _ []error) {
+				assert.Equal(t, []string{"acr", "model_deployment", "project"}, state.PendingProvisionReasons)
+			},
+		},
+		{
+			name: "AI_AGENT_PENDING_PROVISION malformed value: best-effort normalize",
+			src: &fakeSource{
+				envName: "dev",
+				project: &azdext.ProjectConfig{Name: "demo"},
+				values: map[string]string{
+					"dev/AZURE_AI_PROJECT_ENDPOINT":  "https://x.services.ai.azure.com",
+					"dev/AI_AGENT_PENDING_PROVISION": " ,, project ,, acr , ",
+				},
+			},
+			assert: func(t *testing.T, state *State, _ []error) {
+				assert.Equal(t, []string{"acr", "project"}, state.PendingProvisionReasons)
+			},
+		},
 	}
 
 	for _, tt := range tests {
@@ -899,8 +953,8 @@ environment_variables:
 
 	state, errs := assembleState(context.Background(), src)
 	// One error each for AZURE_AI_PROJECT_ENDPOINT + USE_EXISTING_AI_PROJECT
-	// + AGENT_ECHO_VERSION + MY_API_KEY.
-	assert.Len(t, errs, 4)
+	// + AI_AGENT_PENDING_PROVISION + AGENT_ECHO_VERSION + MY_API_KEY = 5.
+	assert.Len(t, errs, 5)
 	assert.Empty(t, state.MissingInfraVars)
 	assert.Empty(t, state.MissingManualVars)
 }

cli/azd/extensions/azure.ai.agents/internal/cmd/nextstep/types.go

Lines changed: 26 additions & 0 deletions
@@ -80,8 +80,34 @@ type State struct {
 	// endpoint check independently passes. The flag is false when the
 	// variable is unset (no prior init) or "true" (existing path) so
 	// the existing heuristic continues to drive those cases.
+	//
+	// NOTE: Slated for removal in a follow-up commit (commit C) once
+	// init.go is migrated to call addPendingProvisionReason("project")
+	// directly. The replacement signal is PendingProvisionReasons
+	// below; both fields are read by the resolver in the interim so
+	// the migration can land in small, independently reviewable steps.
 	NeedsAIProjectProvision bool
 
+	// PendingProvisionReasons lists the resource-class tags that
+	// `azd ai agent init` configured but Azure has not yet
+	// materialized. Init code paths append a tag — e.g.
+	// "model_deployment" when a new model deployment is configured in
+	// an existing project, "project" when a new Foundry project is
+	// selected, "acr"/"app_insights" when the user leaves those
+	// inputs blank — and the postprovisionHandler clears the list on
+	// successful provision. The resolver fires `azd provision`
+	// whenever the list is non-empty; doctor can surface the specific
+	// reasons for richer diagnostics.
+	//
+	// The signal is stored in the AI_AGENT_PENDING_PROVISION env var
+	// (extension-owned namespace, not AZURE_*) as a comma-separated,
+	// sorted, deduplicated string. Unknown tags are tolerated by the
+	// resolver for forward-compatibility, so new init sites can
+	// introduce new tags without coordinating with this package. See
+	// pending_provision.go for the read/write helpers and the
+	// reason-tag taxonomy.
+	PendingProvisionReasons []string
+
 	// MissingInfraVars names ${...} references in agent.yaml that map to
 	// Bicep outputs not yet present in the azd environment (i.e.,
 	// provision is needed or has been skipped). Named so the resolver can
