Commit b429308

feat(routines): add azd ai routine commands (#8241)
* feat(routines): implement azd ai routine commands

Add the full v1 routine command subtree to the azure.ai.routines extension as specified in the design spec (PR #8200).

Commands implemented:
- routine create, update, show, list, delete
- routine enable, disable (dedicated idempotent action routes)
- routine dispatch (calls dispatch_async, --async flag for client-side wait)
- routine run list (auto-paging, --top, --filter)

New packages:
- internal/exterrors/ -- structured error codes and helpers
- internal/pkg/routines/ -- data-plane HTTP client and models
- internal/cmd/endpoint.go -- 5-level project endpoint resolver

Wire format: trigger/action as Record with 'default' key. All calls include x-ms-foundry-features-opt-in: Routines=V1Preview header.

Also adds the design spec at cli/azd/docs/design/ai-routine-design-spec.md.

* test(routines): add unit tests for endpoint, create, manifest, and models

- Add readAzdProjectSourcesFunc seam to endpoint.go for daemon isolation
- endpoint_test.go: isFoundryHost, validateProjectEndpoint, full cascade tests
- routine_create_test.go: buildTrigger and buildAction table tests
- routine_manifest_test.go: readRoutineManifest (JSON/YAML), mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction
- models_test.go: TriggerCLIToWire and ActionCLIToWire completeness
- Add yaml struct tags to models.go for YAML manifest support

* test(routines): align test patterns with azure.ai.agents extension

- Extract stubAzdProjectSources() helper (mirrors stubAzdHostedSources in agents)
- isolateFromAzdDaemon now also clears AZD_SERVER env var
- Add t.Parallel() to all pure-function tests (isFoundryHost, validateProjectEndpoint, buildTrigger, buildAction, mergeRoutineFromFile, applyUpdateFlags, getTrigger/getAction, TriggerCLIToWire/ActionCLIToWire map checks)

* fix(routines): address CI lint and spell-check failures

- cspell.yaml: add exterrors, sess, routineName, azdProjectSources to word list
- endpoint.go: remove unused projectEndpointPathPrefix constant
- routine_create.go: wrap long buildAction() call (line >125 chars)
- routine_update.go: wrap long --file flag help text
- routine_manifest_test.go: expand inline map literals to multi-line
- client.go: wrap ListRoutineRuns signature to fit 125-char limit
- Run gofmt -w and go fix on all files (codes.go, client.go, models.go formatting)

* fix(routines): remove unused ptrBool helper

golangci-lint flagged ptrBool as unused. The function had no call sites; the //go:fix inline directive does not exempt it from the unused linter.

* docs(routines): remove design spec from PR

The design spec is tracked separately in PR #8200; this PR focuses on the implementation only.

* fix(routines): close response bodies per page and preserve filter on pagination

- Extract getPage helper so resp.Body.Close runs per iteration in ListRoutines and ListRoutineRuns (defer-in-loop leaked FDs).
- Preserve the filter query param across pages in ListRoutineRuns; previously page 2+ only carried pageToken and dropped the filter.
- Correct dispatch command help text: the service always runs routines asynchronously, so the old 'waits and streams' wording was wrong.

* fix(routines): flatten command tree to remove duplicate 'routine routine'

The extension namespace 'ai.routine' already mounts the extension under 'azd ai routine'. Adding a 'routine' subcommand group on top of that produced the redundant 'azd ai routine routine <cmd>' path. Move --project-endpoint persistent flag and all subcommands directly onto rootCmd so the correct usage is 'azd ai routine <cmd>'.

* fix(routines): align data-plane client with Foundry Routines TypeSpec

The first cut of the Routines client used header/route/field shapes that did not match the TypeSpec being merged in azure-rest-api-specs#42779.
Realign the extension with the spec so requests round-trip cleanly:

  * Preview header renamed from 'x-ms-foundry-features-opt-in' to 'Foundry-Features' (the value 'Routines=V1Preview' was already correct).
  * Async dispatch route renamed ':dispatch_async' -> ':dispatchAsync'; the action segment is case-sensitive per spec.
  * Dropped the non-existent ':enable' / ':disable' action routes; enable/disable now read the routine and PUT it back with 'enabled' flipped (idempotent: no-op if already at the target value).
  * DispatchRoutineRequest wraps a discriminated 'payload' object whose 'type' must match the routine's action type; --conversation-id was removed from dispatch (the spec does not expose it).
  * Routine.Action is now a single discriminated object (not an 'actions' map keyed by name).
  * RoutineAction.AgentName -> AgentID; the CLI flag is renamed to --agent-id accordingly.
  * RoutineTrigger.Cron -> CronExpression to match the TypeSpec field.
  * PagedRoutine pagination follows the absolute 'nextLink' URL from Azure.Core.Page<Routine> instead of re-deriving a continuation query.
  * RoutineRun gains the additional fields documented in the spec (phase, trigger_type, attempt_source, action_type, triggered_at, dispatch_id, action_correlation_id, response_id, error_type, error_message); 'run list' now prints phase alongside status.
  * EventRoutineTrigger fields aligned to the spec: connection_id, owner, repository, actions[]; removed 'assignee'.
  * DispatchRoutineResponse drops the unused 'status' field.

Tests, mock manifests, and the E2E driver were updated to the new contract (--agent-id, agent_id, cron_expression, single 'action').

Note: the live Foundry Routines preview endpoint still returns HTTP 500 on /routines?api-version=v1 even with the correct request shape; that is an upstream service bug tracked separately.
* fix(routines): clarify comment for github_issue fields in RoutineTrigger

* fix(routines): address PR review feedback

- endpoint: reject project endpoints with an explicit port so the normalized URL cannot silently strip a non-default port
- routine create: only set Enabled from --enabled when the user explicitly passes the flag, so a manifest's enabled value is honored; default to enabled=true if neither source provides one
- routine create: explicitly reject --trigger github-issue (deferred for v1) instead of producing an incomplete github_issue trigger
- routine_helpers: boolStr now returns "unknown" for a nil pointer to avoid displaying "true" when the field is absent from the service response
- routine_manifest: surface applyUpdateFlags user-input errors as exterrors.Validation (CodeInvalidParameter) for consistent CLI error shapes

* chore(routines): add .golangci.yaml and AGENTS.md to align with sibling extensions

Other AI extensions (projects, agents, toolboxes, inspector) ship a .golangci.yaml lint config and an AGENTS.md contributor guide. Add both to azure.ai.routines so it follows the same convention, and register the project-specific exterrors word in cspell.yaml.

* fix(routines): use camelCase JSON tags to match Foundry service wire

The deployed Foundry Routines data plane applies a camelCase property naming policy on the wire (e.g. `cronExpression`, `timeZone`, `agentId`), even though the upstream TypeSpec / OpenAPI document still emits snake_case. With snake_case JSON tags, `routine create` and `update` always failed with errors like:

    triggers['default'].cronExpression must be provided for schedule routines
    exactly one of action.agentId or action.agentEndpointId must be provided

and routines read back from `show` / `list` would have empty trigger/action fields because the camelCase wire payload did not deserialize into snake_case-tagged Go fields.
Switch the JSON tags on `Routine`, `RoutineTrigger`, `RoutineAction`, `RoutineRun`, `PagedRoutineRun`, and `DispatchRoutineResponse` to camelCase so requests/responses round-trip cleanly against the deployed service. YAML tags stay snake_case so user-facing `--file` manifests keep the documented convention.

Verified against a live project endpoint: create/list/show now reach the service correctly (residual `InternalServerError` from the backend is unrelated and reproduces from raw curl with the same body).

* feat(routines): align with spec PR #43186 and fix HTTP/2 hang

Use azure-rest-api-specs PR #43186 (Foundry Routines TypeSpec) as the single source of truth for the routines extension, applying every spec change that does not break the currently deployed service, and documenting each deliberate divergence inline and in AGENTS.md.

## Spec alignment

  * `RoutineRun` and `DispatchRoutineResponse` gain the new `TaskID` field (wire `taskId`); the service already emits it. `dispatch` now prints `Task ID` after `Action Correlation ID`, and JSON output exposes the new field too.
  * `RoutineTrigger` is restructured to match the spec's `GitHubIssueOpenedRoutineTrigger` shape: dropped `Owner` / `Actions[]`, added `Assignee`. The github trigger is still deferred at the CLI surface, so this is safe.
  * Inline comments and a new AGENTS.md table call out each divergence the client deliberately keeps to stay compatible with the live service: camelCase wire naming (spec is snake_case), `agentId` field (spec renamed to `agent_name`), `:dispatchAsync` action segment (spec uses `:dispatch_async`), GET+PUT enable/disable fallback (spec adds dedicated routes which still 404), `value`/`nextLink` / `value`/`nextPageToken` paged shapes (spec uses `AgentsPagedResult<T>`), and `github_issue` wire value (spec renamed to `github_issue_opened`).
## CLI bug fix: HTTP/2 stream-reset hang

The pipeline now uses a custom `http.Client` with an explicit `ResponseHeaderTimeout` (60s) and `TryTimeout` (30s), and azcore retries are capped at 1. When the Foundry service returns an HTTP/2 RST_STREAM (for example, the schedule-create InternalServerError), the CLI now surfaces a `context deadline exceeded` error within ~40 seconds instead of the previous ~6 minute hang.

## Verified end-to-end against a live Foundry project

  * timer create / show / list / update / disable / enable / dispatch (with `taskId` round-tripping) / run list / delete all succeed.
  * schedule create still fails (service-side ISE) but now in under a minute instead of six.

* feat(routines): defer recurring/schedule trigger until service is ready

The Foundry data plane currently returns `InternalServerError` for any `PUT /routines/{name}` request whose trigger is `schedule` (the wire value behind the CLI's `--trigger recurring`). The CLI side is fully implemented and verified correct via raw curl, so this is a service-side issue, but it leaves the `recurring` trigger non-functional end-to-end.

Take the recurring trigger off the public CLI surface so users do not hit the service hang:

  * Drop the `--cron` flag from `routine create` and `routine update`.
  * `--trigger recurring` is now rejected with the same "deferred" shape as `--trigger github-issue`: a clear error pointing the user at `--trigger timer` and explaining that recurring is gated on the Foundry service.
  * `--trigger` help text and validation messages list only `timer`.

The underlying wire model still carries `cron_expression` / `time_zone` and the `schedule` discriminator so re-enabling the trigger when the service is ready is just a CLI flag-wiring change. Unit tests around buildTrigger and applyUpdateFlags are updated accordingly.
* fix(routines): enforce update-mode manifest merge, env-backed no-prompt in delete, and action-type flag validation

* fix(routines): replace unknown word 'misroute' in endpoint comment

* feat(routines): add exterrors unit tests and .gitignore for bin/

* style(routines): fix gofmt formatting in test files

* feat(routines): align client with spec PR #43186 routes and fields

The Foundry data plane now honors the routine spec from azure-rest-api-specs#43186. Switch the client off the workarounds that the first cut needed and onto the spec-shaped routes and wire format.

Tested by probing the live data plane on a Foundry project endpoint:

  * Wire field naming: switch from camelCase to snake_case across Routine, RoutineTrigger, RoutineAction, RoutineRun, and DispatchRoutineResponse. Confirmed: service now rejects `agentId` / `agentName` (camel) with a 400 `exactly one of agent_name or agent_endpoint_id must be provided` and only accepts `agent_name`.
  * Enable / disable: switch from GET+PUT-with-enabled-flipped to the spec routes `POST /routines/{name}:enable` and `POST /routines/{name}:disable`. Confirmed: both routes return `UserError` / `NotFoundError` for missing routines (route exists; resource doesn't), instead of the empty 404 the routes used to return.
  * Async dispatch: switch from `:dispatchAsync` (camelCase) to the spec route `:dispatch_async` (snake). Confirmed: the snake route is live; the camel form now returns an empty 404 (route gone).
  * Schedule trigger: re-enable `--trigger recurring` / `--cron`. The original deferral was because every `schedule` PUT 500'd; with the spec wire format the schedule trigger passes service-side validation just like `timer` does. Re-add the `--cron` flag on `create` and `update`.

Kept divergent because the service has not caught up yet:

  * `github_issue_opened` trigger value -- service still rejects it with `unrecognized type discriminator id`; CLI does not expose the github trigger yet, so the wire mapping keeps `github_issue`.
  * `AgentsPagedResult<T>` envelope -- service still returns `value` + `nextLink` (routines) / `value` + `nextPageToken` (runs) rather than the spec's `data` / `last_id` / `has_more`.

Also:

  * CLI flag `--agent-id` renamed to `--agent-name` to match the spec field name. Go field `RoutineAction.AgentID` renamed to `AgentName`.
  * Drop now-stale `spec divergence` comments from the client, models, and AGENTS.md alignment table.

* fix(routines): address review feedback — op code, gRPC cancel, manifest errors, logging, docs

- routine_dispatch.go: use OpDispatchRoutine (not OpGetRoutine) when the inner GetRoutine call fails during dispatch validation
- exterrors/errors.go: IsCancellation now checks gRPC codes.Canceled in addition to context.Canceled, matching the agents implementation
- routine_manifest.go: distinguish os.IsNotExist from other os.ReadFile errors so permission-denied / is-a-directory get accurate messages
- client.go: add Logging.AllowedHeaders for MsCorrelationIdHeader to match agents observability parity
- AGENTS.md: rewrite exterrors section in present tense (package exists)
1 parent 6f2867c commit b429308

28 files changed

Lines changed: 3528 additions & 2 deletions
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
bin/
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
version: "2"

linters:
  default: none
  enable:
    - gosec
    - lll
    - unused
    - errorlint
  settings:
    lll:
      line-length: 220
      tab-width: 4

formatters:
  enable:
    - gofmt
Lines changed: 146 additions & 0 deletions
@@ -0,0 +1,146 @@
# Azure AI Routines Extension - Agent Instructions

Use this file together with `cli/azd/AGENTS.md`. This guide supplements the root azd
instructions with the conventions that are specific to this extension.

## Overview

`azure.ai.routines` is a first-party azd extension under
`cli/azd/extensions/azure.ai.routines/`. It runs as a separate Go binary and talks
to the azd host over gRPC.

The user-facing surface is `azd ai routine <verb>` — CRUD over Microsoft Foundry
Routines attached to a Foundry project endpoint.

Useful places to start:

- `internal/cmd/`: Cobra commands and verb implementations
- Project-endpoint resolution comes from the sibling `azure.ai.projects`
  extension (and the shared cascade); do not re-implement it here.

## Build and test

From `cli/azd/extensions/azure.ai.routines`:

```bash
# Build using developer extension (for local development)
azd x build

# Or build using Go directly
go build

# Run unit tests
go test ./... -count=1
```

If extension work depends on a new azd core change, plan for two PRs:

1. Land the core change in `cli/azd` first.
2. Land the extension change after that, updating this module to the newer azd
   dependency with `go get github.com/azure/azure-dev/cli/azd && go mod tidy`.

For local development, draft work, or validating both sides together before the
core PR is merged, you may temporarily add:

```go
replace github.com/azure/azure-dev/cli/azd => ../../
```

That `replace` points this extension at your local `cli/azd` checkout instead of
the version in `go.mod`. Do not merge the extension with that `replace` still
present.

## Error handling

Return plain Go errors by default, and wrap lower-level failures with
`fmt.Errorf("context: %w", err)` where useful.

This extension uses an `internal/exterrors` package (modeled on `azure.ai.agents` /
`azure.ai.toolboxes`) for stable telemetry categories, error codes, and
user-facing suggestions:

- Create a structured error once, as close as possible to the place where you
  know the final category, code, and suggestion.
- If `err` is already a structured error, return it unchanged. Do **not** wrap
  it with `fmt.Errorf("context: %w", err)` — during gRPC serialization, azd
  preserves the structured error's own message/code/category, not the outer
  wrapper text.
- Prefer the dedicated helpers at the Azure/gRPC boundary:
  - `exterrors.ServiceFromAzure(err, operation)` for `azcore.ResponseError`
    (data-plane and ARM calls).
  - `exterrors.FromPrompt(err, contextMessage)` for `azdClient.Prompt().*`
    failures.

## Release preparation

A new extension release ships in two PRs:

### PR 1 — Version bump

Bumps the extension to the new version. Touches only:

- `version.txt` — new semver string
- `extension.yaml` — `version:` field
- `CHANGELOG.md` — new release section at the top

Once merged, the team triggers the CI release pipeline, which builds, signs, and
publishes the extension binaries as a GitHub release.

### PR 2 — Registry update

After the GitHub release is live, a follow-up PR updates
`cli/azd/extensions/registry.json` so azd users can install the new version.
The contents of that file are produced by running `azd x publish` against the
published release artifacts (which computes the artifact URLs and checksums).
The resulting PR should contain only the regenerated `registry.json` entry for
the extension, and in some cases updated test snapshots as well.

## Output: `log` vs `fmt`

Extensions write directly to stdout/stderr — there is no `Console` abstraction
from azd core.

- **`fmt.Print*`** — user-facing output (stdout). Pair with `output.With*Format`
  helpers for styled text.
- **`log.Print*`** — developer diagnostics (stderr). Hidden unless `--debug`
  is set. Never use `log` for anything the user needs to see.
- Do not use `log.Fatal` or `log.Panic` for expected failures — return an error
  instead.

```go
// ✅ log — internal detail the user doesn't need to see
log.Printf("routine show: pending-routine read failed for %q: %v", name, err)

// ✅ fmt — user-facing status and results
fmt.Printf("Created routine %s at version %s.\n", name, version)

// ❌ fmt used for debug noise — user sees internal details they can't act on
fmt.Printf("Parsed endpoint: host=%s, path=%s\n", host, path) // use log.Printf

// ❌ log used for user-facing info — user never sees it without --debug
log.Printf("No routines found on project") // use fmt.Print*
```

## Other extension conventions

- Use modern Go 1.26 patterns where they help readability.
- Reserved azd globals (`--output`, `--no-prompt`) are inherited from `extCtx`,
  not registered as flags on each verb.
- Lowercase-normalize `--output` when reading it from `extCtx` so downstream
  branches can compare with `== "json"`.
- When using `PromptSubscription()`, create credentials with
  `Subscription.UserTenantId`, not `Subscription.TenantId`.
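
As a small illustration of the `--output` convention above: normalize once where the value is read, then compare against the lowercase literal everywhere else. This is a minimal sketch; `normalizeOutput` and `wantsJSON` are hypothetical helpers, and how the raw value is obtained from `extCtx` is not shown because that API is not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeOutput lowercases the raw --output value once, at the point
// where it is read (hypothetical helper).
func normalizeOutput(raw string) string {
	return strings.ToLower(raw)
}

// wantsJSON shows the downstream comparison the convention enables.
func wantsJSON(raw string) bool {
	return normalizeOutput(raw) == "json"
}

func main() {
	for _, raw := range []string{"json", "JSON", "Json", "table"} {
		fmt.Printf("%q -> json=%v\n", raw, wantsJSON(raw))
	}
}
```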

## API spec alignment

The authoritative TypeSpec is in
[`azure-rest-api-specs` PR #43186](https://github.com/Azure/azure-rest-api-specs/pull/43186)
(`specification/ai-foundry/data-plane/Foundry/src/routines/`). The client in
`internal/pkg/routines/` tracks that spec, with a small number of remaining
divergences kept for compatibility with the currently deployed Foundry service:

| Concern | Spec | Live service | Client choice |
|---|---|---|---|
| `github_issue_opened` trigger | renamed in spec | still accepts only `github_issue` | keep `github_issue` wire value (CLI surface is deferred) |
| `AgentsPagedResult<T>` envelope | `data` + `last_id` + `has_more` | `value` + `nextLink` (routines) / `value` + `nextPageToken` (runs) | match service shape |
Lines changed: 5 additions & 1 deletion
@@ -1,2 +1,6 @@
 import: ../../.vscode/cspell.yaml
-words: []
+words:
+  - exterrors
+  - sess
+  - routineName
+  - azdProjectSources

cli/azd/extensions/azure.ai.routines/go.mod

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,5 @@
 module azure.ai.routines
 
-
 go 1.26.1
 
 require (
@@ -15,6 +14,7 @@ require (
 require (
 	github.com/AlecAivazis/survey/v2 v2.3.7 // indirect
 	github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 // indirect
+	github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/appservice/armappservice/v2 v2.3.0 // indirect
 	github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/keyvault/armkeyvault v1.5.0 // indirect
 	github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armsubscriptions v1.3.0 // indirect
 	github.com/Azure/azure-sdk-for-go/sdk/security/keyvault/azsecrets v1.4.0 // indirect

cli/azd/extensions/azure.ai.routines/go.sum

Lines changed: 2 additions & 0 deletions
@@ -9,6 +9,8 @@ github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.2 h1:yz1bePFlP5Vws5+
 github.com/Azure/azure-sdk-for-go/sdk/azidentity/cache v0.3.2/go.mod h1:Pa9ZNPuoNu/GztvBSKk9J1cDJW6vk/n0zLtV4mgd8N8=
 github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2 h1:9iefClla7iYpfYWdzPCRDozdmndjTm8DXdpCzPajMgA=
 github.com/Azure/azure-sdk-for-go/sdk/internal v1.11.2/go.mod h1:XtLgD3ZD34DAaVIIAyG3objl5DynM3CQ/vMcbBNJZGI=
+github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/appservice/armappservice/v2 v2.3.0 h1:JI8PcWOImyvIUEZ0Bbmfe05FOlWkMi2KhjG+cAKaUms=
+github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/appservice/armappservice/v2 v2.3.0/go.mod h1:nJLFPGJkyKfDDyJiPuHIXsCi/gpJkm07EvRgiX7SGlI=
 github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/internal/v2 v2.0.0 h1:PTFGRSlMKCQelWwxUyYVEUqseBJVemLyqWJjvMyt0do=
 github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/internal/v2 v2.0.0/go.mod h1:LRr2FzBTQlONPPa5HREE5+RjSCTXl7BwOvYOaWTqCaI=
 github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/internal/v3 v3.1.0 h1:2qsIIvxVT+uE6yrNldntJKlLRgxGbZ85kgtz5SNBhMw=
