feat: use v2 deployment read endpoints by anneschuth · Pull Request #23 · RijksICTGilde/zad-cli

anneschuth · 2026-05-07T09:28:22Z

Summary

Upstream RIG-Cluster shipped GET /v2/projects/{p}/deployments and GET /v2/projects/{p}/deployments/{d} (closes the structural problem flagged in RijksICTGilde/RIG-Cluster#51). The CLI no longer reconstructs deployment state from /logs + /tasks; it just reads it.

This PR also picks up the api/upstream-openapi.json bump from the daily sync workflow.

Client

New: ZadClient.list_deployments_v2(project) and get_deployment_v2(project, deployment). Responses are validated through pydantic; schema drift surfaces as ZadApiError(502) so the existing error-render path stays clean.
list_deployments and describe_deployment now wrap the v2 endpoints directly. Same public signatures, richer return shape (status, sync_revision, last_synced_at, errors).
resolve_namespace reads via get_deployment_v2 instead of walking list_deployments.
project_status no longer probes /tasks for URLs; v2 supplies them.
All legacy logs+tasks fusion code is gone (~150 lines of client, ~250 lines of test). One production upstream, already upgraded.

Models

DeploymentStatus and ErrorCategory StrEnums.
DeploymentDetail, DeploymentListResponse, DeploymentComponentDetail, StatusError pydantic models, used to validate v2 responses.

Describe rendering

Status (color-coded by severity), Revision (short SHA), Last sync attempt.
"Last sync attempt" wording reflects the upstream contract: this is the most recent reconciliation attempt regardless of outcome.
When the deployment is not Healthy: an Errors table with Category/Resource/Message and a deduplicated explanation footer that summarizes each category once.
Status colors: Healthy=green; Degraded/Missing/OutOfSync/Suspended=red; Progressing/Pending=yellow; everything else=dim.

Tests

New respx tests for the v2 endpoints (happy path with status/errors, list, 404 propagation, pydantic validation surfacing as 502).
New CLI integration tests for deployment describe rendering (Healthy without errors table, Degraded with errors table and explanation footer), using typer's CliRunner with a stubbed ZadClient.
Parametrized test for _status_color covering every DeploymentStatus enum value.
Backwards-compat baseline updated: list_deployments_v2 and get_deployment_v2 registered as expected public methods.

Behavior change to flag

The status field returned by list_deployments and describe_deployment now reports the upstream DeploymentStatus enum (Healthy, Degraded, Progressing, OutOfSync, Suspended, Missing, Pending, Unavailable, Unknown). The previous values (Active / Failed / Deploying) reconstructed by the logs+tasks fusion are gone with that path. Any downstream script comparing status == "Active" should switch to status == "Healthy". Field name and type are preserved; the value set changes.

Test plan

uv run pytest (73 passed)
uv run ruff check . and uv run ruff format --check .
Backwards-compat baseline: no public methods removed; list_deployments_v2 and get_deployment_v2 are additive
Smoke test against production: zad deployment list and zad deployment describe <name> for a Healthy and a Degraded deployment; confirm status, revision, last sync attempt, and the Errors table render correctly

claude

Review summary

The spec additions are accurate and well-formed. One significant gap: the sync agent is supposed to implement new client methods and CLI commands (per CLAUDE.md), but nothing was implemented — both new v2 endpoints remain in the "not covered" list. These endpoints directly replace fragile workarounds that exist today.

No backwards-incompatible changes

The spec additions are purely additive. No existing CLI commands or client methods are touched.

Upstream RIG-Cluster shipped GET /v2/projects/{p}/deployments and GET /v2/projects/{p}/deployments/{d} (closes the structural problem flagged in RijksICTGilde/RIG-Cluster#51). The CLI no longer has to fuse /logs and /tasks to reconstruct deployment state. Client: - add ZadClient.list_deployments_v2 and get_deployment_v2 wrappers - list_deployments and describe_deployment now prefer the v2 endpoints and fall back to the legacy logs+tasks fusion on 404, so older Operations Manager deployments keep working - legacy code paths preserved as _list_deployments_legacy and _describe_deployment_legacy Models: - add DeploymentStatus and ErrorCategory string enums - add DeploymentDetail, DeploymentListResponse, DeploymentComponentDetail, StatusError pydantic models for the new response shapes Describe rendering: - show Status (color-coded), Revision (short SHA), Last sync - when Status indicates a problem, show an Errors table with Category/Resource/Message and a deduplicated explanation footer - hide the K8s Deployment column in the v2 path (the field doesn't exist there); legacy path still populates it Tests: - new respx tests for the v2 happy paths and the 404 fallback - existing legacy-path tests now mock a 404 on the v2 endpoint to exercise the same fallback code

claude

Test

claude

No backwards-incompatible changes detected. CLI command tree and all ZadClient public method signatures are preserved. One significant issue with the 404 fallback path; two minor items.

🟡 Minor — new public methods missing from backwards-compat baseline

list_deployments_v2 and get_deployment_v2 are new public methods on ZadClient but are absent from EXPECTED_CLIENT_METHODS in tests/test_backwards_compat.py. Per the backwards-compat policy, public methods should be added to the baseline so accidental future removal is caught by CI. Suggest appending both to the list (and adding entries in EXPECTED_METHOD_MIN_ARGS with "get_deployment_v2": 2 and "list_deployments_v2": 1). This file isn't in the diff so an inline comment isn't possible.

- describe_deployment now propagates a real 404 when the v2 list endpoint works but get-one 404s (deployment is genuinely missing). Previously the legacy fallback would silently return empty data. Probe by calling list_deployments_v2 to disambiguate "deployment missing" from "endpoint not registered on this upstream". - _status_color: Suspended joins the red tier alongside Degraded, Missing, OutOfSync per upstream's documented severity ordering. - backwards-compat baseline: register list_deployments_v2 and get_deployment_v2 as expected public methods.

Superseded by new review

claude

One significant issue with the list_deployments 404 fallback. Two minor items.

🟠 Significant — list_deployments 404 fallback is undiscriminating (see inline comment)

Unlike describe_deployment, list_deployments treats any 404 from the v2 endpoint as "old upstream" and silently falls back to legacy — including a 404 that means "project not found" on a new upstream. A caller who passes a wrong project name gets an empty list instead of a clear error. describe_deployment added proper disambiguation logic for exactly this reason; list_deployments should be consistent.

🟡 Minor — "Last sync:" label is inaccurate for degraded deployments (see inline comment)

The upstream API doc for last_synced_at states: "for a Degraded deployment this can be the time of a failed sync, not a healthy one." The label Last sync: implies a successful sync, which misleads operators diagnosing a broken deployment.

🟡 Minor — _status_color defined after first caller (see inline comment)

_status_color is called at the top of describe but defined after describe closes. Works fine in Python, but convention is helpers before callers.

- list_deployments now disambiguates "project not found" from "v2 endpoint not registered on this upstream", mirroring the describe_deployment logic. Probes via list_projects (available on every upstream version) before falling back to the legacy fusion. - "Last sync" label renamed to "Last sync attempt": per upstream spec, the timestamp is the most recent reconciliation attempt regardless of outcome, so a Degraded deployment with a recent failed sync no longer reads as if it last synced cleanly. - Move _status_color above describe (helpers before callers).

Superseded by new review

claude

Two previous issues fixed (disambiguation logic for list_deployments, label copy, helper ordering). One minor issue remains.

🟡 Minor — bare except ZadApiError in list_deployments swallows non-404 errors from list_projects (see inline comment)

A 403 or 500 from list_projects is caught and silently rerouted to the legacy path. An expired API key or transient server error produces a confusing error from the /logs endpoint instead of the real cause. Only a 404 means "old upstream"; all other errors should re-raise. No test covers this path.

The disambiguation probe in list_deployments was catching every ZadApiError from list_projects, including 401/403/5xx. An expired API key or transient server error would silently route to the legacy fusion path and surface a confusing error from /logs instead of the real cause. Now only 404 means "old upstream"; everything else propagates.

claude

One real bug in describe_deployment's fallback logic, and a minor note on mixed status values. Everything else looks solid — the fallback architecture, the model additions, the compat baseline updates, and the test suite are all well-done.

describe_deployment now mirrors list_deployments: when both v2 endpoints 404, it probes list_projects to tell "old upstream" from "project does not exist on this upstream". The old code silently fell back to the legacy logs+tasks fusion in both cases, masking "Project not found" with empty-component results. Both callers go through a shared _project_exists helper which only swallows a 404 from list_projects (treating that as "even /projects is missing, fall back"); 401/403/5xx propagate so the user sees the real cause. New test: describe_deployment raises ZadApiError(404) with a clear "Project '<name>' not found" message when the project is missing.

Superseded by new review

claude

One significant bug and two minor notes. The fallback architecture, 404 disambiguation, test coverage, and model additions are all well-done.

Three findings: 1. (Significant) project_status was unconditionally overwriting v2-supplied URLs with empty dicts on modern upstreams that have no recent completed tasks. The task probe is now best-effort: only overrides v2 URLs when it actually finds something. 2. Validate v2 read endpoint responses through pydantic. list_deployments_v2 and get_deployment_v2 now run the response through DeploymentListResponse / DeploymentDetail and re-emit a dict. Upstream schema drift surfaces as a clear pydantic error instead of leaking malformed data downstream. 3. Tests: - new test for project_status preserving v2 URLs when /tasks is empty - new test for the "/projects also 404s" branch (very old upstream)

claude

One real bug (URL priority inversion in project_status) and one minor gap (no test for the inversion scenario). Everything else looks solid.

The previous fix inverted the priority — task_urls or dep["urls"] gave task URLs priority when both were non-empty. v2 is authoritative when present, so prefer dep["urls"] (the v2-supplied dict) and fall back to the task probe only when v2 returned nothing (legacy upstreams or deployments with no public components). New test: when both v2 and task history carry URLs for the same deployment, the v2 URLs win.

Superseded by new review

claude

Review

🟠 Significant — `pydantic.ValidationError` is not handled by `@handle_api_errors`

handle_api_errors only catches ZadApiError, TaskFailedError, TaskTimeoutError. When list_deployments_v2 or get_deployment_v2 calls model_validate() on a response that doesn't match the schema — e.g., a new DeploymentStatus value added upstream, or a missing required field — pydantic raises ValidationError, which propagates past every except ZadApiError block in the call chain and reaches the command as an unhandled exception. Users see a raw Python traceback instead of a clean error message.

This is a realistic failure mode: DeploymentStatus and ErrorCategory are closed enums and the upstream can add values without a breaking-change label. Pydantic v2 in strict mode will reject unknown enum variants.

Concrete fix options (pick one):

Catch pydantic.ValidationError in handle_api_errors alongside ZadApiError and render it as formatter.render_error(f"Unexpected API response shape: {e}").
Wrap model_validate in the two client methods and re-raise as ZadApiError(502, ...).
Add model_config = ConfigDict(use_enum_values=True) to DeploymentDetail / StatusError with str field types so unknown enum variants don't crash validation.

🟡 Minor — Missing test: `describe_deployment` disambiguation probe returning non-404

test_list_deployments_propagates_non_404_from_probe covers the list_deployments path. There is no equivalent test for describe_deployment when list_deployments_v2 returns 403 (or 500) during the disambiguation probe. The raise re-raises list_err correctly, but the path is untested.

🟡 Minor — `_status_color` has no test

The function drives all status color rendering and it's easy for a typo to silently produce "dim" styling. A pytest.mark.parametrize covering each branch would be small and prevent regressions if the enum values are renamed.

🟡 Minor — `project_status` URL merge silently falls back to stale task URLs when v2 returns `{}`

dep.get("urls", {}) or task_urls treats an explicitly-empty v2 URL dict as absent. If a deployment's publish-on-web config is removed (so v2 legitimately returns {}), stale task-history URLs still surface under project status. Given that list_deployments_v2 is now authoritative, an explicit presence check would make the intent clearer:

# If v2 supplied urls (even if empty), prefer them; only fall back for legacy rows.
dep["urls"] = dep["urls"] if "urls" in dep else urls_by_deployment.get(dep["deployment"], {})

(Legacy rows from _list_deployments_legacy don't include the "urls" key, so "urls" in dep correctly separates the two cases.)

Four findings: 1. (Significant) ValidationError from pydantic schema mismatch was not caught by @handle_api_errors and would surface as a raw traceback. New _parse_v2_response helper translates ValidationError to ZadApiError(502, "Unexpected API response shape: ...") so the existing error rendering path handles upstream schema drift cleanly. This becomes load-bearing as upstream adds new DeploymentStatus or ErrorCategory enum values without a breaking-change label. 2. (Minor) project_status URL merge now uses presence rather than truthiness. v2 returning {} legitimately means "no publish-on-web"; stale task URLs no longer leak through after the config is removed. Legacy rows lack the "urls" key so the test still picks them up. 3. (Minor) New test: describe_deployment propagates non-404 (401/etc) from the disambiguation probe, mirroring the list_deployments test. 4. (Minor) New parametrized test for _status_color covering every DeploymentStatus enum value plus the empty-string fallback.

claude

No backwards-incompatible changes found. describe_deployment and list_deployments keep their public signatures; the v2 methods are purely additive. The fallback chain logic and 404 disambiguation are sound.

Three issues below.

- New CLI integration tests for deployment describe rendering, using typer's CliRunner with a stubbed ZadClient. Covers Healthy (no Errors table) and Degraded (Errors table with category, message, and explanation footer) so future Rich-markup or key-name regressions get caught at CI. - Trim multi-paragraph docstrings on _parse_v2_response, list_deployments_v2, get_deployment_v2, list_deployments, describe_deployment, _project_exists per CLAUDE.md (one-line docstrings only). Reasoning lives in the PR description. - _status_color now uses a dict keyed on DeploymentStatus enum members instead of bare string literals; if upstream renames a status the type system surfaces the dependency.

claude

One significant gap in test coverage and two minor observations. No backwards-incompatible changes, no security issues, no design-principle violations on CLI structure or client method conventions.

The fallback chain logic itself (v2 → list probe → projects probe → legacy) is correct and the tests for the list_deployments path are thorough.

There is one production upstream and it has been upgraded to ship the v2 read endpoints, so the fallback chain no longer earns its keep. Removed: - _list_deployments_legacy and _describe_deployment_legacy (logs+tasks fusion to reconstruct deployment state) - _project_exists disambiguation probe and the layered 404 handling in list_deployments / describe_deployment - project_status's /tasks call to recover URLs (v2 supplies them directly on every deployment) - The optional `subdomain` plumbing and `k8s_deployment` column rendering, both vestigial after the legacy path is gone - All tests covering legacy fallback paths, probe behavior, and stale task-history URL handling resolve_namespace now uses get_deployment_v2 directly instead of walking list_deployments. describe_deployment, list_deployments, and project_status pass through v2 fields without `.get()` defaults: after pydantic validation the required fields are guaranteed present and the optional ones are explicitly None. Net change: ~150 lines of client code and ~250 lines of tests removed, no behavior change against the current upstream.

Superseded by new review

claude

One significant issue and one minor nit. No backwards-incompatible CLI changes; the new v2 endpoints and pydantic validation layer are clean. The legacy-path removal is well-scoped and the new tests cover the happy path and the 502 surface.\n\nThe significant issue is about resilience to additive upstream enum changes, not a logic bug.

Closed enums were brittle: an additive upstream change (a new ErrorCategory or DeploymentStatus value) would make pydantic reject the entire DeploymentListResponse, taking the whole `zad deployment list` for a project down until the CLI is updated. Upstream's own `Unknown` catch-all signals they expect callers to handle unknown values gracefully, not hard-fail. - StatusError now coerces unknown category strings to UNKNOWN via a before-validator. - DeploymentDetail does the same for status. - Tests cover both coercion paths plus the known-value passthrough. - Drop vestigial `k8s_deployment` keys from the describe rendering test fixtures (the column was removed when v2 became the only source).

Superseded by new review

claude

One significant backwards-compat issue and two minor nits. The v2 migration and pydantic validation layer are clean; enum coercion, error rendering, and the new test coverage are all solid.

The significant issue is a field removal from a public method's return shape that isn't called out in the PR's behavior-change section.

- Restore `k8s_deployment` as a tombstone field in describe_deployment's component dicts. The v2 endpoint doesn't expose this field, but removing it from the public describe shape would break consumers reading `comp["k8s_deployment"]` from the JSON output. Per the backwards-compat policy, return-shape removals require a deprecation cycle; an empty-string tombstone is the additive-only path. - Replace the private `_value2member_map_` access in the enum coercion validators with a public set comprehension `{e.value for e in Enum}`. - Use a realistic 40-char SHA in the describe rendering test so the `[:12]` truncation is actually exercised.

Superseded by new review

claude

After multiple rounds the PR is in good shape. Two minor nits remain.

- _STATUS_COLORS is now annotated dict[DeploymentStatus, str], so type checkers flag a new enum value missing from the mapping instead of it silently falling through to "dim" at runtime. - The Degraded fixture's sync_revision was exactly 12 characters, so the [:12] truncation was a no-op there. Pad to 40 chars to match the Healthy fixture and actually verify truncation.

claude

Review

The v2 migration is well-executed overall: clean pydantic validation, correct coerce-to-UNKNOWN pattern for forward-compatibility, solid test coverage (404 propagation, pydantic 502 surfacing, every status value exercised). Two issues need addressing.

🟠 Significant — `K8s Deployment` column removed without being flagged

The component table in deployment describe previously showed a K8s Deployment column. That column is silently gone. The PR's "Behavior change to flag" section only mentions the status value-set change — this is a second undocumented output break. Scripts or runbooks that parse zad deployment describe output to get the k8s deployment name will silently lose that data. Add it to the behavior-change notice (at minimum), or keep the column populated with "" to preserve tabular column positions.

🟡 Minor — `_STATUS_COLORS` type annotation mismatch

_STATUS_COLORS: dict[DeploymentStatus, str] is looked up with a bare str in _status_color(status: str). Works at runtime via StrEnum, but a strict type checker flags dict[DeploymentStatus, str].get(str_value, ...). Fix: annotate the dict as dict[str, str].

claude · 2026-05-07T11:33:25Z

-        table.add_column("Image")
-    table.add_column("K8s Deployment")
-
+    table.add_column("Image")


🟠 Significant — The K8s Deployment column is silently removed here. The PR's "Behavior change to flag" section only lists the status value-set change; this is a second output break. Scripts parsing zad deployment describe for the k8s deployment name will break silently. Add this to the behavior-change notice, or preserve the column (populated with "") to keep tabular column positions stable.

claude · 2026-05-07T11:33:25Z

    )


+_STATUS_COLORS: dict[DeploymentStatus, str] = {


🟡 Minor — _STATUS_COLORS is annotated dict[DeploymentStatus, str] but _status_color receives a bare str. StrEnum makes the runtime lookup work, but strict type checkers will flag this. Change to dict[str, str] to match the actual call site.

claude · 2026-05-07T11:33:25Z

+                # k8s_deployment is a tombstone for backwards compatibility:
+                # the v2 endpoint doesn't expose it, but consumers of the
+                # legacy describe shape may still read the key.
+                {"name": c["reference"], "image": c["image"], "k8s_deployment": ""}


🟡 Minor — This key is tombstoned to "" for backwards compat, but the describe command no longer renders it (the K8s Deployment table column was dropped). Library consumers reading this key will silently get nothing useful. Either re-add the column to the rendered table, or drop the key from the return dict and document the removal — the current state gives false confidence.

claude-code-action@v1 renamed the autonomous prompt input from `direct_prompt` to `prompt`. The old name now triggers a warning ("Unexpected input(s) 'direct_prompt'") and is silently ignored, so the action waits for a trigger phrase or label that the workflow never provides. The Implement step exits successfully having done nothing. This is exactly what happened on PR #23, which landed with only the spec bump and required manual implementation. With this fix, future api-sync runs should auto-implement new endpoints as designed.

chore: update upstream OpenAPI spec 2026-05-07

feffa7d

anneschuth added api-sync Automated API sync changes automated Created by automated workflow labels May 7, 2026