Skip to content

feat: use v2 deployment read endpoints#23

Merged
anneschuth merged 14 commits into
mainfrom
api-sync/2026-05-07-092750
May 7, 2026
Merged

feat: use v2 deployment read endpoints#23
anneschuth merged 14 commits into
mainfrom
api-sync/2026-05-07-092750

Conversation

@anneschuth

@anneschuth anneschuth commented May 7, 2026

Copy link
Copy Markdown
Member

Summary

Upstream RIG-Cluster shipped GET /v2/projects/{p}/deployments and GET /v2/projects/{p}/deployments/{d} (closes the structural problem flagged in RijksICTGilde/RIG-Cluster#51). The CLI no longer reconstructs deployment state from /logs + /tasks; it just reads it.

This PR also picks up the api/upstream-openapi.json bump from the daily sync workflow.

Client

  • New: ZadClient.list_deployments_v2(project) and get_deployment_v2(project, deployment). Responses are validated through pydantic; schema drift surfaces as ZadApiError(502) so the existing error-render path stays clean.
  • list_deployments and describe_deployment now wrap the v2 endpoints directly. Same public signatures, richer return shape (status, sync_revision, last_synced_at, errors).
  • resolve_namespace reads via get_deployment_v2 instead of walking list_deployments.
  • project_status no longer probes /tasks for URLs; v2 supplies them.
  • All legacy logs+tasks fusion code is gone (~150 lines of client, ~250 lines of test). One production upstream, already upgraded.

Models

  • DeploymentStatus and ErrorCategory StrEnums.
  • DeploymentDetail, DeploymentListResponse, DeploymentComponentDetail, StatusError pydantic models, used to validate v2 responses.

Describe rendering

  • Status (color-coded by severity), Revision (short SHA), Last sync attempt.
  • "Last sync attempt" wording reflects the upstream contract: this is the most recent reconciliation attempt regardless of outcome.
  • When the deployment is not Healthy: an Errors table with Category/Resource/Message and a deduplicated explanation footer that summarizes each category once.
  • Status colors: Healthy=green; Degraded/Missing/OutOfSync/Suspended=red; Progressing/Pending=yellow; everything else=dim.

Tests

  • New respx tests for the v2 endpoints (happy path with status/errors, list, 404 propagation, pydantic validation surfacing as 502).
  • New CLI integration tests for deployment describe rendering (Healthy without errors table, Degraded with errors table and explanation footer), using typer's CliRunner with a stubbed ZadClient.
  • Parametrized test for _status_color covering every DeploymentStatus enum value.
  • Backwards-compat baseline updated: list_deployments_v2 and get_deployment_v2 registered as expected public methods.

Behavior change to flag

The status field returned by list_deployments and describe_deployment now reports the upstream DeploymentStatus enum (Healthy, Degraded, Progressing, OutOfSync, Suspended, Missing, Pending, Unavailable, Unknown). The previous values (Active / Failed / Deploying) reconstructed by the logs+tasks fusion are gone with that path. Any downstream script comparing status == "Active" should switch to status == "Healthy". Field name and type are preserved; the value set changes.

Test plan

  • uv run pytest (73 passed)
  • uv run ruff check . and uv run ruff format --check .
  • Backwards-compat baseline: no public methods removed; list_deployments_v2 and get_deployment_v2 are additive
  • Smoke test against production: zad deployment list and zad deployment describe <name> for a Healthy and a Degraded deployment; confirm status, revision, last sync attempt, and the Errors table render correctly

@anneschuth anneschuth added api-sync Automated API sync changes automated Created by automated workflow labels May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

The spec additions are accurate and well-formed. One significant gap: the sync agent is supposed to implement new client methods and CLI commands (per CLAUDE.md), but nothing was implemented — both new v2 endpoints remain in the "not covered" list. These endpoints directly replace fragile workarounds that exist today.

No backwards-incompatible changes

The spec additions are purely additive. No existing CLI commands or client methods are touched.

Upstream RIG-Cluster shipped GET /v2/projects/{p}/deployments and
GET /v2/projects/{p}/deployments/{d} (closes the structural problem
flagged in RijksICTGilde/RIG-Cluster#51). The CLI no longer has to
fuse /logs and /tasks to reconstruct deployment state.

Client:
- add ZadClient.list_deployments_v2 and get_deployment_v2 wrappers
- list_deployments and describe_deployment now prefer the v2 endpoints
  and fall back to the legacy logs+tasks fusion on 404, so older
  Operations Manager deployments keep working
- legacy code paths preserved as _list_deployments_legacy and
  _describe_deployment_legacy

Models:
- add DeploymentStatus and ErrorCategory string enums
- add DeploymentDetail, DeploymentListResponse, DeploymentComponentDetail,
  StatusError pydantic models for the new response shapes

Describe rendering:
- show Status (color-coded), Revision (short SHA), Last sync
- when Status indicates a problem, show an Errors table with
  Category/Resource/Message and a deduplicated explanation footer
- hide the K8s Deployment column in the v2 path (the field doesn't
  exist there); legacy path still populates it

Tests:
- new respx tests for the v2 happy paths and the 404 fallback
- existing legacy-path tests now mock a 404 on the v2 endpoint to
  exercise the same fallback code
@anneschuth anneschuth changed the title chore: sync with upstream API changes 2026-05-07 feat: use v2 deployment read endpoints May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test

claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No backwards-incompatible changes detected. CLI command tree and all ZadClient public method signatures are preserved. One significant issue with the 404 fallback path; two minor items.

🟡 Minor — new public methods missing from backwards-compat baseline

list_deployments_v2 and get_deployment_v2 are new public methods on ZadClient but are absent from EXPECTED_CLIENT_METHODS in tests/test_backwards_compat.py. Per the backwards-compat policy, public methods should be added to the baseline so accidental future removal is caught by CI. Suggest appending both to the list (and adding entries in EXPECTED_METHOD_MIN_ARGS with "get_deployment_v2": 2 and "list_deployments_v2": 1). This file isn't in the diff so an inline comment isn't possible.

- describe_deployment now propagates a real 404 when the v2 list
  endpoint works but get-one 404s (deployment is genuinely missing).
  Previously the legacy fallback would silently return empty data.
  Probe by calling list_deployments_v2 to disambiguate "deployment
  missing" from "endpoint not registered on this upstream".
- _status_color: Suspended joins the red tier alongside Degraded,
  Missing, OutOfSync per upstream's documented severity ordering.
- backwards-compat baseline: register list_deployments_v2 and
  get_deployment_v2 as expected public methods.
@claude claude Bot dismissed their stale review May 7, 2026 10:09

Superseded by new review

claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One significant issue with the list_deployments 404 fallback. Two minor items.


🟠 Significant — list_deployments 404 fallback is undiscriminating (see inline comment)

Unlike describe_deployment, list_deployments treats any 404 from the v2 endpoint as "old upstream" and silently falls back to legacy — including a 404 that means "project not found" on a new upstream. A caller who passes a wrong project name gets an empty list instead of a clear error. describe_deployment added proper disambiguation logic for exactly this reason; list_deployments should be consistent.


🟡 Minor — "Last sync:" label is inaccurate for degraded deployments (see inline comment)

The upstream API doc for last_synced_at states: "for a Degraded deployment this can be the time of a failed sync, not a healthy one." The label Last sync: implies a successful sync, which misleads operators diagnosing a broken deployment.


🟡 Minor — _status_color defined after first caller (see inline comment)

_status_color is called at the top of describe but defined after describe closes. Works fine in Python, but convention is helpers before callers.

- list_deployments now disambiguates "project not found" from
  "v2 endpoint not registered on this upstream", mirroring the
  describe_deployment logic. Probes via list_projects (available on
  every upstream version) before falling back to the legacy fusion.
- "Last sync" label renamed to "Last sync attempt": per upstream
  spec, the timestamp is the most recent reconciliation attempt
  regardless of outcome, so a Degraded deployment with a recent
  failed sync no longer reads as if it last synced cleanly.
- Move _status_color above describe (helpers before callers).
@claude claude Bot dismissed their stale review May 7, 2026 10:17

Superseded by new review

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two previous issues fixed (disambiguation logic for list_deployments, label copy, helper ordering). One minor issue remains.


🟡 Minor — bare except ZadApiError in list_deployments swallows non-404 errors from list_projects (see inline comment)

A 403 or 500 from list_projects is caught and silently rerouted to the legacy path. An expired API key or transient server error produces a confusing error from the /logs endpoint instead of the real cause. Only a 404 means "old upstream"; all other errors should re-raise. No test covers this path.

The disambiguation probe in list_deployments was catching every
ZadApiError from list_projects, including 401/403/5xx. An expired
API key or transient server error would silently route to the
legacy fusion path and surface a confusing error from /logs
instead of the real cause. Now only 404 means "old upstream";
everything else propagates.
claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One real bug in describe_deployment's fallback logic, and a minor note on mixed status values. Everything else looks solid — the fallback architecture, the model additions, the compat baseline updates, and the test suite are all well-done.

describe_deployment now mirrors list_deployments: when both v2
endpoints 404, it probes list_projects to tell "old upstream" from
"project does not exist on this upstream". The old code silently
fell back to the legacy logs+tasks fusion in both cases, masking
"Project not found" with empty-component results.

Both callers go through a shared _project_exists helper which only
swallows a 404 from list_projects (treating that as "even /projects
is missing, fall back"); 401/403/5xx propagate so the user sees the
real cause.

New test: describe_deployment raises ZadApiError(404) with a clear
"Project '<name>' not found" message when the project is missing.
@claude claude Bot dismissed their stale review May 7, 2026 10:30

Superseded by new review

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One significant bug and two minor notes. The fallback architecture, 404 disambiguation, test coverage, and model additions are all well-done.

Three findings:

1. (Significant) project_status was unconditionally overwriting v2-supplied
   URLs with empty dicts on modern upstreams that have no recent completed
   tasks. The task probe is now best-effort: only overrides v2 URLs when
   it actually finds something.

2. Validate v2 read endpoint responses through pydantic.
   list_deployments_v2 and get_deployment_v2 now run the response through
   DeploymentListResponse / DeploymentDetail and re-emit a dict. Upstream
   schema drift surfaces as a clear pydantic error instead of leaking
   malformed data downstream.

3. Tests:
   - new test for project_status preserving v2 URLs when /tasks is empty
   - new test for the "/projects also 404s" branch (very old upstream)
claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One real bug (URL priority inversion in project_status) and one minor gap (no test for the inversion scenario). Everything else looks solid.

The previous fix inverted the priority — task_urls or dep["urls"] gave
task URLs priority when both were non-empty. v2 is authoritative when
present, so prefer dep["urls"] (the v2-supplied dict) and fall back to
the task probe only when v2 returned nothing (legacy upstreams or
deployments with no public components).

New test: when both v2 and task history carry URLs for the same
deployment, the v2 URLs win.
@claude claude Bot dismissed their stale review May 7, 2026 10:42

Superseded by new review

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

🟠 Significant — pydantic.ValidationError is not handled by @handle_api_errors

handle_api_errors only catches ZadApiError, TaskFailedError, TaskTimeoutError. When list_deployments_v2 or get_deployment_v2 calls model_validate() on a response that doesn't match the schema — e.g., a new DeploymentStatus value added upstream, or a missing required field — pydantic raises ValidationError, which propagates past every except ZadApiError block in the call chain and reaches the command as an unhandled exception. Users see a raw Python traceback instead of a clean error message.

This is a realistic failure mode: DeploymentStatus and ErrorCategory are closed enums and the upstream can add values without a breaking-change label. Pydantic v2 in strict mode will reject unknown enum variants.

Concrete fix options (pick one):

  1. Catch pydantic.ValidationError in handle_api_errors alongside ZadApiError and render it as formatter.render_error(f"Unexpected API response shape: {e}").
  2. Wrap model_validate in the two client methods and re-raise as ZadApiError(502, ...).
  3. Add model_config = ConfigDict(use_enum_values=True) to DeploymentDetail / StatusError with str field types so unknown enum variants don't crash validation.

🟡 Minor — Missing test: describe_deployment disambiguation probe returning non-404

test_list_deployments_propagates_non_404_from_probe covers the list_deployments path. There is no equivalent test for describe_deployment when list_deployments_v2 returns 403 (or 500) during the disambiguation probe. The raise re-raises list_err correctly, but the path is untested.


🟡 Minor — _status_color has no test

The function drives all status color rendering and it's easy for a typo to silently produce "dim" styling. A pytest.mark.parametrize covering each branch would be small and prevent regressions if the enum values are renamed.


🟡 Minor — project_status URL merge silently falls back to stale task URLs when v2 returns {}

dep.get("urls", {}) or task_urls treats an explicitly-empty v2 URL dict as absent. If a deployment's publish-on-web config is removed (so v2 legitimately returns {}), stale task-history URLs still surface under project status. Given that list_deployments_v2 is now authoritative, an explicit presence check would make the intent clearer:

# If v2 supplied urls (even if empty), prefer them; only fall back for legacy rows.
dep["urls"] = dep["urls"] if "urls" in dep else urls_by_deployment.get(dep["deployment"], {})

(Legacy rows from _list_deployments_legacy don't include the "urls" key, so "urls" in dep correctly separates the two cases.)

Four findings:

1. (Significant) ValidationError from pydantic schema mismatch was not
   caught by @handle_api_errors and would surface as a raw traceback.
   New _parse_v2_response helper translates ValidationError to
   ZadApiError(502, "Unexpected API response shape: ...") so the
   existing error rendering path handles upstream schema drift cleanly.
   This becomes load-bearing as upstream adds new DeploymentStatus or
   ErrorCategory enum values without a breaking-change label.

2. (Minor) project_status URL merge now uses presence rather than
   truthiness. v2 returning {} legitimately means "no publish-on-web";
   stale task URLs no longer leak through after the config is removed.
   Legacy rows lack the "urls" key so the test still picks them up.

3. (Minor) New test: describe_deployment propagates non-404 (401/etc)
   from the disambiguation probe, mirroring the list_deployments test.

4. (Minor) New parametrized test for _status_color covering every
   DeploymentStatus enum value plus the empty-string fallback.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No backwards-incompatible changes found. describe_deployment and list_deployments keep their public signatures; the v2 methods are purely additive. The fallback chain logic and 404 disambiguation are sound.

Three issues below.

- New CLI integration tests for deployment describe rendering, using
  typer's CliRunner with a stubbed ZadClient. Covers Healthy (no
  Errors table) and Degraded (Errors table with category, message,
  and explanation footer) so future Rich-markup or key-name
  regressions get caught at CI.
- Trim multi-paragraph docstrings on _parse_v2_response,
  list_deployments_v2, get_deployment_v2, list_deployments,
  describe_deployment, _project_exists per CLAUDE.md (one-line
  docstrings only). Reasoning lives in the PR description.
- _status_color now uses a dict keyed on DeploymentStatus enum
  members instead of bare string literals; if upstream renames a
  status the type system surfaces the dependency.
claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One significant gap in test coverage and two minor observations. No backwards-incompatible changes, no security issues, no design-principle violations on CLI structure or client method conventions.

The fallback chain logic itself (v2 → list probe → projects probe → legacy) is correct and the tests for the list_deployments path are thorough.

There is one production upstream and it has been upgraded to ship the
v2 read endpoints, so the fallback chain no longer earns its keep.

Removed:
- _list_deployments_legacy and _describe_deployment_legacy (logs+tasks
  fusion to reconstruct deployment state)
- _project_exists disambiguation probe and the layered 404 handling in
  list_deployments / describe_deployment
- project_status's /tasks call to recover URLs (v2 supplies them
  directly on every deployment)
- The optional `subdomain` plumbing and `k8s_deployment` column
  rendering, both vestigial after the legacy path is gone
- All tests covering legacy fallback paths, probe behavior, and stale
  task-history URL handling

resolve_namespace now uses get_deployment_v2 directly instead of
walking list_deployments. describe_deployment, list_deployments, and
project_status pass through v2 fields without `.get()` defaults: after
pydantic validation the required fields are guaranteed present and
the optional ones are explicitly None.

Net change: ~150 lines of client code and ~250 lines of tests removed,
no behavior change against the current upstream.
@claude claude Bot dismissed their stale review May 7, 2026 11:08

Superseded by new review

claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One significant issue and one minor nit. No backwards-incompatible CLI changes; the new v2 endpoints and pydantic validation layer are clean. The legacy-path removal is well-scoped and the new tests cover the happy path and the 502 surface.\n\nThe significant issue is about resilience to additive upstream enum changes, not a logic bug.

Closed enums were brittle: an additive upstream change (a new
ErrorCategory or DeploymentStatus value) would make pydantic reject
the entire DeploymentListResponse, taking the whole `zad deployment
list` for a project down until the CLI is updated. Upstream's own
`Unknown` catch-all signals they expect callers to handle unknown
values gracefully, not hard-fail.

- StatusError now coerces unknown category strings to UNKNOWN via a
  before-validator.
- DeploymentDetail does the same for status.
- Tests cover both coercion paths plus the known-value passthrough.
- Drop vestigial `k8s_deployment` keys from the describe rendering
  test fixtures (the column was removed when v2 became the only
  source).
@claude claude Bot dismissed their stale review May 7, 2026 11:17

Superseded by new review

claude[bot]
claude Bot previously requested changes May 7, 2026

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One significant backwards-compat issue and two minor nits. The v2 migration and pydantic validation layer are clean; enum coercion, error rendering, and the new test coverage are all solid.

The significant issue is a field removal from a public method's return shape that isn't called out in the PR's behavior-change section.

- Restore `k8s_deployment` as a tombstone field in describe_deployment's
  component dicts. The v2 endpoint doesn't expose this field, but
  removing it from the public describe shape would break consumers
  reading `comp["k8s_deployment"]` from the JSON output. Per the
  backwards-compat policy, return-shape removals require a deprecation
  cycle; an empty-string tombstone is the additive-only path.
- Replace the private `_value2member_map_` access in the enum coercion
  validators with a public set comprehension `{e.value for e in Enum}`.
- Use a realistic 40-char SHA in the describe rendering test so the
  `[:12]` truncation is actually exercised.
@claude claude Bot dismissed their stale review May 7, 2026 11:22

Superseded by new review

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After multiple rounds the PR is in good shape. Two minor nits remain.

- _STATUS_COLORS is now annotated dict[DeploymentStatus, str], so type
  checkers flag a new enum value missing from the mapping instead of
  it silently falling through to "dim" at runtime.
- The Degraded fixture's sync_revision was exactly 12 characters, so
  the [:12] truncation was a no-op there. Pad to 40 chars to match
  the Healthy fixture and actually verify truncation.

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The v2 migration is well-executed overall: clean pydantic validation, correct coerce-to-UNKNOWN pattern for forward-compatibility, solid test coverage (404 propagation, pydantic 502 surfacing, every status value exercised). Two issues need addressing.


🟠 Significant — K8s Deployment column removed without being flagged

The component table in deployment describe previously showed a K8s Deployment column. That column is silently gone. The PR's "Behavior change to flag" section only mentions the status value-set change — this is a second undocumented output break. Scripts or runbooks that parse zad deployment describe output to get the k8s deployment name will silently lose that data. Add it to the behavior-change notice (at minimum), or keep the column populated with "" to preserve tabular column positions.


🟡 Minor — _STATUS_COLORS type annotation mismatch

_STATUS_COLORS: dict[DeploymentStatus, str] is looked up with a bare str in _status_color(status: str). Works at runtime via StrEnum, but a strict type checker flags dict[DeploymentStatus, str].get(str_value, ...). Fix: annotate the dict as dict[str, str].

table.add_column("Image")
table.add_column("K8s Deployment")

table.add_column("Image")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Significant — The K8s Deployment column is silently removed here. The PR's "Behavior change to flag" section only lists the status value-set change; this is a second output break. Scripts parsing zad deployment describe for the k8s deployment name will break silently. Add this to the behavior-change notice, or preserve the column (populated with "") to keep tabular column positions stable.

)


_STATUS_COLORS: dict[DeploymentStatus, str] = {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor_STATUS_COLORS is annotated dict[DeploymentStatus, str] but _status_color receives a bare str. StrEnum makes the runtime lookup work, but strict type checkers will flag this. Change to dict[str, str] to match the actual call site.

Comment thread src/zad_cli/api/client.py
# k8s_deployment is a tombstone for backwards compatibility:
# the v2 endpoint doesn't expose it, but consumers of the
# legacy describe shape may still read the key.
{"name": c["reference"], "image": c["image"], "k8s_deployment": ""}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Minor — This key is tombstoned to "" for backwards compat, but the describe command no longer renders it (the K8s Deployment table column was dropped). Library consumers reading this key will silently get nothing useful. Either re-add the column to the rendered table, or drop the key from the return dict and document the removal — the current state gives false confidence.

@anneschuth anneschuth merged commit d20a576 into main May 7, 2026
12 checks passed
@anneschuth anneschuth deleted the api-sync/2026-05-07-092750 branch May 7, 2026 11:33
anneschuth added a commit that referenced this pull request May 7, 2026
claude-code-action@v1 renamed the autonomous prompt input from
`direct_prompt` to `prompt`. The old name now triggers a warning
("Unexpected input(s) 'direct_prompt'") and is silently ignored, so
the action waits for a trigger phrase or label that the workflow
never provides. The Implement step exits successfully having done
nothing.

This is exactly what happened on PR #23, which landed with only the
spec bump and required manual implementation. With this fix, future
api-sync runs should auto-implement new endpoints as designed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-sync Automated API sync changes automated Created by automated workflow

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant