feat: use v2 deployment read endpoints#23
Conversation
There was a problem hiding this comment.
Review summary
The spec additions are accurate and well-formed. One significant gap: the sync agent is supposed to implement new client methods and CLI commands (per CLAUDE.md), but nothing was implemented — both new v2 endpoints remain in the "not covered" list. These endpoints directly replace fragile workarounds that exist today.
No backwards-incompatible changes
The spec additions are purely additive. No existing CLI commands or client methods are touched.
Upstream RIG-Cluster shipped GET /v2/projects/{p}/deployments and
GET /v2/projects/{p}/deployments/{d} (closes the structural problem
flagged in RijksICTGilde/RIG-Cluster#51). The CLI no longer has to
fuse /logs and /tasks to reconstruct deployment state.
Client:
- add ZadClient.list_deployments_v2 and get_deployment_v2 wrappers
- list_deployments and describe_deployment now prefer the v2 endpoints
and fall back to the legacy logs+tasks fusion on 404, so older
Operations Manager deployments keep working
- legacy code paths preserved as _list_deployments_legacy and
_describe_deployment_legacy
Models:
- add DeploymentStatus and ErrorCategory string enums
- add DeploymentDetail, DeploymentListResponse, DeploymentComponentDetail,
StatusError pydantic models for the new response shapes
Describe rendering:
- show Status (color-coded), Revision (short SHA), Last sync
- when Status indicates a problem, show an Errors table with
Category/Resource/Message and a deduplicated explanation footer
- hide the K8s Deployment column in the v2 path (the field doesn't
exist there); legacy path still populates it
Tests:
- new respx tests for the v2 happy paths and the 404 fallback
- existing legacy-path tests now mock a 404 on the v2 endpoint to
exercise the same fallback code
There was a problem hiding this comment.
No backwards-incompatible changes detected. CLI command tree and all ZadClient public method signatures are preserved. One significant issue with the 404 fallback path; two minor items.
🟡 Minor — new public methods missing from backwards-compat baseline
list_deployments_v2 and get_deployment_v2 are new public methods on ZadClient but are absent from EXPECTED_CLIENT_METHODS in tests/test_backwards_compat.py. Per the backwards-compat policy, public methods should be added to the baseline so accidental future removal is caught by CI. Suggest appending both to the list (and adding entries in EXPECTED_METHOD_MIN_ARGS with "get_deployment_v2": 2 and "list_deployments_v2": 1). This file isn't in the diff so an inline comment isn't possible.
- describe_deployment now propagates a real 404 when the v2 list endpoint works but get-one 404s (deployment is genuinely missing). Previously the legacy fallback would silently return empty data. Probe by calling list_deployments_v2 to disambiguate "deployment missing" from "endpoint not registered on this upstream". - _status_color: Suspended joins the red tier alongside Degraded, Missing, OutOfSync per upstream's documented severity ordering. - backwards-compat baseline: register list_deployments_v2 and get_deployment_v2 as expected public methods.
There was a problem hiding this comment.
One significant issue with the list_deployments 404 fallback. Two minor items.
🟠 Significant — list_deployments 404 fallback is undiscriminating (see inline comment)
Unlike describe_deployment, list_deployments treats any 404 from the v2 endpoint as "old upstream" and silently falls back to legacy — including a 404 that means "project not found" on a new upstream. A caller who passes a wrong project name gets an empty list instead of a clear error. describe_deployment added proper disambiguation logic for exactly this reason; list_deployments should be consistent.
🟡 Minor — "Last sync:" label is inaccurate for degraded deployments (see inline comment)
The upstream API doc for last_synced_at states: "for a Degraded deployment this can be the time of a failed sync, not a healthy one." The label Last sync: implies a successful sync, which misleads operators diagnosing a broken deployment.
🟡 Minor — _status_color defined after first caller (see inline comment)
_status_color is called at the top of describe but defined after describe closes. Works fine in Python, but convention is helpers before callers.
- list_deployments now disambiguates "project not found" from "v2 endpoint not registered on this upstream", mirroring the describe_deployment logic. Probes via list_projects (available on every upstream version) before falling back to the legacy fusion. - "Last sync" label renamed to "Last sync attempt": per upstream spec, the timestamp is the most recent reconciliation attempt regardless of outcome, so a Degraded deployment with a recent failed sync no longer reads as if it last synced cleanly. - Move _status_color above describe (helpers before callers).
There was a problem hiding this comment.
Two previous issues fixed (disambiguation logic for list_deployments, label copy, helper ordering). One minor issue remains.
🟡 Minor — bare except ZadApiError in list_deployments swallows non-404 errors from list_projects (see inline comment)
A 403 or 500 from list_projects is caught and silently rerouted to the legacy path. An expired API key or transient server error produces a confusing error from the /logs endpoint instead of the real cause. Only a 404 means "old upstream"; all other errors should re-raise. No test covers this path.
The disambiguation probe in list_deployments was catching every ZadApiError from list_projects, including 401/403/5xx. An expired API key or transient server error would silently route to the legacy fusion path and surface a confusing error from /logs instead of the real cause. Now only 404 means "old upstream"; everything else propagates.
describe_deployment now mirrors list_deployments: when both v2 endpoints 404, it probes list_projects to tell "old upstream" from "project does not exist on this upstream". The old code silently fell back to the legacy logs+tasks fusion in both cases, masking "Project not found" with empty-component results. Both callers go through a shared _project_exists helper which only swallows a 404 from list_projects (treating that as "even /projects is missing, fall back"); 401/403/5xx propagate so the user sees the real cause. New test: describe_deployment raises ZadApiError(404) with a clear "Project '<name>' not found" message when the project is missing.
Three findings: 1. (Significant) project_status was unconditionally overwriting v2-supplied URLs with empty dicts on modern upstreams that have no recent completed tasks. The task probe is now best-effort: only overrides v2 URLs when it actually finds something. 2. Validate v2 read endpoint responses through pydantic. list_deployments_v2 and get_deployment_v2 now run the response through DeploymentListResponse / DeploymentDetail and re-emit a dict. Upstream schema drift surfaces as a clear pydantic error instead of leaking malformed data downstream. 3. Tests: - new test for project_status preserving v2 URLs when /tasks is empty - new test for the "/projects also 404s" branch (very old upstream)
The previous fix inverted the priority — task_urls or dep["urls"] gave task URLs priority when both were non-empty. v2 is authoritative when present, so prefer dep["urls"] (the v2-supplied dict) and fall back to the task probe only when v2 returned nothing (legacy upstreams or deployments with no public components). New test: when both v2 and task history carry URLs for the same deployment, the v2 URLs win.
There was a problem hiding this comment.
Review
🟠 Significant — pydantic.ValidationError is not handled by @handle_api_errors
handle_api_errors only catches ZadApiError, TaskFailedError, TaskTimeoutError. When list_deployments_v2 or get_deployment_v2 calls model_validate() on a response that doesn't match the schema — e.g., a new DeploymentStatus value added upstream, or a missing required field — pydantic raises ValidationError, which propagates past every except ZadApiError block in the call chain and reaches the command as an unhandled exception. Users see a raw Python traceback instead of a clean error message.
This is a realistic failure mode: DeploymentStatus and ErrorCategory are closed enums and the upstream can add values without a breaking-change label. Pydantic v2 in strict mode will reject unknown enum variants.
Concrete fix options (pick one):
- Catch
pydantic.ValidationErrorinhandle_api_errorsalongsideZadApiErrorand render it asformatter.render_error(f"Unexpected API response shape: {e}"). - Wrap
model_validatein the two client methods and re-raise asZadApiError(502, ...). - Add
model_config = ConfigDict(use_enum_values=True)toDeploymentDetail/StatusErrorwithstrfield types so unknown enum variants don't crash validation.
🟡 Minor — Missing test: describe_deployment disambiguation probe returning non-404
test_list_deployments_propagates_non_404_from_probe covers the list_deployments path. There is no equivalent test for describe_deployment when list_deployments_v2 returns 403 (or 500) during the disambiguation probe. The raise re-raises list_err correctly, but the path is untested.
🟡 Minor — _status_color has no test
The function drives all status color rendering and it's easy for a typo to silently produce "dim" styling. A pytest.mark.parametrize covering each branch would be small and prevent regressions if the enum values are renamed.
🟡 Minor — project_status URL merge silently falls back to stale task URLs when v2 returns {}
dep.get("urls", {}) or task_urls treats an explicitly-empty v2 URL dict as absent. If a deployment's publish-on-web config is removed (so v2 legitimately returns {}), stale task-history URLs still surface under project status. Given that list_deployments_v2 is now authoritative, an explicit presence check would make the intent clearer:
# If v2 supplied urls (even if empty), prefer them; only fall back for legacy rows.
dep["urls"] = dep["urls"] if "urls" in dep else urls_by_deployment.get(dep["deployment"], {})(Legacy rows from _list_deployments_legacy don't include the "urls" key, so "urls" in dep correctly separates the two cases.)
Four findings:
1. (Significant) ValidationError from pydantic schema mismatch was not
caught by @handle_api_errors and would surface as a raw traceback.
New _parse_v2_response helper translates ValidationError to
ZadApiError(502, "Unexpected API response shape: ...") so the
existing error rendering path handles upstream schema drift cleanly.
This becomes load-bearing as upstream adds new DeploymentStatus or
ErrorCategory enum values without a breaking-change label.
2. (Minor) project_status URL merge now uses presence rather than
truthiness. v2 returning {} legitimately means "no publish-on-web";
stale task URLs no longer leak through after the config is removed.
Legacy rows lack the "urls" key so the test still picks them up.
3. (Minor) New test: describe_deployment propagates non-404 (401/etc)
from the disambiguation probe, mirroring the list_deployments test.
4. (Minor) New parametrized test for _status_color covering every
DeploymentStatus enum value plus the empty-string fallback.
- New CLI integration tests for deployment describe rendering, using typer's CliRunner with a stubbed ZadClient. Covers Healthy (no Errors table) and Degraded (Errors table with category, message, and explanation footer) so future Rich-markup or key-name regressions get caught at CI. - Trim multi-paragraph docstrings on _parse_v2_response, list_deployments_v2, get_deployment_v2, list_deployments, describe_deployment, _project_exists per CLAUDE.md (one-line docstrings only). Reasoning lives in the PR description. - _status_color now uses a dict keyed on DeploymentStatus enum members instead of bare string literals; if upstream renames a status the type system surfaces the dependency.
There was a problem hiding this comment.
One significant gap in test coverage and two minor observations. No backwards-incompatible changes, no security issues, no design-principle violations on CLI structure or client method conventions.
The fallback chain logic itself (v2 → list probe → projects probe → legacy) is correct and the tests for the list_deployments path are thorough.
There is one production upstream and it has been upgraded to ship the v2 read endpoints, so the fallback chain no longer earns its keep. Removed: - _list_deployments_legacy and _describe_deployment_legacy (logs+tasks fusion to reconstruct deployment state) - _project_exists disambiguation probe and the layered 404 handling in list_deployments / describe_deployment - project_status's /tasks call to recover URLs (v2 supplies them directly on every deployment) - The optional `subdomain` plumbing and `k8s_deployment` column rendering, both vestigial after the legacy path is gone - All tests covering legacy fallback paths, probe behavior, and stale task-history URL handling resolve_namespace now uses get_deployment_v2 directly instead of walking list_deployments. describe_deployment, list_deployments, and project_status pass through v2 fields without `.get()` defaults: after pydantic validation the required fields are guaranteed present and the optional ones are explicitly None. Net change: ~150 lines of client code and ~250 lines of tests removed, no behavior change against the current upstream.
There was a problem hiding this comment.
One significant issue and one minor nit. No backwards-incompatible CLI changes; the new v2 endpoints and pydantic validation layer are clean. The legacy-path removal is well-scoped and the new tests cover the happy path and the 502 surface.\n\nThe significant issue is about resilience to additive upstream enum changes, not a logic bug.
Closed enums were brittle: an additive upstream change (a new ErrorCategory or DeploymentStatus value) would make pydantic reject the entire DeploymentListResponse, taking the whole `zad deployment list` for a project down until the CLI is updated. Upstream's own `Unknown` catch-all signals they expect callers to handle unknown values gracefully, not hard-fail. - StatusError now coerces unknown category strings to UNKNOWN via a before-validator. - DeploymentDetail does the same for status. - Tests cover both coercion paths plus the known-value passthrough. - Drop vestigial `k8s_deployment` keys from the describe rendering test fixtures (the column was removed when v2 became the only source).
There was a problem hiding this comment.
One significant backwards-compat issue and two minor nits. The v2 migration and pydantic validation layer are clean; enum coercion, error rendering, and the new test coverage are all solid.
The significant issue is a field removal from a public method's return shape that isn't called out in the PR's behavior-change section.
- Restore `k8s_deployment` as a tombstone field in describe_deployment's
component dicts. The v2 endpoint doesn't expose this field, but
removing it from the public describe shape would break consumers
reading `comp["k8s_deployment"]` from the JSON output. Per the
backwards-compat policy, return-shape removals require a deprecation
cycle; an empty-string tombstone is the additive-only path.
- Replace the private `_value2member_map_` access in the enum coercion
validators with a public set comprehension `{e.value for e in Enum}`.
- Use a realistic 40-char SHA in the describe rendering test so the
`[:12]` truncation is actually exercised.
- _STATUS_COLORS is now annotated dict[DeploymentStatus, str], so type checkers flag a new enum value missing from the mapping instead of it silently falling through to "dim" at runtime. - The Degraded fixture's sync_revision was exactly 12 characters, so the [:12] truncation was a no-op there. Pad to 40 chars to match the Healthy fixture and actually verify truncation.
There was a problem hiding this comment.
Review
The v2 migration is well-executed overall: clean pydantic validation, correct coerce-to-UNKNOWN pattern for forward-compatibility, solid test coverage (404 propagation, pydantic 502 surfacing, every status value exercised). Two issues need addressing.
🟠 Significant — K8s Deployment column removed without being flagged
The component table in deployment describe previously showed a K8s Deployment column. That column is silently gone. The PR's "Behavior change to flag" section only mentions the status value-set change — this is a second undocumented output break. Scripts or runbooks that parse zad deployment describe output to get the k8s deployment name will silently lose that data. Add it to the behavior-change notice (at minimum), or keep the column populated with "" to preserve tabular column positions.
🟡 Minor — _STATUS_COLORS type annotation mismatch
_STATUS_COLORS: dict[DeploymentStatus, str] is looked up with a bare str in _status_color(status: str). Works at runtime via StrEnum, but a strict type checker flags dict[DeploymentStatus, str].get(str_value, ...). Fix: annotate the dict as dict[str, str].
| table.add_column("Image") | ||
| table.add_column("K8s Deployment") | ||
|
|
||
| table.add_column("Image") |
There was a problem hiding this comment.
🟠 Significant — The K8s Deployment column is silently removed here. The PR's "Behavior change to flag" section only lists the status value-set change; this is a second output break. Scripts parsing zad deployment describe for the k8s deployment name will break silently. Add this to the behavior-change notice, or preserve the column (populated with "") to keep tabular column positions stable.
| ) | ||
|
|
||
|
|
||
| _STATUS_COLORS: dict[DeploymentStatus, str] = { |
There was a problem hiding this comment.
🟡 Minor — _STATUS_COLORS is annotated dict[DeploymentStatus, str] but _status_color receives a bare str. StrEnum makes the runtime lookup work, but strict type checkers will flag this. Change to dict[str, str] to match the actual call site.
| # k8s_deployment is a tombstone for backwards compatibility: | ||
| # the v2 endpoint doesn't expose it, but consumers of the | ||
| # legacy describe shape may still read the key. | ||
| {"name": c["reference"], "image": c["image"], "k8s_deployment": ""} |
There was a problem hiding this comment.
🟡 Minor — This key is tombstoned to "" for backwards compat, but the describe command no longer renders it (the K8s Deployment table column was dropped). Library consumers reading this key will silently get nothing useful. Either re-add the column to the rendered table, or drop the key from the return dict and document the removal — the current state gives false confidence.
claude-code-action@v1 renamed the autonomous prompt input from
`direct_prompt` to `prompt`. The old name now triggers a warning
("Unexpected input(s) 'direct_prompt'") and is silently ignored, so
the action waits for a trigger phrase or label that the workflow
never provides. The Implement step exits successfully having done
nothing.
This is exactly what happened on PR #23, which landed with only the
spec bump and required manual implementation. With this fix, future
api-sync runs should auto-implement new endpoints as designed.
Summary
Upstream RIG-Cluster shipped
GET /v2/projects/{p}/deploymentsandGET /v2/projects/{p}/deployments/{d}(closes the structural problem flagged in RijksICTGilde/RIG-Cluster#51). The CLI no longer reconstructs deployment state from/logs+/tasks; it just reads it.This PR also picks up the api/upstream-openapi.json bump from the daily sync workflow.
Client
ZadClient.list_deployments_v2(project)andget_deployment_v2(project, deployment). Responses are validated through pydantic; schema drift surfaces asZadApiError(502)so the existing error-render path stays clean.list_deploymentsanddescribe_deploymentnow wrap the v2 endpoints directly. Same public signatures, richer return shape (status, sync_revision, last_synced_at, errors).resolve_namespacereads viaget_deployment_v2instead of walkinglist_deployments.project_statusno longer probes/tasksfor URLs; v2 supplies them.Models
DeploymentStatusandErrorCategoryStrEnums.DeploymentDetail,DeploymentListResponse,DeploymentComponentDetail,StatusErrorpydantic models, used to validate v2 responses.Describe rendering
Tests
deployment describerendering (Healthy without errors table, Degraded with errors table and explanation footer), using typer'sCliRunnerwith a stubbedZadClient._status_colorcovering everyDeploymentStatusenum value.list_deployments_v2andget_deployment_v2registered as expected public methods.Behavior change to flag
The
statusfield returned bylist_deploymentsanddescribe_deploymentnow reports the upstreamDeploymentStatusenum (Healthy,Degraded,Progressing,OutOfSync,Suspended,Missing,Pending,Unavailable,Unknown). The previous values (Active/Failed/Deploying) reconstructed by the logs+tasks fusion are gone with that path. Any downstream script comparingstatus == "Active"should switch tostatus == "Healthy". Field name and type are preserved; the value set changes.Test plan
uv run pytest(73 passed)uv run ruff check .anduv run ruff format --check .list_deployments_v2andget_deployment_v2are additivezad deployment listandzad deployment describe <name>for a Healthy and a Degraded deployment; confirm status, revision, last sync attempt, and the Errors table render correctly