Skip to content

Commit c8bcb51

Browse files
committed
fix: verwerk review-opmerkingen op de diagnose-laag
- README: json-diagnose gaat naar stdout, dus > out.json i.p.v. 2>err.json - UNKNOWN krijgt eigen exit code 3 (niet 1 'jouw fout', niet 2 'veilig retryen') zodat de exit code de honesty-lijn volgt; README- en CLAUDE-tabel bijgewerkt - spec-conformance: ontbrekend specbestand faalt hard i.p.v. stil te skippen - verwijder em-dashes uit gebruikersgerichte foutmeldingen en docstrings
1 parent 3edc4a4 commit c8bcb51

7 files changed

Lines changed: 38 additions & 27 deletions

File tree

CLAUDE.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -164,7 +164,7 @@ where to look, and suggest the fix. The machinery lives in `api/errors.py`:
164164
- **`Fault`** (StrEnum): `USER_INPUT`, `USER_APP`, `USER_CONFIG`, `AUTH`, `PLATFORM`,
165165
`NETWORK`, `UNKNOWN`. Drives a neutral source label (`FAULT_SOURCE`), color
166166
(`FAULT_COLOR`), and CI/CD exit code (`FAULT_EXIT_CODE`: 1 = your fault, 2 =
167-
platform/transient).
167+
platform/transient, 3 = unknown/unattributable).
168168
- **`Diagnosis`** (dataclass): `fault`, `headline`, `summary`, `details`, `next_steps`,
169169
`status_code`. `to_dict()` is the json contract (CI branches on `fault`).
170170
- **`diagnose_http_error`** parses status codes and FastAPI `422` validation arrays;
@@ -181,12 +181,12 @@ Rules for new code:
181181
- After any mutating op, call `surface_warnings(ctx, formatter, result)` so warnings /
182182
unhealthy components are surfaced (and `--strict` can fail CI).
183183
- **Honesty:** when the API gives no category, the fault is `UNKNOWN` and we point at
184-
the logs don't guess whose fault it is.
184+
the logs (exit code 3); don't guess whose fault it is.
185185

186186
**Spec coupling:** `CATEGORY_FAULT` / `CATEGORY_HINT` are keyed by `ErrorCategory`, and
187187
`tests/test_spec_conformance.py` asserts the enum matches `api/upstream-openapi.json` and
188188
that every category is mapped. When the api-sync workflow surfaces a new `ErrorCategory`,
189-
add it to `models.ErrorCategory` **and** both maps the conformance test tells you.
189+
add it to `models.ErrorCategory` **and** both maps; the conformance test tells you.
190190

191191
### Client method conventions
192192

README.md

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -75,15 +75,15 @@ zad metrics overview --output json | jq '.cpu_usage'
7575
## Errors & exit codes
7676

7777
Errors tell you **what's wrong and what to do next**, with a neutral label for where
78-
to look your request, your application, your configuration, your credentials, or the
79-
ZAD platform instead of a bare HTTP code. A failed image pull points you straight at
78+
to look (your request, your application, your configuration, your credentials, or the
79+
ZAD platform) instead of a bare HTTP code. A failed image pull points you straight at
8080
the image and registry (`Source: your application (cluster runtime)`) with the fix.
8181

8282
Each error carries a structured diagnosis. In `--output json` it's a single object
8383
on stdout you can branch on in CI/CD:
8484

8585
```bash
86-
zad deployment create app -c web=img:tag -o json 2>err.json || jq -r .fault err.json
86+
zad deployment create app -c web=img:tag -o json > out.json || jq -r .fault out.json
8787
# UserInput | UserApp | UserConfig | Auth | Platform | Network | Unknown
8888
```
8989

@@ -92,8 +92,9 @@ Exit codes:
9292
| Code | Meaning |
9393
|------|---------|
9494
| `0` | success |
95-
| `1` | your fault — fix it (bad input, app/config failure, auth) |
96-
| `2` | platform/network — transient, safe to retry |
95+
| `1` | your fault, fix it (bad input, app/config failure, auth) |
96+
| `2` | platform/network, transient and safe to retry |
97+
| `3` | unknown, the API gave no signal to attribute the failure (check the logs) |
9798

9899
`--strict` makes a command that *succeeds but reports warnings* (e.g. the deploy
99100
applied but a component is crash-looping) exit non-zero, so a pipeline fails the

src/zad_cli/api/client.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -61,11 +61,11 @@ def _parse_v2_response(model_cls: type, payload: Any) -> dict:
6161
f"Unexpected API response shape for {model_cls.__name__}: {e}",
6262
diagnosis=Diagnosis(
6363
fault=Fault.PLATFORM,
64-
headline="ZAD returned a response this CLI couldn't read likely a CLI/API version mismatch.",
64+
headline="ZAD returned a response this CLI couldn't read; likely a CLI/API version mismatch.",
6565
summary=f"Schema {model_cls.__name__} failed to validate.",
6666
next_steps=[
6767
"Retry shortly (exit code 2 = transient).",
68-
"If it persists, the CLI may be out of date update it or report the mismatch.",
68+
"If it persists, the CLI may be out of date; update it or report the mismatch.",
6969
],
7070
status_code=502,
7171
),
@@ -248,7 +248,7 @@ def _poll_task(self, poll_url: str) -> dict:
248248
task_id=task_id,
249249
diagnosis=Diagnosis(
250250
fault=Fault.UNKNOWN,
251-
headline=f"Timed out after {self.task_timeout}s waiting for the task it may still be running.",
251+
headline=f"Timed out after {self.task_timeout}s waiting for the task; it may still be running.",
252252
next_steps=["This is a wait limit, not a failure. Check `zad task status <id>`."],
253253
),
254254
)

src/zad_cli/api/errors.py

Lines changed: 14 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
"""Clear, actionable diagnosis of API and task failures.
22
33
The goal is simple: tell the user *what went wrong and what to do next*. The
4-
upstream API already carries the signal for that ``ErrorCategory`` on cluster
5-
errors, ``ComponentFailureInfo`` (with log tails) on failed deployment tasks,
6-
``HTTPValidationError`` on bad input, ``error_type`` on task results but a bare
4+
upstream API already carries the signal for that (``ErrorCategory`` on cluster
5+
errors, ``ComponentFailureInfo`` with log tails on failed deployment tasks,
6+
``HTTPValidationError`` on bad input, ``error_type`` on task results), but a bare
77
``HTTP 500`` / ``Task failed`` string throws it away.
88
99
This module turns those raw signals into a :class:`Diagnosis`: a plain-language
@@ -39,15 +39,15 @@ class Fault(StrEnum):
3939
UNKNOWN = "Unknown" # not enough signal to attribute honestly
4040

4141

42-
# Neutral, source-labelled phrasing (no blame just where the fault lives).
42+
# Neutral, source-labelled phrasing (no blame, just where the fault lives).
4343
FAULT_SOURCE: dict[Fault, str] = {
4444
Fault.USER_INPUT: "your request",
4545
Fault.USER_APP: "your application (cluster runtime)",
4646
Fault.USER_CONFIG: "your configuration / git",
4747
Fault.AUTH: "your credentials / permissions",
4848
Fault.PLATFORM: "ZAD platform",
4949
Fault.NETWORK: "network / connectivity",
50-
Fault.UNKNOWN: "unknown see logs",
50+
Fault.UNKNOWN: "unknown (see logs)",
5151
}
5252

5353
# Rich color: user-fixable = yellow, escalate/investigate = red, auth = magenta.
@@ -61,15 +61,17 @@ class Fault(StrEnum):
6161
Fault.UNKNOWN: "red",
6262
}
6363

64-
# CI/CD exit codes: 1 = your fault (fix it), 2 = platform/transient (safe to retry).
64+
# CI/CD exit codes: 1 = your fault (fix it), 2 = platform/transient (safe to retry),
65+
# 3 = unattributable. UNKNOWN gets its own code rather than claiming "your fault"
66+
# (1) or "safe to retry" (2) when the API gave us no signal to attribute the failure.
6567
FAULT_EXIT_CODE: dict[Fault, int] = {
6668
Fault.USER_INPUT: 1,
6769
Fault.USER_APP: 1,
6870
Fault.USER_CONFIG: 1,
6971
Fault.AUTH: 1,
7072
Fault.PLATFORM: 2,
7173
Fault.NETWORK: 2,
72-
Fault.UNKNOWN: 1,
74+
Fault.UNKNOWN: 3,
7375
}
7476

7577
# Which fault each cluster ErrorCategory implies. Keyed by ErrorCategory so the
@@ -272,22 +274,22 @@ def _http_headline(status_code: int, fault: Fault) -> tuple[str, list[str]]:
272274
)
273275
if status_code == 404:
274276
return (
275-
"Not found (HTTP 404) the resource you referenced doesn't exist.",
277+
"Not found (HTTP 404): the resource you referenced doesn't exist.",
276278
["Check the name/spelling and that it exists (e.g. `zad deployment list`)."],
277279
)
278280
if status_code == 409:
279281
return (
280-
"Conflict (HTTP 409) the resource is in a state that blocks this action.",
282+
"Conflict (HTTP 409): the resource is in a state that blocks this action.",
281283
["Check its current state, then retry once it settles."],
282284
)
283285
if status_code == 422:
284286
return (
285-
"Invalid request (HTTP 422) the values you sent didn't pass validation.",
287+
"Invalid request (HTTP 422): the values you sent didn't pass validation.",
286288
["Fix the field(s) listed above and retry."],
287289
)
288290
if fault is Fault.PLATFORM:
289291
return (
290-
f"ZAD platform error (HTTP {status_code}) usually transient.",
292+
f"ZAD platform error (HTTP {status_code}), usually transient.",
291293
["Retry shortly (exit code 2 = transient). If it persists, report it with the time of the call."],
292294
)
293295
return (f"Request rejected (HTTP {status_code}).", [])
@@ -370,7 +372,7 @@ def degraded_diagnoses(result: object) -> list[Diagnosis]:
370372
fault=Fault.USER_CONFIG,
371373
headline="The operation succeeded with warnings.",
372374
details=[str(w) for w in warnings],
373-
next_steps=["Review the warnings above they usually point at your configuration."],
375+
next_steps=["Review the warnings above; they usually point at your configuration."],
374376
)
375377
)
376378

src/zad_cli/api/models.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -182,7 +182,7 @@ class ProcessingStatus(BaseModel):
182182
"""Processing-step status inside a task result.
183183
184184
Mirrors the upstream task-models ``ProcessingStatus`` schema (the richer of
185-
the two same-named schemas — it carries ``component_failures``).
185+
the two same-named schemas, the one that carries ``component_failures``).
186186
"""
187187

188188
status: str

tests/test_errors.py

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,8 @@ def test_task_failure_syncfailed_text_is_user_config() -> None:
7979
def test_task_failure_unknown_stays_unknown() -> None:
8080
d = diagnose_task_failure("something odd happened", {})
8181
assert d.fault is Fault.UNKNOWN
82-
assert d.exit_code == 1
82+
# UNKNOWN gets its own exit code: not "your fault" (1), not "safe to retry" (2).
83+
assert d.exit_code == 3
8384
assert "logs" in " ".join(d.next_steps).lower()
8485

8586

tests/test_spec_conformance.py

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,8 +3,12 @@
33
These assert the CLI's hand-written enums/mappings match the vendored OpenAPI spec
44
(api/upstream-openapi.json). When the daily api-sync workflow refreshes the spec and
55
the upstream adds/changes an ErrorCategory, these tests fail loudly with a clear
6-
instruction turning a silent drift into an actionable PR. This is the *strict* half
6+
instruction, turning a silent drift into an actionable PR. This is the *strict* half
77
of the coupling; runtime parsing (category_of / _coerce_unknown_*) is the *loose* half.
8+
9+
The vendored spec is part of the repo, so a missing file is a hard failure, not a
10+
skip: skipping would let the whole conformance check pass silently if the file is ever
11+
moved or renamed.
812
"""
913

1014
import json
@@ -20,7 +24,10 @@
2024

2125
def _spec_schemas() -> dict:
2226
if not _SPEC_PATH.exists():
23-
pytest.skip(f"vendored spec not found at {_SPEC_PATH}")
27+
pytest.fail(
28+
f"vendored spec not found at {_SPEC_PATH}. It is part of the repo and these "
29+
"conformance tests depend on it; if it moved, update _SPEC_PATH instead of skipping."
30+
)
2431
return json.loads(_SPEC_PATH.read_text())["components"]["schemas"]
2532

2633

0 commit comments

Comments
 (0)