Commit 3edc4a4

uittenbroekrobbert authored and anneschuth committed
docs(errors): reframe messaging around clarity and solutions
Drop the defensive "not your request" / "not you" asides from error headlines and docs. The neutral Source: line already says where to look, so headlines lead with what's wrong and what to do — not with deflecting blame. Same behaviour, friendlier and more useful tone.
1 parent f42d7db commit 3edc4a4

4 files changed

Lines changed: 28 additions & 30 deletions

CLAUDE.md

Lines changed: 2 additions & 3 deletions
@@ -158,9 +158,8 @@ def verb(

 ### Error reporting (the diagnosis layer)

-Errors must be **honest about where the fault lives** — never make a user-app or
-user-input failure look like the platform is broken. The machinery lives in
-`api/errors.py`:
+Errors must give **clarity and a next step**: say what's wrong, point neutrally at
+where to look, and suggest the fix. The machinery lives in `api/errors.py`:

 - **`Fault`** (StrEnum): `USER_INPUT`, `USER_APP`, `USER_CONFIG`, `AUTH`, `PLATFORM`,
   `NETWORK`, `UNKNOWN`. Drives a neutral source label (`FAULT_SOURCE`), color

README.md

Lines changed: 4 additions & 4 deletions
@@ -74,10 +74,10 @@ zad metrics overview --output json | jq '.cpu_usage'

 ## Errors & exit codes

-Errors tell you **where the fault lives** — your request, your application, your
-configuration, your credentials, or the ZAD platform — instead of a bare HTTP code.
-A failed image pull is labelled `Source: your application (cluster runtime)`, not
-"the backend is down".
+Errors tell you **what's wrong and what to do next**, with a neutral label for where
+to look — your request, your application, your configuration, your credentials, or the
+ZAD platform — instead of a bare HTTP code. A failed image pull points you straight at
+the image and registry (`Source: your application (cluster runtime)`) with the fix.

 Each error carries a structured diagnosis. In `--output json` it's a single object
 on stdout you can branch on in CI/CD:

src/zad_cli/api/client.py

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ def _parse_v2_response(model_cls: type, payload: Any) -> dict:
             f"Unexpected API response shape for {model_cls.__name__}: {e}",
             diagnosis=Diagnosis(
                 fault=Fault.PLATFORM,
-                headline="ZAD returned a response this CLI couldn't read — a platform/version mismatch, not you.",
+                headline="ZAD returned a response this CLI couldn't read — likely a CLI/API version mismatch.",
                 summary=f"Schema {model_cls.__name__} failed to validate.",
                 next_steps=[
                     "Retry shortly (exit code 2 = transient).",

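The next step in this hunk leans on the CLI's convention that exit code 2 marks a transient failure. A minimal sketch of how a CI script might act on that convention follows. The helper `run_with_retry`, its parameters, and the retry policy are hypothetical and are not part of `zad_cli`; only the meaning of exit code 2 comes from the diff.

```python
import subprocess
import sys
import time
from typing import Sequence

# Taken from the CLI's own hint: "exit code 2 = transient".
TRANSIENT_EXIT_CODE = 2


def run_with_retry(cmd: Sequence[str], attempts: int = 3, delay_seconds: float = 1.0) -> int:
    """Run cmd, retrying only while it exits with the transient code.

    Hypothetical helper: returns the exit code of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    returncode = TRANSIENT_EXIT_CODE
    for attempt in range(1, attempts + 1):
        returncode = subprocess.run(list(cmd)).returncode
        if returncode != TRANSIENT_EXIT_CODE:
            # Success or a non-transient failure: retrying would not help.
            return returncode
        if attempt < attempts:
            time.sleep(delay_seconds)
    return returncode
```

For example, `run_with_retry(["zad", "metrics", "overview", "--output", "json"])` would retry only while the command keeps exiting with 2, and would return any other exit code immediately.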
src/zad_cli/api/errors.py

Lines changed: 21 additions & 22 deletions
@@ -1,21 +1,20 @@
-"""Honest, source-labelled diagnosis of API and task failures.
-
-The upstream API already attributes failures accurately: ``ErrorCategory`` on
-cluster errors, ``ComponentFailureInfo`` (with log tails) on failed deployment
-tasks, ``HTTPValidationError`` on bad input, and ``error_type`` on task results.
-The CLI used to collapse all of that into a bare ``HTTP 500`` / ``Task failed``
-string, which made every failure look like the platform was broken.
-
-This module turns those raw signals into a :class:`Diagnosis`: a clear,
-**source-labelled** headline ("Source: your application"), the concrete message,
-the backend's own explanation, and a next step. The fault vocabulary is kept in
-lockstep with the OpenAPI spec by ``tests/test_spec_conformance.py`` (strict
-coupling: drift fails CI) while runtime parsing degrades gracefully on unknown
-values (loose coupling).
-
-Honesty rule: never claim more certainty than the data supports. When the API
-gives no category, the fault is ``UNKNOWN`` and we point at the logs rather than
-guessing whose fault it is.
+"""Clear, actionable diagnosis of API and task failures.
+
+The goal is simple: tell the user *what went wrong and what to do next*. The
+upstream API already carries the signal for that — ``ErrorCategory`` on cluster
+errors, ``ComponentFailureInfo`` (with log tails) on failed deployment tasks,
+``HTTPValidationError`` on bad input, ``error_type`` on task results — but a bare
+``HTTP 500`` / ``Task failed`` string throws it away.
+
+This module turns those raw signals into a :class:`Diagnosis`: a plain-language
+headline, a neutral source label so you know where to look ("Source: your
+application"), the concrete message, the backend's own explanation, and a next
+step. The fault vocabulary is kept in lockstep with the OpenAPI spec by
+``tests/test_spec_conformance.py`` (strict coupling: drift fails CI) while runtime
+parsing degrades gracefully on unknown values (loose coupling).
+
+We never claim more certainty than the data supports: when the API gives no
+category, the fault is ``UNKNOWN`` and we point at the logs rather than guessing.
 """

 from __future__ import annotations
@@ -288,7 +287,7 @@ def _http_headline(status_code: int, fault: Fault) -> tuple[str, list[str]]:
         )
     if fault is Fault.PLATFORM:
         return (
-            f"ZAD had an internal error (HTTP {status_code}) — this is the platform, not your request.",
+            f"ZAD platform error (HTTP {status_code}) — usually transient.",
             ["Retry shortly (exit code 2 = transient). If it persists, report it with the time of the call."],
         )
     return (f"Request rejected (HTTP {status_code}).", [])
@@ -335,13 +334,13 @@ def diagnose_task_failure(error_message: str | None, result: object) -> Diagnosi
     )

     if fault is Fault.USER_APP:
-        headline = "Your application failed to run on the cluster — ZAD applied your config, the workload didn't start."
+        headline = "Your application didn't start on the cluster (the deploy reached the cluster; the workload failed)."
         next_steps.append("Inspect `zad logs -d <deployment>` and `zad deployment describe <deployment>`.")
     elif fault is Fault.USER_CONFIG:
-        headline = "ZAD could not apply your configuration."
+        headline = "Your configuration couldn't be applied."
         next_steps.append("Fix your git repo/manifests, then `zad deployment refresh`.")
     else:
-        headline = "The operation failed, and ZAD did not report a category."
+        headline = "The operation failed. Check the details below for the cause."
         next_steps.append("Run `zad task status <id>` and `zad logs` for the full output.")

     return Diagnosis(fault=fault, headline=headline, summary=summary, details=details, next_steps=next_steps)

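Taken together, the hunks above describe a small data flow: a raw category becomes a `Fault`, and the fault selects a headline and a next step. A rough Python sketch of that flow follows, using the post-commit wording. The enum string values, the field types on `Diagnosis`, and the helpers `parse_fault` and `task_headline` are assumptions made for illustration; the real `api/errors.py` is only partly visible in this diff and may be organised differently.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Fault(str, Enum):
    """Fault vocabulary named in CLAUDE.md; the string values are assumed.

    The real class is a StrEnum; str + Enum keeps this sketch runnable on
    Python versions older than 3.11.
    """

    USER_INPUT = "user_input"
    USER_APP = "user_app"
    USER_CONFIG = "user_config"
    AUTH = "auth"
    PLATFORM = "platform"
    NETWORK = "network"
    UNKNOWN = "unknown"


@dataclass
class Diagnosis:
    # Field names come from the diff; the types and defaults are assumptions.
    fault: Fault
    headline: str
    summary: str = ""
    details: List[str] = field(default_factory=list)
    next_steps: List[str] = field(default_factory=list)


def parse_fault(raw: Optional[str]) -> Fault:
    """Loose coupling at runtime: unknown or missing categories become UNKNOWN."""
    try:
        return Fault(raw)
    except ValueError:
        return Fault.UNKNOWN


def task_headline(fault: Fault) -> Tuple[str, str]:
    """Return (headline, next step) for a failed task, using the new strings."""
    if fault is Fault.USER_APP:
        return (
            "Your application didn't start on the cluster "
            "(the deploy reached the cluster; the workload failed).",
            "Inspect `zad logs -d <deployment>` and `zad deployment describe <deployment>`.",
        )
    if fault is Fault.USER_CONFIG:
        return (
            "Your configuration couldn't be applied.",
            "Fix your git repo/manifests, then `zad deployment refresh`.",
        )
    return (
        "The operation failed. Check the details below for the cause.",
        "Run `zad task status <id>` and `zad logs` for the full output.",
    )


def diagnose(raw_category: Optional[str], summary: str = "") -> Diagnosis:
    """Hypothetical glue: parse the category, then build the Diagnosis."""
    fault = parse_fault(raw_category)
    headline, step = task_headline(fault)
    return Diagnosis(fault=fault, headline=headline, summary=summary, next_steps=[step])
```

The point of the sketch is the split the docstring describes: the enum is the strict vocabulary that a conformance test can compare against the spec, while `parse_fault` is the forgiving runtime path that falls back to `UNKNOWN` instead of guessing.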