docs: add SSA patterns, error handling, and troubleshooting enhancements #88
base: main
Changes from 1 commit
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,167 @@ | ||
| # Error Handling | ||
|
|
||
| Errors in kube originate from multiple layers. Understanding where each error comes from and how to handle it is key to building resilient controllers. | ||
|
|
||
| ## Error Layers | ||
|
|
||
| ```mermaid | ||
| graph TD | ||
| A["Client::send()"] -->|"network / TLS / timeout"| E1["kube::Error::HyperError\nkube::Error::HttpError"] | ||
| B["Api::list() / get() / patch()"] -->|"4xx / 5xx"| E2["kube::Error::Api"] | ||
| B -->|"deserialization failure"| E3["kube::Error::SerializationError"] | ||
| C["watcher()"] -->|"initial LIST failed"| E4["watcher::Error::InitialListFailed"] | ||
| C -->|"WATCH connect failed"| E5["watcher::Error::WatchFailed"] | ||
| C -->|"server error during WATCH"| E6["watcher::Error::WatchError"] | ||
| D["Controller::run()"] -->|"trigger stream"| C | ||
| D -->|"user code"| E7["reconciler Error"] | ||
|
|
||
| style E1 fill:#ffebee | ||
| style E2 fill:#ffebee | ||
| style E7 fill:#fff3e0 | ||
| ``` | ||
|
|
||
| | Layer | Error type | Typical cause | | ||
| |-------|-----------|---------------| | ||
| | Client | `HyperError`, `HttpError` | Network, TLS, timeout | | ||
| | [Api] | `Error::Api { status }` | Kubernetes 4xx/5xx response | | ||
| | [Api] | `SerializationError` | JSON deserialization failure | | ||
| | [watcher] | `InitialListFailed` | Initial LIST call failed | | ||
| | [watcher] | `WatchFailed` | WATCH connection failed | | ||
| | [watcher] | `WatchError` | Server error during WATCH (e.g. 410 Gone) | | ||
| | [Controller] | reconciler Error | Error from user code | | ||
|
|
||
| ## Watcher Errors and Backoff | ||
|
|
||
| Watcher errors are **soft errors** — the [watcher] retries on all failures (including 403s, network issues) because external circumstances may improve. They should never be silently discarded. See the [troubleshooting page](../troubleshooting.md#watcher-errors) for diagnostic examples. | ||
|
|
||
| The critical requirement is attaching a backoff to the watcher stream: | ||
|
|
||
| ```rust | ||
| // ✗ No backoff: failures are retried immediately, in a tight loop against the API server | ||
| let stream = watcher(api, wc); | ||
|
|
||
|
|
||
| // ✓ Exponential backoff with automatic retry | ||
| let stream = watcher(api, wc).default_backoff(); | ||
| ``` | ||
|
|
||
| ### default_backoff | ||
|
|
||
| Applies an `ExponentialBackoff`: 800ms → 1.6s → 3.2s → ... → 30s (max). The backoff resets whenever a successful event is received. | ||
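To make the delay sequence above concrete, here is a minimal std-only sketch of a doubling backoff with a cap. The function name `backoff_delays` is hypothetical and is not part of kube; the sketch also leaves out any jitter the real implementation may apply, so treat it as an illustration of the 800ms → 30s progression rather than the library's code.

```rust
use std::time::Duration;

/// Hypothetical helper: returns the first `n` delays of a doubling backoff
/// that starts at `min` and never exceeds `max`. Jitter is left out.
fn backoff_delays(min: Duration, max: Duration, n: usize) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(n);
    // Clamp the starting delay in case `min` is larger than `max`.
    let mut current = min.min(max);
    for _ in 0..n {
        delays.push(current);
        // Double the delay for the next attempt, but stay under the cap.
        current = current.saturating_mul(2).min(max);
    }
    delays
}
```

With `min = 800ms` and `max = 30s`, the cap is reached on the seventh attempt, so a watcher that keeps failing settles into one retry every 30 seconds.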
|
|
||
| ### Custom backoff | ||
|
|
||
| ```rust | ||
| use std::time::Duration; | ||
| use backon::ExponentialBuilder; | ||
|
|
||
| let stream = watcher(api, wc).backoff( | ||
| ExponentialBuilder::default() | ||
| .with_min_delay(Duration::from_millis(500)) | ||
| .with_max_delay(Duration::from_secs(30)), | ||
| ); | ||
| ``` | ||
|
|
||
| ## Reconciler Errors | ||
|
|
||
| ### Defining error types | ||
|
|
||
| [Controller::run] requires the reconciler's error type to implement `std::error::Error + Send + 'static`. `anyhow::Error` does not implement `std::error::Error`, so it cannot be used directly. Define a concrete error type with [thiserror]: | ||
|
|
||
| ```rust | ||
| #[derive(Debug, thiserror::Error)] | ||
| enum Error { | ||
| #[error("Kubernetes API error: {0}")] | ||
| KubeApi(#[from] kube::Error), | ||
|
|
||
| #[error("Missing spec field: {0}")] | ||
| MissingField(String), | ||
|
|
||
| #[error("External service error: {0}")] | ||
| External(String), | ||
| } | ||
| ``` | ||
|
|
||
| ### error_policy | ||
|
|
||
| When the reconciler returns `Err`, the `error_policy` function decides what happens next: | ||
|
|
||
| ```rust | ||
| fn error_policy(obj: Arc<MyResource>, err: &Error, ctx: Arc<Context>) -> Action { | ||
| tracing::error!(?err, "reconcile failed"); | ||
| Action::requeue(Duration::from_secs(5)) | ||
| } | ||
| ``` | ||
|
|
||
| You can distinguish transient from permanent errors: | ||
|
|
||
| | Type | Examples | Handling | | ||
| |------|----------|---------| | ||
| | Transient | Network error, timeout, 429 | Requeue via `error_policy` | | ||
| | Permanent | Invalid spec, bad config | Record condition on status + `Action::await_change()` | | ||
|
|
||
| ```rust | ||
| fn error_policy(obj: Arc<MyResource>, err: &Error, ctx: Arc<Context>) -> Action { | ||
| match err { | ||
| // Transient: retry | ||
| Error::KubeApi(_) | Error::External(_) => { | ||
| Action::requeue(Duration::from_secs(5)) | ||
| } | ||
| // Permanent: don't retry until the object changes | ||
| Error::MissingField(_) => Action::await_change(), | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| !!! note "Current limitations" | ||
|
|
||
| `error_policy` is a **synchronous** function. You cannot perform async operations (sending metrics, updating status) inside it. For per-key exponential backoff, wrap the reconciler itself — see the pattern described in the [[reconciler]] documentation. | ||
|
|
||
|
Comment on lines
+114
to
+117
Member
yeah, this is a great callout.
Member
actually looking at this some more, there's no such pattern described in the reconciler documentation. |
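As one possible shape for the per-key exponential backoff mentioned in the note above, here is a minimal std-only sketch. `PerKeyBackoff`, its methods, and the 1s/300s bounds are all hypothetical and chosen for illustration; nothing here is a kube API. The assumption is that a wrapper around the reconciler would keep this tracker in its shared context (behind a lock), call `on_failure` when the inner reconcile returns `Err`, and call `on_success` otherwise.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical helper: counts consecutive failures per object key
/// (for example "namespace/name") and derives a requeue delay from the count.
#[derive(Default)]
struct PerKeyBackoff {
    failures: HashMap<String, u32>,
}

impl PerKeyBackoff {
    /// Record a failed reconcile for `key` and return the delay to wait
    /// before the next attempt: 1s, 2s, 4s, ... capped at 300s.
    fn on_failure(&mut self, key: &str) -> Duration {
        let count = self.failures.entry(key.to_owned()).or_insert(0);
        *count = count.saturating_add(1);
        // Bound the exponent so the shift below cannot overflow.
        let exp = (*count - 1).min(16);
        Duration::from_secs(1u64 << exp).min(Duration::from_secs(300))
    }

    /// Forget the failure history once `key` reconciles successfully.
    fn on_success(&mut self, key: &str) {
        self.failures.remove(key);
    }
}
```

The returned delay could then be passed to `Action::requeue` from the wrapper, so each object backs off independently instead of sharing the fixed delay used in `error_policy`.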
||
| ## Client-level Retry | ||
|
|
||
| kube-client does not include built-in retry for regular API calls. If a `create()`, `patch()`, or `get()` fails, the error is returned as-is. | ||
|
|
||
| For automatic retry, you can use [tower]'s retry middleware. However, not all errors are retryable: | ||
|
|
||
|
|
||
| | Error | Retryable | Reason | | ||
| |-------|-----------|--------| | ||
| | 5xx | Yes | Server-side transient failure | | ||
| | Timeout | Yes | Temporary network issue | | ||
| | 429 Too Many Requests | Yes | Rate limit — wait and retry | | ||
| | Network error | Yes | Temporary connectivity failure | | ||
| | 4xx (400, 403, 404) | No | The request itself is wrong | | ||
| | 409 Conflict | No | SSA ownership conflict — fix the logic | | ||
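The table above can be encoded as a small predicate that a retry policy consults before re-sending a request. This is a std-only sketch: the `Failure` enum and `is_retryable` are hypothetical stand-ins, not kube or tower types, and a real policy would map the client's actual error type onto something like this.

```rust
/// Hypothetical, simplified stand-in for the failures a client call can surface.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Failure {
    /// The API server answered with this HTTP status code.
    Status(u16),
    /// The request timed out before a response arrived.
    Timeout,
    /// The connection itself failed (DNS, TCP reset, TLS, ...).
    Network,
}

/// Mirrors the table: timeouts, network errors, 429 and 5xx are retryable;
/// every other status (including 409 Conflict) is not.
fn is_retryable(failure: Failure) -> bool {
    match failure {
        Failure::Timeout | Failure::Network => true,
        Failure::Status(429) => true,
        Failure::Status(code) => (500..600).contains(&code),
    }
}
```

Keeping the decision in one function makes it easy to review against the table and to adjust if a particular status needs different treatment in your controller.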
|
|
||
| ## Timeout Strategy | ||
|
|
||
| The default `read_timeout` on [Client] is 295 seconds (matching the Kubernetes server-side watch timeout). This means a regular [Api] call could block for nearly 5 minutes if the server is unresponsive. | ||
|
|
||
| ### Separate clients | ||
|
|
||
| ```rust | ||
| // Watcher client (default 295s timeout — needed for watch) | ||
| let watcher_client = Client::try_default().await?; | ||
|
|
||
| // API call client (short timeout) | ||
| let mut config = Config::infer().await?; | ||
| config.read_timeout = Some(Duration::from_secs(15)); | ||
| let api_client = Client::try_from(config)?; | ||
| ``` | ||
|
|
||
|
|
||
| ### Wrapping individual calls | ||
|
|
||
| ```rust | ||
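| // The first `?` handles the timeout error, the second `?` handles the `kube::Error` from `get` | ||
| let pod = tokio::time::timeout( | ||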
| Duration::from_secs(10), | ||
| api.get("my-pod"), | ||
| ).await??; | ||
|
Member
double questionmark |
||
| ``` | ||
|
|
||
| ### Controllers | ||
|
|
||
| In a [Controller] context, the watcher needs the long timeout. Only the API calls inside your reconciler need shorter timeouts. Wrapping individual reconciler calls with `tokio::time::timeout` is usually sufficient. | ||
|
|
||
|
|
||
| --8<-- "includes/abbreviations.md" | ||
| --8<-- "includes/links.md" | ||
|
|
||
| [//begin]: # "Autogenerated link references for markdown compatibility" | ||
| [reconciler]: reconciler "The Reconciler" | ||
| [//end]: # "Autogenerated link references" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,177 @@ | ||
| # Server-Side Apply | ||
|
|
||
| [Server-Side Apply] is a Kubernetes patch strategy based on field ownership. It allows multiple controllers to safely modify the same resource by tracking which controller owns which fields. | ||
|
|
||
| This page covers practical patterns, common pitfalls, and status patching with SSA in kube. | ||
|
|
||
| !!! note "SSA and Reconciler Idempotency" | ||
|
|
||
| SSA naturally fits the [[reconciler]]'s idempotent pattern: you declare "these fields should have these values", and the server handles the rest. See [[reconciler#in-depth-solution]] for how SSA simplifies reconciler logic. | ||
|
|
||
| ## Why SSA | ||
|
|
||
| The traditional patch strategies each have limitations: | ||
|
|
||
| | Strategy | Limitation | | ||
| |----------|-----------| | ||
| | Merge patch | Overwrites entire arrays. Field deletion is not explicit | | ||
| | Strategic merge patch | Only works with built-in Kubernetes types. Not supported for custom resources | | ||
| | JSON patch | Requires exact paths. Susceptible to race conditions | | ||
|
|
||
| SSA addresses these: | ||
|
|
||
| - **Field ownership**: the server records "this controller owns this field" | ||
| - **Conflict detection**: touching another owner's field produces a `409 Conflict` | ||
| - **Declarative**: you declare which fields should have which values; everything else is left untouched | ||
|
|
||
| ## Basic Pattern | ||
|
|
||
| ```rust | ||
| use kube::api::{Patch, PatchParams}; | ||
|
|
||
| let patch = Patch::Apply(serde_json::json!({ | ||
| "apiVersion": "v1", | ||
| "kind": "ConfigMap", | ||
| "metadata": { "name": "my-cm" }, | ||
| "data": { "key": "value" } | ||
| })); | ||
| let pp = PatchParams::apply("my-controller"); // field manager name | ||
| api.patch("my-cm", &pp, &patch).await?; | ||
| ``` | ||
|
|
||
| The `"my-controller"` string in `PatchParams::apply` is the **field manager** name. Ownership is tracked under this name. Applying again with the same field manager updates owned fields; fields owned by other managers are left alone. | ||
|
|
||
| ## Common Pitfalls | ||
|
|
||
| ### Missing apiVersion and kind | ||
|
|
||
| ```rust | ||
| // ✗ 400 Bad Request | ||
| let patch = Patch::Apply(serde_json::json!({ | ||
| "data": { "key": "value" } | ||
| })); | ||
|
|
||
| // ✓ apiVersion and kind are required | ||
| let patch = Patch::Apply(serde_json::json!({ | ||
| "apiVersion": "v1", | ||
| "kind": "ConfigMap", | ||
| "metadata": { "name": "my-cm" }, | ||
| "data": { "key": "value" } | ||
| })); | ||
| ``` | ||
|
|
||
| Unlike merge patch, SSA requires `apiVersion` and `kind` in every request. | ||
|
|
||
| ### Missing field manager | ||
|
|
||
| ```rust | ||
| // ✗ No field manager set; server-side apply requires one | ||
| let pp = PatchParams::default(); | ||
|
|
||
| // ✓ Explicit field manager | ||
| let pp = PatchParams::apply("my-controller"); | ||
| ``` | ||
|
Comment on lines
+67
to
+73
Member
field managers are required for serverside apply so using PatchParams::default with apply should probably be validated as an error in PatchParams rather than documented here as an eternal footgun.
Member
Author
Agreed: this should be a client-side validation rather than a doc-only warning. |
||
|
|
||
| Always specify an explicit field manager. Server-side apply requires one, and a name unique to your controller keeps its field ownership separate from other controllers and kubectl users. | ||
|
Member
what is the implication of collisions if you are not using a field manager? |
||
|
|
||
| ### Overusing force | ||
|
|
||
| ```rust | ||
| // Caution: forcibly takes ownership of fields from other managers | ||
| let pp = PatchParams::apply("my-controller").force(); | ||
| ``` | ||
|
|
||
| `force: true` takes ownership of fields from other controllers. Only use this in single-owner situations such as CRD registration. | ||
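To illustrate how field ownership, conflicts, and `force` interact, here is a toy std-only model of the server's bookkeeping. Everything in it (`Ownership`, `Conflict`, `apply`) is hypothetical and heavily simplified: it tracks only field paths, not values, whereas the real API server reports a conflict only when the applied value differs from the current one.

```rust
use std::collections::HashMap;

/// Toy model of the server-side ownership table: field path -> field manager.
#[derive(Default)]
struct Ownership {
    owners: HashMap<String, String>,
}

/// Returned when an apply touches fields owned by a different manager.
#[derive(Debug, PartialEq)]
struct Conflict {
    fields: Vec<String>,
}

impl Ownership {
    /// Apply `fields` on behalf of `manager`. Without `force`, touching a
    /// field owned by someone else is rejected; with `force`, ownership moves.
    fn apply(&mut self, manager: &str, fields: &[&str], force: bool) -> Result<(), Conflict> {
        let mut conflicting: Vec<String> = fields
            .iter()
            .filter(|f| {
                self.owners
                    .get(**f)
                    .map_or(false, |owner| owner.as_str() != manager)
            })
            .map(|f| f.to_string())
            .collect();
        conflicting.sort();
        if !conflicting.is_empty() && !force {
            return Err(Conflict { fields: conflicting });
        }
        // Either nothing conflicted or the caller forced the apply.
        for f in fields {
            self.owners.insert(f.to_string(), manager.to_string());
        }
        Ok(())
    }
}
```

In this model a forced apply silently takes the field away from its previous owner, whose next non-forced apply then conflicts. That is the ping-pong behaviour the warning about overusing `force` is meant to prevent.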
|
|
||
| ### Including unnecessary fields | ||
|
|
||
| Serializing an entire Rust struct also emits every non-optional field at its `Default` value. SSA then takes ownership of those fields, causing conflicts when another controller tries to modify them. | ||
|
|
||
| ```rust | ||
| // ✗ Serializes all Default fields → unnecessary ownership | ||
| let full_deployment = Deployment { ..Default::default() }; | ||
|
|
||
| // ✓ Only include fields you actually manage | ||
| let patch = serde_json::json!({ | ||
| "apiVersion": "apps/v1", | ||
| "kind": "Deployment", | ||
| "metadata": { "name": "my-deploy" }, | ||
| "spec": { | ||
| "replicas": 3 | ||
| } | ||
| }); | ||
| ``` | ||
|
|
||
|
Member
part of this problem is tangled up in kubernetes objects not being fully optional, e.g. some replicas ints. i believe this is what drove them to later make ApplyConfigurations variants where everything is truly optional. rust unfortunately does not have an equivalent of applyconfigurations (kube-rs/kube#649), so serializing parts of structs can on some occasions be annoying if you want to use a typed interface for partial SSA. possibly this is worth a current limitations callout somewhere 🤔 |
||
| ## Status Patching | ||
|
|
||
| Status is modified through the `/status` subresource: | ||
|
|
||
| ```rust | ||
| let status_patch = serde_json::json!({ | ||
| "apiVersion": "example.com/v1", | ||
| "kind": "MyResource", | ||
| "status": { | ||
| "phase": "Ready", | ||
| "conditions": [{ | ||
| "type": "Available", | ||
| "status": "True", | ||
| "lastTransitionTime": "2024-01-01T00:00:00Z", | ||
| }] | ||
| } | ||
| }); | ||
| let pp = PatchParams::apply("my-controller"); | ||
| api.patch_status("name", &pp, &Patch::Apply(status_patch)).await?; | ||
| ``` | ||
|
|
||
| !!! warning "Wrap status in the full object structure" | ||
|
|
||
| ```rust | ||
| // ✗ Sending just the status fields will fail | ||
| serde_json::json!({ "phase": "Ready" }) | ||
|
|
||
| // ✓ Must include apiVersion, kind, and wrap under "status" | ||
| serde_json::json!({ | ||
| "apiVersion": "example.com/v1", | ||
| "kind": "MyResource", | ||
| "status": { "phase": "Ready" } | ||
| }) | ||
| ``` | ||
|
|
||
| The Kubernetes API expects the full object structure even on the `/status` endpoint. | ||
|
Member
yeah. good callout. this is definitely still a footgun. i wish this could be typed better using default None and builders, e.g. `let updated = instance.status(MyCrdStatus::default().phase("Ready".into()));` followed by `client.patch_status<MyCrd>(updated, ssa).await?`, and have the patch_status strip the spec for users. but it relies on builders (e.g. k8s-pb, kube-rs/k8s-pb#9) and full optionality like applyconfigurations. |
||
|
|
||
| ## Typed SSA | ||
|
|
||
| Instead of `serde_json::json!()`, you can use Rust types for type safety and IDE autocompletion: | ||
|
|
||
| ```rust | ||
| let cm = ConfigMap { | ||
| metadata: ObjectMeta { | ||
| name: Some("my-cm".into()), | ||
| ..Default::default() | ||
| }, | ||
| data: Some(BTreeMap::from([("key".into(), "value".into())])), | ||
| ..Default::default() | ||
| }; | ||
| let pp = PatchParams::apply("my-controller"); | ||
| api.patch("my-cm", &pp, &Patch::Apply(cm)).await?; | ||
| ``` | ||
|
|
||
| [k8s-openapi] types already have `#[serde(skip_serializing_if = "Option::is_none")]` applied, so `None` fields are omitted from serialization. For your own types, you need to add this explicitly: | ||
|
|
||
| ```rust | ||
| #[derive(Serialize)] | ||
| struct MyStatus { | ||
| phase: String, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| message: Option<String>, | ||
| } | ||
| ``` | ||
|
|
||
| Without `skip_serializing_if`, `None` fields serialize as `null` and SSA takes ownership of them. | ||
|
|
||
|
|
||
| --8<-- "includes/abbreviations.md" | ||
| --8<-- "includes/links.md" | ||
|
|
||
| [//begin]: # "Autogenerated link references for markdown compatibility" | ||
| [reconciler]: reconciler "The Reconciler" | ||
| [//end]: # "Autogenerated link references" | ||