docs: add SSA patterns, error handling, and troubleshooting enhancements #88
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,168 @@ | ||
| # Error Handling | ||
|
|
||
| Errors in kube originate from multiple layers. Understanding where each error comes from and how to handle it is key to building resilient controllers. | ||
|
|
||
| ## Error Layers | ||
|
|
||
| ```mermaid | ||
| graph TD | ||
| A["Client::send()"] -->|"network / TLS / timeout"| E1["kube::Error::HyperError\nkube::Error::HttpError"] | ||
| B["Api::list() / get() / patch()"] -->|"4xx / 5xx"| E2["kube::Error::Api"] | ||
| B -->|"deserialization failure"| E3["kube::Error::SerializationError"] | ||
| C["watcher()"] -->|"initial LIST failed"| E4["watcher::Error::InitialListFailed"] | ||
| C -->|"WATCH connect failed"| E5["watcher::Error::WatchFailed"] | ||
| C -->|"server error during WATCH"| E6["watcher::Error::WatchError"] | ||
| D["Controller::run()"] -->|"trigger stream"| C | ||
| D -->|"user code"| E7["reconciler Error"] | ||
|
|
||
| style E1 fill:#ffebee | ||
| style E2 fill:#ffebee | ||
| style E7 fill:#fff3e0 | ||
| ``` | ||
|
|
||
| | Layer | Error type | Typical cause | | ||
| |-------|-----------|---------------| | ||
| | Client | `HyperError`, `HttpError` | Network, TLS, timeout | | ||
| | [Api] | `Error::Api { status }` | Kubernetes 4xx/5xx response | | ||
| | [Api] | `SerializationError` | JSON deserialization failure | | ||
| | [watcher] | `InitialListFailed` | Initial LIST call failed | | ||
| | [watcher] | `WatchFailed` | WATCH connection failed | | ||
| | [watcher] | `WatchError` | Server error during WATCH (e.g. 410 Gone) | | ||
| | [Controller] | reconciler Error | Error from user code | | ||
|
|
||
| ## Watcher Errors and Backoff | ||
|
|
||
| Watcher errors are **soft errors** — the [watcher] retries on all failures (including 403s, network issues) because external circumstances may improve. They should never be **silently** discarded. See the [troubleshooting page](../troubleshooting.md#watcher-errors) for diagnostic examples. | ||
|
|
||
| The critical requirement is attaching a backoff to the watcher stream: | ||
|
|
||
| ```rust | ||
| // ✗ Without backoff, errors cause a tight retry loop | ||
| let stream = watcher(api, wc); | ||
|
|
||
| // ✓ Exponential backoff with automatic retry | ||
| let stream = watcher(api, wc).default_backoff(); | ||
| ``` | ||
|
|
||
| ### default_backoff | ||
|
|
||
| Applies an `ExponentialBackoff`: 800ms → 1.6s → 3.2s → ... → 30s (max). The backoff resets whenever a successful event is received. | ||
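The schedule described above can be sketched with the standard library alone. This is an idealized model that assumes pure doubling with no jitter; `backoff_delay` is a hypothetical helper for illustration, not part of kube, and the real backoff may randomize its delays.

```rust
use std::time::Duration;

/// Idealized delay before retry number `attempt` (0-based):
/// start at 800ms, double on every attempt, cap at 30s. No jitter.
/// Hypothetical helper, not a kube API.
fn backoff_delay(attempt: u32) -> Duration {
    const INITIAL_MS: u64 = 800;
    const MAX_MS: u64 = 30_000;
    // Clamp the shift: 16 doublings already exceed the cap by far,
    // and an unbounded shift would overflow.
    let factor = 1u64 << attempt.min(16);
    Duration::from_millis((INITIAL_MS * factor).min(MAX_MS))
}
```

Under these assumptions, attempts 0 through 6 yield 800ms, 1.6s, 3.2s, 6.4s, 12.8s, 25.6s, and then 30s for every later attempt.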
|
|
||
| ### Custom backoff | ||
|
|
||
| ```rust | ||
| use std::time::Duration; | ||
| use backon::ExponentialBuilder; | ||
|
|
||
| let stream = watcher(api, wc).backoff( | ||
| ExponentialBuilder::default() | ||
| .with_min_delay(Duration::from_millis(500)) | ||
| .with_max_delay(Duration::from_secs(30)), | ||
| ); | ||
| ``` | ||
|
|
||
| ## Reconciler Errors | ||
|
|
||
| ### Defining error types | ||
|
|
||
| [Controller::run] requires specific trait bounds on the error type, so `anyhow::Error` cannot be used directly. Define a concrete error type with [thiserror]: | ||
|
|
||
| ```rust | ||
| #[derive(Debug, thiserror::Error)] | ||
| enum Error { | ||
| #[error("Kubernetes API error: {0}")] | ||
| KubeApi(#[from] kube::Error), | ||
|
|
||
| #[error("Missing spec field: {0}")] | ||
| MissingField(String), | ||
|
|
||
| #[error("External service error: {0}")] | ||
| External(String), | ||
| } | ||
| ``` | ||
|
|
||
| ### error_policy | ||
|
|
||
| When the reconciler returns `Err`, the `error_policy` function decides what happens next: | ||
|
|
||
| ```rust | ||
| fn error_policy(_obj: Arc<MyResource>, err: &Error, _ctx: Arc<Context>) -> Action { | ||
| tracing::error!(?err, "reconcile failed"); | ||
| Action::requeue(Duration::from_secs(5)) | ||
| } | ||
| ``` | ||
|
|
||
| You can distinguish transient from permanent errors: | ||
|
|
||
| | Type | Examples | Handling | | ||
| |------|----------|---------| | ||
| | Transient | Network error, timeout, 429 | Requeue via `error_policy` | | ||
| | Permanent | Invalid spec, bad config | Record condition on status + `Action::await_change()` | | ||
|
|
||
| ```rust | ||
| fn error_policy(_obj: Arc<MyResource>, err: &Error, _ctx: Arc<Context>) -> Action { | ||
| match err { | ||
| // Transient: retry | ||
| Error::KubeApi(_) | Error::External(_) => { | ||
| Action::requeue(Duration::from_secs(5)) | ||
| } | ||
| // Permanent: don't retry until the object changes | ||
| Error::MissingField(_) => Action::await_change(), | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| !!! note "Current limitations" | ||
|
|
||
| `error_policy` is a **synchronous** function. You cannot perform async operations (sending metrics, updating status) inside it. For per-key exponential backoff, wrap the reconciler itself with a middleware that tracks per-object retry state. | ||
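The per-object retry state mentioned in the note could look roughly like the following. This is a minimal sketch using only the standard library: `RetryTracker`, its methods, the `"namespace/name"` key format, and the 5s base and 300s cap are all assumptions made for illustration, not kube APIs or defaults.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-object retry state, keyed by "namespace/name".
#[derive(Default)]
struct RetryTracker {
    failures: HashMap<String, u32>,
}

impl RetryTracker {
    /// Record a failed reconcile and return the delay to requeue with.
    /// The delay doubles per consecutive failure of the same key: 5s, 10s, 20s, ... capped at 300s.
    fn on_failure(&mut self, key: &str) -> Duration {
        let count = self.failures.entry(key.to_string()).or_insert(0);
        // Clamp the exponent so the multiplication cannot overflow.
        let delay = Duration::from_secs(5) * 2u32.pow((*count).min(6));
        *count = count.saturating_add(1);
        delay.min(Duration::from_secs(300))
    }

    /// Forget the object's failure history after a successful reconcile.
    fn on_success(&mut self, key: &str) {
        self.failures.remove(key);
    }
}
```

One possible arrangement is to keep such a tracker in the shared context behind a `Mutex` and consult it from a wrapper around the reconciler; that wiring is omitted here.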
|
|
||
| ## Client-level Retry | ||
|
|
||
| By default, kube-client does not retry regular API calls. If a `create()`, `patch()`, or `get()` fails, the error is returned as-is. | ||
|
|
||
| Since version 3, kube provides a built-in [`RetryPolicy`](https://docs.rs/kube/latest/kube/client/retry/struct.RetryPolicy.html) that implements the retry `Policy` trait from [tower], so it can be plugged into `RetryLayer`. It retries on 429, 503, and 504 with exponential backoff: | ||
|
|
||
| ```rust | ||
| use kube::client::retry::RetryPolicy; | ||
| use tower::{ServiceBuilder, retry::RetryLayer, buffer::BufferLayer}; | ||
|
|
||
| let service = ServiceBuilder::new() | ||
| .layer(config.base_uri_layer()) | ||
| .option_layer(config.auth_layer()?) | ||
| .layer(BufferLayer::new(1024)) | ||
| .layer(RetryLayer::new(RetryPolicy::default())) | ||
| // ... | ||
| ``` | ||
|
|
||
| `RetryPolicy` specifically retries **429**, **503**, and **504** responses. It does not retry network errors or other 5xx codes. | ||
|
|
||
| For broader retry guidance when designing your own error handling: | ||
|
|
||
| | Error | Retryable | Where to handle | | ||
| |-------|-----------|-----------------| | ||
| | 429, 503, 504 | Yes | `RetryPolicy` handles automatically | | ||
| | Other 5xx | Depends | `error_policy` or custom Tower middleware | | ||
| | Timeout / Network | Yes | `error_policy` requeue, or watcher backoff | | ||
| | 4xx (400, 403, 404) | No | Fix the request or RBAC | | ||
| | 409 Conflict | No | SSA ownership conflict — fix field managers | | ||
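If you write your own middleware or `error_policy`, the table above can be encoded as a small classification function. The sketch below is standard-library only; `RetryAdvice` and `retry_advice` are hypothetical names for illustration and do not exist in kube. Timeouts and network failures carry no status code, so they are left out of this function.

```rust
/// Hypothetical classification mirroring the retry table.
#[derive(Debug, PartialEq)]
enum RetryAdvice {
    /// Safe to retry automatically with backoff.
    Retry,
    /// Other 5xx: retry only if the operation is known to be safe to repeat.
    Depends,
    /// Repeating the same request will not help: fix the request, RBAC, or field managers.
    Fix,
}

/// Map an HTTP status code returned by the API server to retry advice.
/// Returns `None` for codes outside the 4xx/5xx error ranges.
fn retry_advice(status: u16) -> Option<RetryAdvice> {
    match status {
        // Checked first so 429 is not swallowed by the general 4xx arm.
        429 | 503 | 504 => Some(RetryAdvice::Retry),
        500..=599 => Some(RetryAdvice::Depends),
        400..=499 => Some(RetryAdvice::Fix),
        _ => None,
    }
}
```

Keeping the classification in one function makes it straightforward to share between an `error_policy` and any custom Tower layer.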
|
|
||
| ## Timeout Strategy | ||
|
|
||
| If you need to guard against slow API calls in your reconciler, you can wrap individual calls with `tokio::time::timeout`: | ||
|
|
||
| ```rust | ||
| // First ? unwraps the timeout Result<T, Elapsed> | ||
| // Second ? unwraps the API Result<Pod, kube::Error> | ||
| let pod = tokio::time::timeout( | ||
| Duration::from_secs(10), | ||
| api.get("my-pod"), | ||
| ).await??; | ||
|
Member
double question mark |
||
| ``` | ||
|
|
||
| In a [Controller] context, stream timeouts rely internally on watcher timeouts and can be configured via stream backoff parameters and [watcher::Config]. Only individual API calls inside your reconciler typically need shorter timeouts. | ||
|
|
||
| --8<-- "includes/abbreviations.md" | ||
| --8<-- "includes/links.md" | ||
|
|
||
| [//begin]: # "Autogenerated link references for markdown compatibility" | ||
| [reconciler]: reconciler "The Reconciler" | ||
| [//end]: # "Autogenerated link references" | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| # Server-Side Apply | ||
|
|
||
| [Server-Side Apply] is a Kubernetes patch strategy based on field ownership. It allows multiple controllers to safely modify the same resource by tracking which controller owns which fields. | ||
|
|
||
| This page covers practical patterns, common pitfalls, and status patching with SSA in kube. | ||
|
|
||
| !!! note "SSA and Reconciler Idempotency" | ||
|
|
||
| SSA naturally fits the [[reconciler]]'s idempotent pattern: you declare "these fields should have these values", and the server handles the rest. See [[reconciler#in-depth-solution]] for how SSA simplifies reconciler logic. | ||
|
|
||
| ## Why SSA | ||
|
|
||
| The traditional patch strategies each have limitations: | ||
|
|
||
| | Strategy | Limitation | | ||
| |----------|-----------| | ||
| | Merge patch | Overwrites entire arrays. Field deletion is not explicit | | ||
| | Strategic merge patch | Only supported for built-in Kubernetes types (those in k8s-openapi); not supported for CRDs | | ||
| | JSON patch | Requires exact paths. Susceptible to race conditions | | ||
|
|
||
| SSA addresses these: | ||
|
|
||
| - **Field ownership**: the server records "this controller owns this field" | ||
| - **Conflict detection**: changing the value of a field owned by another manager produces a `409 Conflict` | ||
| - **Declarative**: you declare which fields should have which values; everything else is left untouched | ||
|
|
||
| ## Basic Pattern | ||
|
|
||
| ```rust | ||
| use kube::api::{Patch, PatchParams}; | ||
|
|
||
| let patch = Patch::Apply(serde_json::json!({ | ||
| "apiVersion": "v1", | ||
| "kind": "ConfigMap", | ||
| "metadata": { "name": "my-cm" }, | ||
| "data": { "key": "value" } | ||
| })); | ||
| let pp = PatchParams::apply("my-controller"); // field manager name | ||
| api.patch("my-cm", &pp, &patch).await?; | ||
| ``` | ||
|
|
||
| The `"my-controller"` string in `PatchParams::apply` is the **field manager** name. Ownership is tracked under this name. Applying again with the same field manager updates owned fields; fields owned by other managers are left alone. | ||
|
|
||
| ## Common Pitfalls | ||
|
|
||
| ### Missing apiVersion and kind | ||
|
|
||
| ```rust | ||
| // ✗ 400 Bad Request | ||
| let patch = Patch::Apply(serde_json::json!({ | ||
| "data": { "key": "value" } | ||
| })); | ||
|
|
||
| // ✓ apiVersion and kind are required | ||
| let patch = Patch::Apply(serde_json::json!({ | ||
| "apiVersion": "v1", | ||
| "kind": "ConfigMap", | ||
| "metadata": { "name": "my-cm" }, | ||
| "data": { "key": "value" } | ||
| })); | ||
| ``` | ||
|
|
||
| Unlike merge patch, SSA requires `apiVersion` and `kind` in every request. | ||
|
|
||
| ### Missing field manager | ||
|
|
||
| ```rust | ||
| // ✗ field_manager is None → API server rejects the request | ||
| let pp = PatchParams::default(); | ||
|
|
||
| // ✓ Explicit field manager | ||
| let pp = PatchParams::apply("my-controller"); | ||
| ``` | ||
|
Comment on lines +67 to +73
Member
field managers are required for serverside apply, so using PatchParams::default with apply should probably be validated as an error in PatchParams rather than documented here as an eternal footgun.
Member, Author
Agreed: this should be a client-side validation rather than a doc-only warning. |
||
|
|
||
| A field manager is **required** for SSA. When `field_manager` is `None` (the default), the API server returns an error. Always use `PatchParams::apply("my-controller")` for SSA operations. | ||
|
|
||
| ### Overusing force | ||
|
|
||
| ```rust | ||
| // Caution: forcibly takes ownership of fields from other managers | ||
| let pp = PatchParams::apply("my-controller").force(); | ||
| ``` | ||
|
|
||
| `force: true` takes ownership of fields from other controllers. Only use this in single-owner situations such as CRD registration. | ||
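To make the effect of `force` concrete, here is a toy model of field ownership using only the standard library. It is a deliberately simplified sketch, not the API server's actual algorithm: `Object`, `Conflict`, and `apply` are hypothetical names, values are plain strings, shared ownership is not modelled, and fields a manager stops sending are not removed.

```rust
use std::collections::HashMap;

/// Toy model of an object: field path -> (owning manager, value).
#[derive(Default)]
struct Object {
    fields: HashMap<String, (String, String)>,
}

/// Which field was contested and who currently owns it.
#[derive(Debug, PartialEq)]
struct Conflict {
    field: String,
    owner: String,
}

impl Object {
    /// Apply `desired` fields on behalf of `manager`.
    /// Changing the value of a field owned by another manager is a conflict unless `force` is set.
    fn apply(&mut self, manager: &str, desired: &[(&str, &str)], force: bool) -> Result<(), Conflict> {
        // Check every field first so that a rejected apply changes nothing.
        for (field, value) in desired {
            if let Some((owner, current)) = self.fields.get(*field) {
                if owner.as_str() != manager && current.as_str() != *value && !force {
                    return Err(Conflict { field: field.to_string(), owner: owner.clone() });
                }
            }
        }
        // No conflicts (or forced): write the values and record the new owner.
        // Simplification: real SSA would record shared ownership for equal values.
        for (field, value) in desired {
            self.fields.insert(field.to_string(), (manager.to_string(), value.to_string()));
        }
        Ok(())
    }
}
```

In this model, a second manager that sets `spec.replicas` to a different value gets a `Conflict` naming the current owner, while the same call with `force` set silently takes the field over, which is why forcing is only reasonable when no other owner is expected.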
|
|
||
| ### Including unnecessary fields | ||
|
|
||
| Serializing an entire Rust struct emits every field that is not skipped, including fields left at their `Default` values. SSA then takes ownership of those fields, which causes conflicts when another controller tries to modify them. | ||
|
|
||
| ```rust | ||
| // ✗ Serializes all Default fields → unnecessary ownership | ||
| let full_deployment = Deployment { ..Default::default() }; | ||
|
|
||
| // ✓ Only include fields you actually manage | ||
| let patch = serde_json::json!({ | ||
| "apiVersion": "apps/v1", | ||
| "kind": "Deployment", | ||
| "metadata": { "name": "my-deploy" }, | ||
| "spec": { | ||
| "replicas": 3 | ||
| } | ||
| }); | ||
| ``` | ||
|
|
||
|
Member
part of this problem is tangled up in kubernetes objects not being fully optional, e.g. some replicas ints. i believe this is what drove them to later make ApplyConfigurations variants where everything is truly optional. rust unfortunately does not have an equivalent of applyconfigurations (kube-rs/kube#649), so serializing parts of structs can on some occasions be annoying if you want to use a typed interface for partial SSA. possibly this is worth a current limitations callout somewhere 🤔 |
||
| !!! note "Current limitation: no ApplyConfigurations in Rust" | ||
|
|
||
| Go's client-go provides [ApplyConfigurations](https://pkg.go.dev/k8s.io/client-go/applyconfigurations) - fully optional builder types designed specifically for SSA. Rust does not have an equivalent yet ([kube#649](https://github.com/kube-rs/kube/issues/649)). Some [k8s-openapi] fields are not fully optional (e.g. certain integer fields like `maxReplicas`), which can make typed partial SSA awkward. Using `serde_json::json!()` for partial patches works around this issue. | ||
|
|
||
| ## Status Patching | ||
|
|
||
| Status is modified through the `/status` subresource: | ||
|
|
||
| ```rust | ||
| let status_patch = serde_json::json!({ | ||
| "apiVersion": "example.com/v1", | ||
| "kind": "MyResource", | ||
| "status": { | ||
| "phase": "Ready", | ||
| "conditions": [{ | ||
| "type": "Available", | ||
| "status": "True", | ||
| "lastTransitionTime": "2024-01-01T00:00:00Z", | ||
| }] | ||
| } | ||
| }); | ||
| let pp = PatchParams::apply("my-controller"); | ||
| api.patch_status("name", &pp, &Patch::Apply(status_patch)).await?; | ||
| ``` | ||
|
|
||
| !!! warning "Wrap status in the full object structure" | ||
|
|
||
| ```rust | ||
| // ✗ Sending just the status fields will fail | ||
| serde_json::json!({ "phase": "Ready" }) | ||
|
|
||
| // ✓ Must include apiVersion, kind, and wrap under "status" | ||
| serde_json::json!({ | ||
| "apiVersion": "example.com/v1", | ||
| "kind": "MyResource", | ||
| "status": { "phase": "Ready" } | ||
| }) | ||
| ``` | ||
|
|
||
| The Kubernetes API expects the full object structure even on the `/status` endpoint. | ||
|
Member
yeah. good callout. this is definitely still a footgun. i wish this could be typed better using default None and builders, e.g. `let updated = instance.status(MyCrdStatus::default().phase("Ready".into()));` followed by `client.patch_status::<MyCrd>(updated, ssa).await?`, and have the patch_status strip the spec for users. but it relies on builders (e.g. k8s-pb, kube-rs/k8s-pb#9) and full optionality like applyconfigurations. |
||
|
|
||
| ## Typed SSA | ||
|
|
||
| Instead of `serde_json::json!()`, you can use Rust types for type safety and IDE autocompletion: | ||
|
|
||
| ```rust | ||
| let cm = ConfigMap { | ||
| metadata: ObjectMeta { | ||
| name: Some("my-cm".into()), | ||
| ..Default::default() | ||
| }, | ||
| data: Some(BTreeMap::from([("key".into(), "value".into())])), | ||
| ..Default::default() | ||
| }; | ||
| let pp = PatchParams::apply("my-controller"); | ||
| api.patch("my-cm", &pp, &Patch::Apply(cm)).await?; | ||
| ``` | ||
|
|
||
| [k8s-openapi] types already have `#[serde(skip_serializing_if = "Option::is_none")]` applied, so `None` fields are omitted from serialization. For your own types, you need to add this explicitly: | ||
|
|
||
| ```rust | ||
| #[derive(Serialize)] | ||
| struct MyStatus { | ||
| phase: String, | ||
| #[serde(skip_serializing_if = "Option::is_none")] | ||
| message: Option<String>, | ||
| } | ||
| ``` | ||
|
|
||
| Without `skip_serializing_if`, `None` fields serialize as `null` and SSA takes ownership of them. | ||
|
clux marked this conversation as resolved.
|
||
|
|
||
| --8<-- "includes/abbreviations.md" | ||
| --8<-- "includes/links.md" | ||
|
|
||
| [//begin]: # "Autogenerated link references for markdown compatibility" | ||
| [reconciler]: reconciler "The Reconciler" | ||
| [//end]: # "Autogenerated link references" | ||
yeah, this is a great callout.
actually looking at this some more, there's no such pattern described in the reconciler documentation.