Skip to content

Commit c98aa8e

Browse files
committed
Addressed review comments
Signed-off-by: Vishal Anarase <iamvishalanarase@gmail.com>
1 parent 001e75c commit c98aa8e

1 file changed

Lines changed: 48 additions & 41 deletions

File tree

design/ironic-standalone-operator/reconciliation-events.md

Lines changed: 48 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
## Summary
44

55
Ironic Standalone Operator should emit Kubernetes Events on the `Ironic`
6-
custom resource to make reconciliation progress and failures visible to
6+
custom resource to make meaningful state transitions and actionable failures visible to
77
cluster users (`kubectl describe ironic ...`), without flooding the cluster
88
with repetitive messages.
99

@@ -12,16 +12,20 @@ This document proposes a small, stable set of **Event reasons**, their **types**
1212

1313
## Goals
1414

15-
- Provide actionable feedback during reconcile (start, progress, ready).
16-
- Surface user-actionable failures (e.g. referenced Secret/ConfigMap missing).
17-
- Surface operator/actionable failures (transient reconcile failures).
18-
- Avoid event spam: emit primarily on transitions or user-actionable changes.
15+
- Surface **actionable** failures (e.g. invalid version, missing
16+
referenced Secret/ConfigMap).
17+
- Surface **meaningful state transitions** (e.g. ready, version change,
18+
operator-created credentials).
19+
- Avoid event spam: **no** per-reconcile "started/finished" events and **no**
20+
transient operational errors (those belong in logs and
21+
`status.conditions`).
1922
- Keep the contract stable so downstream docs/automation can rely on reasons.
2023

2124
## Non-goals
2225

2326
- Emitting events on every child object create/update (too noisy).
24-
- Emitting events for every internal step in `pkg/ironic` (implementation detail).s
27+
- Emitting events for every internal step in `pkg/ironic` (implementation
28+
detail).
2529

2630
## Background
2731

@@ -33,60 +37,59 @@ because it is the primary user-facing object.
3337

3438
### General rules (anti-spam)
3539

36-
- Emit "start" and "fully reconciled" events at most once per reconcile loop.
37-
- Emit transition events (Ready/NotReady) only when the Ready condition changes.
38-
- Emit "in progress" events only when the status/conditions meaningfully change
39-
(e.g. `status.String()` changes), and ideally rate-limit per Generation.
40-
- Prefer Warning events only for user-actionable errors or hard failures.
40+
- Do **not** emit an event at the start or successful end of each reconcile
41+
loop (`Reconciling` / `Reconciled` are intentionally omitted).
42+
- Do **not** emit Warning events for **transient** errors (e.g. API conflicts,
43+
temporary network issues, ensure failures that will be retried); use logs
44+
and conditions instead.
45+
- Emit Normal events for **durable, user-visible changes** (e.g. version
46+
change request, new API credentials Secret created).
47+
- Emit events on **Ready condition transitions** (`False → True` / `True →
48+
False`) with messages aligned to `status.conditions`, not on every
49+
intermediate "in progress" status string.
50+
- Do **not** emit separate "in progress" or “error without Ready flip” events;
51+
represent not-ready and error states through **one** path: the Ready
52+
condition and its message (see `IronicReady` / `IronicNotReady` below).
53+
- Prefer Warning events only for **actionable** or **configuration**
54+
problems.
4155
- Use stable `reason` strings; avoid embedding variable data in `reason`.
4256
- Include variable detail in the message (e.g. namespaced name, versions).
4357

4458
### Event reasons
4559

4660
| Reason | Type | When | Message (template) |
4761
|---|---|---|---|
48-
| `Reconciling` | Normal | Beginning of reconciliation for an `Ironic` object | `Starting reconciliation` |
49-
| `ReconcileFailed` | Warning | Reconcile returned error (will be retried) | `Reconciliation failed: <error>` |
50-
| `Reconciled` | Normal | Reconcile completed with no requeue and no error | `Object has been fully reconciled` |
5162
| `VersionError` | Warning | Invalid/unsupported version configuration (user action required) | `Failed to process version: <error>` |
5263
| `DowngradeRejected` | Warning | Downgrade blocked due to external DB constraint | `Downgrade from <installed> to <requested> is not supported with external database` |
53-
| `VersionChange` | Normal | Requested version changed (upgrade requested) | `Version change requested from <installed> to <requested>` |
64+
| `VersionChange` | Normal | Requested version changed (upgrade/downgrade requested) | `Version change requested from <installed> to <requested>` |
5465
| `APISecretCreated` | Normal | Operator generated new API credentials Secret | `Created new API credentials secret: <name>` |
55-
| `APISecretError` | Warning | Ensuring API secret failed (transient) | `Failed to ensure API secret: <error>` |
5666
| `SecretNotFound` | Warning | User referenced a Secret that does not exist | `secret <ns>/<name> not found` |
5767
| `ConfigMapNotFound` | Warning | User referenced a ConfigMap that does not exist | `configmap <ns>/<name> not found` |
58-
| `EnsureIronicFailed` | Warning | Creating/updating Ironic child resources failed (transient) | `Failed to ensure Ironic resources: <error>` |
59-
| `IronicReady` | Normal | Ready condition transitions `False → True` | `Ironic deployment is now ready` |
60-
| `IronicNotReady` | Warning | Ready condition transitions `True → False` | `<status>` |
61-
| `IronicError` | Warning | Status indicates error (without necessarily flipping Ready) | `<status>` |
62-
| `IronicInProgress` | Normal | Status indicates progress (not ready, not error) | `<status>` |
68+
| `IronicReady` | Normal | Ready condition transitions to `True` | `Ironic deployment is now ready` |
69+
| `IronicNotReady` | Warning | Ready condition transitions to `False` (includes configuration errors and dependency failures surfaced there) | `<message from Ready condition>` |
6370

6471
Notes:
6572

66-
- `SecretNotFound` and `ConfigMapNotFound` are intended to be user-actionable,
67-
and should be aligned with setting `IronicReasonFailed` in Conditions.
68-
- The `<status>` messages should match what is surfaced via Conditions (e.g.
69-
`resource: <status>`), so users see consistent messaging between Events and
70-
`status.conditions`.
73+
- `SecretNotFound` and `ConfigMapNotFound` are user-actionable and should stay
74+
aligned with setting failure reasons in Conditions.
75+
- Any “error” or “not ready” situation that does not flip Ready in the same
76+
reconcile should still be reflected in Conditions first; when Ready becomes
77+
`False`, use **`IronicNotReady`** with the same message users see in
78+
`status.conditions` (no separate `IronicError` reason).
7179

72-
## Proposed follow-ups / gaps
80+
## Deletion, finalizers, and cleanup
7381

74-
The contract above intentionally stays narrow. After the initial rollout, we
75-
can consider adding **a small number** of additional events if they prove
76-
valuable and non-noisy, for example:
77-
78-
- `FinalizerAdded` / `FinalizerRemoved`: only on transition, to help explain
79-
deletion behavior.
80-
- `CleanupStarted` / `CleanupFailed`: during finalization and resource teardown.
81-
82-
These should be added only if users commonly debug deletion/cleanup issues and
83-
need explicit breadcrumbs.
82+
Debugging deletion, finalizers, and teardown is an **operator** concern.
83+
Visibility for that belongs in **controller logs**, not in user-facing Events
84+
on the `Ironic` CR. No additional Event reasons are proposed for cleanup
85+
flows.
8486

8587
## Implementation plan (high level)
8688

8789
- Ensure `IronicReconciler` has an `EventRecorder` and it is initialized in
8890
`cmd/main.go` (controller-runtime manager recorder).
89-
- Record events using the reasons and conditions above.
91+
- Record events only for the reasons in the table above, following the general
92+
rules.
9093
- Keep emission in the controller layer; do not sprinkle events throughout
9194
lower-level helpers unless necessary.
9295
- Add/verify RBAC permissions for `events` are present in generated RBAC.
@@ -96,9 +99,13 @@ need explicit breadcrumbs.
9699
- Add unit tests for event emission in the controller using a fake recorder:
97100
- Ready transition emits `IronicReady` once.
98101
- Missing Secret/ConfigMap emits `SecretNotFound` / `ConfigMapNotFound`.
99-
- Transient reconcile errors emit `ReconcileFailed`.
100-
- Validate via e2e/functional tests (optional, if already covering these flows):
101-
- Deploy an `Ironic` CR and assert at least one `IronicReady` event appears.
102+
- Invalid version emits `VersionError` (or equivalent user-error path).
103+
- Assert that **transient** reconcile errors do **not** emit Warning Events
104+
(only logs/conditions).
105+
- Validate via e2e/functional tests (optional, if already covering these
106+
flows):
107+
- Deploy an `Ironic` CR and assert a `IronicReady` event appears when Ready
108+
becomes `True`.
102109

103110
## Compatibility and upgrade considerations
104111

0 commit comments

Comments
 (0)