The janitor-provider module executes node lifecycle operations requested by Janitor, including reboot signals, node readiness checks, and provider-specific termination signals. This document covers the Helm configuration options for selecting and configuring the provider backend.
6
+
7
+
## Configuration Reference
8
+
9
+
### Module Enable/Disable
10
+
11
+
Controls whether the janitor-provider module is deployed in the cluster.
12
+
13
+
```yaml
14
+
global:
15
+
janitorProvider:
16
+
enabled: true
17
+
```
18
+
19
+
### CSP Provider

Selects the provider implementation used by janitor-provider.

```yaml
janitor-provider:
  csp:
    provider: "kind"
```

Supported providers are `kind`, `kwok`, `aws`, `gcp`, `azure`, `oci`, `nebius`, and `generic`.
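
A deployment script could check the configured value against this list before installing the chart. The following is a minimal sketch in Python, not code from janitor-provider: the names `SUPPORTED_PROVIDERS` and `validate_provider` are hypothetical, and exact, case-sensitive matching is an assumption.

```python
# Hypothetical pre-install check; these names are illustrative only and
# do not come from the janitor-provider code base.
SUPPORTED_PROVIDERS = frozenset(
    {"kind", "kwok", "aws", "gcp", "azure", "oci", "nebius", "generic"}
)


def validate_provider(name: str) -> str:
    """Return the provider name unchanged, or raise ValueError if unsupported.

    Assumption: names are matched exactly (case-sensitive).
    """
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"unsupported provider {name!r}; "
            f"expected one of {sorted(SUPPORTED_PROVIDERS)}"
        )
    return name
```

For example, `validate_provider("generic")` returns `"generic"`, while an unlisted value raises `ValueError`.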

## Generic Provider

The `generic` provider is intended for bare-metal and on-premises clusters where there is no cloud provider reboot API. It creates a privileged Kubernetes Job on the target node and uses the node `bootID` to verify that a reboot occurred.
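
The `bootID` comparison can be illustrated with a short Python sketch. It assumes the caller saved the node's `bootID` before creating the Job and later reads the current value from the node status. The function name `reboot_completed` is hypothetical, and treating an empty `bootID` as "not rebooted yet" is an assumption; the real provider may apply further checks.

```python
def reboot_completed(pre_reboot_boot_id: str, current_boot_id: str) -> bool:
    """Report whether the node has booted again since the Job was created.

    Assumption: an empty current bootID means the node has not reported
    its status yet, so it is treated as "not rebooted".
    """
    if not current_boot_id:
        return False
    # A reboot is detected when the bootID differs from the saved one.
    return current_boot_id != pre_reboot_boot_id
```

Comparing identifiers rather than timestamps avoids clock-skew problems, because the kernel generates a fresh `bootID` on every boot.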
`rebootImage` sets the container image used for the privileged reboot Job.

`useSysrqReboot` switches the reboot command from `chroot /host reboot` to Linux Magic SysRq by writing `b` to the host `/proc/sysrq-trigger`. Keep this disabled unless the standard reboot path leaves nodes stuck `NotReady`.

`rebootJobNamespace` sets the namespace where reboot Jobs are created. If empty, the janitor-provider release namespace is used.

`rebootJobTTLSeconds` controls how long completed reboot Jobs are retained before Kubernetes garbage-collects them.

`imagePullSecrets` is a comma-separated list of image pull secret names used by the reboot Job.
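
Two of these options have simple resolution rules that a short Python sketch can make concrete: the namespace fallback and the comma-separated secret list. The function names are hypothetical, and trimming whitespace and dropping empty entries are assumptions about how the list is parsed, not documented behavior.

```python
def resolve_job_namespace(reboot_job_namespace: str, release_namespace: str) -> str:
    """Use the configured namespace, falling back to the release namespace."""
    return reboot_job_namespace or release_namespace


def parse_image_pull_secrets(raw: str) -> list[str]:
    """Split a comma-separated string into secret names.

    Assumption: surrounding whitespace is trimmed and empty entries
    (for example from a trailing comma) are ignored.
    """
    return [name.strip() for name in raw.split(",") if name.strip()]
```

With these rules, an empty `rebootJobNamespace` resolves to the release namespace, and an empty `imagePullSecrets` string yields no secrets.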

### SysRq Reboot

When `useSysrqReboot` is enabled, janitor-provider sets `GENERIC_REBOOT_USE_SYSRQ=true` and the reboot Job writes to the host SysRq trigger instead of invoking the normal reboot command.

This path bypasses the normal userspace shutdown flow. It is useful in environments where `chroot /host reboot` or `sudo reboot` is accepted but leaves the node stuck `NotReady`. It should remain an explicit opt-in because it is more abrupt than the default reboot path: writing `b` reboots the machine immediately, without syncing or unmounting filesystems.
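
The choice between the two reboot paths can be sketched in Python as follows. This is an illustration under stated assumptions, not the provider's implementation: the function names are hypothetical, the rule that only the literal value `true` enables SysRq is assumed, and the exact path of the SysRq trigger inside the Job container depends on how the host `/proc` is mounted.

```python
import os
from typing import Mapping


def use_sysrq_from_env(environ: Mapping[str, str] = os.environ) -> bool:
    """Read the SysRq flag from the environment.

    Assumption: only the value "true" (any letter case) enables SysRq.
    """
    return environ.get("GENERIC_REBOOT_USE_SYSRQ", "").strip().lower() == "true"


def reboot_command(use_sysrq: bool) -> list[str]:
    """Return the argv a reboot Job container might run."""
    if use_sysrq:
        # Opt-in path: immediate reboot that skips userspace shutdown.
        # Assumption: the trigger is reachable at this path in the container.
        return ["sh", "-c", "echo b > /proc/sysrq-trigger"]
    # Default path: regular reboot through the host root filesystem.
    return ["chroot", "/host", "reboot"]
```

Keeping the default branch last makes the standard reboot the result for any unset or unrecognized flag value, which matches the opt-in intent described above.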
docs/designs/028-generic-baremetal-reboot-provider.md (12 additions & 2 deletions)
@@ -33,7 +33,7 @@ flowchart TD

 ## Decision

-Add a **generic provider** (`CSP=generic`) that reboots nodes via a privileged Kubernetes Job running `chroot /host /sbin/reboot`, following the Job-based pattern from GPU Reset ([ADR-019](019-janitor-gpu-reset.md)). It is a named provider in the factory switch, just like `aws`, `gcp`, or `kind`.
+Add a **generic provider** (`CSP=generic`) that reboots nodes via a privileged Kubernetes Job. By default the Job runs `chroot /host /sbin/reboot`; deployments can opt into a Linux Magic SysRq reboot for environments where the normal reboot path wedges the node. This follows the Job-based pattern from GPU Reset ([ADR-019](019-janitor-gpu-reset.md)). It is a named provider in the factory switch, just like `aws`, `gcp`, or `kind`.

 ## Implementation

@@ -69,7 +69,7 @@ sequenceDiagram
 GP-->>JC: requestID = pre-reboot bootID

 K8s->>Job: Schedule pod on target node
-Job->>Node: chroot /host /sbin/reboot
+Job->>Node: chroot /host /sbin/reboot or echo b > /proc/sysrq-trigger
 Note over Node: Node reboots, pod is killed

 Note over Node: Node boots back up, new bootID assigned
@@ -87,6 +87,13 @@ sequenceDiagram

 Records the node's current `bootID` from `node.Status.NodeInfo.BootID`, creates a privileged Job on the target node, and returns the pre-reboot `bootID` as the `requestID`.

+The reboot Job supports two reboot paths:
+
+- Default: `chroot /host reboot`
+- SysRq opt-in: `echo b > /proc/sysrq-trigger` via the host `/proc` mount
+
+The SysRq path is intended for bare-metal environments where the standard reboot command is accepted but leaves the node stuck `NotReady`. It bypasses the normal userspace shutdown path, so it is controlled by an explicit feature flag.
+
 **Job specification:**

 ```yaml
@@ -166,6 +173,7 @@ csp:

   generic: # config for the generic provider (when provider=generic)
     rebootImage: "busybox:1.37"
+    useSysrqReboot: false # true to use echo b > /proc/sysrq-trigger
     rebootJobNamespace: "" # defaults to the janitor-provider's own namespace