Commit 9c8d9d1

devantler and claude authored
feat(operator): self-register and manage the host cluster (#5237)
* feat(operator): self-register and manage the host cluster

The operator now registers the cluster it runs ON as a Cluster resource named "host" (labelled ksail.io/host-cluster) in its own namespace, so the hub itself appears in the cluster list and its workloads can be browsed, scaled, restarted, and reconciled through the dashboard — following the pattern of Rancher's "local" cluster, Argo CD's "in-cluster" destination, and Headlamp's in-cluster "main" context.

- A leader-gated startup runnable ensures the registration exists (idempotent; a same-named unlabelled cluster is never adopted).
- The reconciler skips provisioning/drift/components and the teardown finalizer for host-labelled clusters: it only observes status (endpoint, node readiness) through the operator's own credentials and reports Ready; ComponentsReady is Unknown (reason HostCluster). Deleting a host-labelled resource never invokes a provisioner.
- The resource browser resolves the host cluster to an in-cluster dynamic client instead of a vcluster kubeconfig Secret.
- The REST API rejects create/update/delete of the host registration with 403 (kubectl remains the escape hatch; CR deletion only deregisters).
- The chart gains hostCluster.enabled (default true), the POD_NAMESPACE downward-API env, and host-browse RBAC (nodes/events read, metrics API, GitOps CRs read+patch).
- The UI badges the host cluster and hides edit/delete for it; the Overview no longer backfills create-form defaults into its empty spec.

Verified end-to-end against a throwaway Kind cluster: self-registration, Ready status with live node counts, node browsing, and 403 lifecycle guards.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* chore: Apply megalinter fixes

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: devantler <26203420+devantler@users.noreply.github.com>
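The registration rule in the first bullet (idempotent, and never adopting a same-named cluster that lacks the label) can be sketched roughly as follows. This is a minimal illustration, not the operator's code: the `cluster` and `store` types, the `ensureHost` function, and the choice to return an error when the name is taken are all assumptions, and an in-memory map stands in for the Kubernetes API.

```go
package main

import (
	"errors"
	"fmt"
)

// hostLabel mirrors the ksail.io/host-cluster label named in the commit message.
const hostLabel = "ksail.io/host-cluster"

// cluster is a hypothetical stand-in for the Cluster resource.
type cluster struct {
	Name   string
	Labels map[string]string
}

// store is a hypothetical in-memory replacement for the Kubernetes API, keyed by name.
type store map[string]*cluster

// errNameTaken signals that "host" exists but does not carry the host label.
// Returning an error here is an assumption; the commit only says such a
// cluster is never adopted.
var errNameTaken = errors.New(`cluster "host" exists without the host label; not adopting it`)

// ensureHost creates the labelled registration when it is missing, does
// nothing when it already exists, and refuses to touch an unlabelled cluster
// that happens to share the name.
func ensureHost(s store) (created bool, err error) {
	existing, found := s["host"]
	if !found {
		s["host"] = &cluster{Name: "host", Labels: map[string]string{hostLabel: "true"}}
		return true, nil
	}

	if existing.Labels[hostLabel] != "true" {
		return false, errNameTaken
	}

	return false, nil
}

func main() {
	s := store{}
	fmt.Println(ensureHost(s)) // first call creates the registration
	fmt.Println(ensureHost(s)) // second call finds it and changes nothing

	taken := store{"host": {Name: "host"}}
	fmt.Println(ensureHost(taken)) // unlabelled namesake is left alone
}
```

Checking the label rather than the name alone is what keeps a user's own cluster called `host` from being silently converted into the status-only host path.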
1 parent f73f12d commit 9c8d9d1

22 files changed

Lines changed: 821 additions & 38 deletions

charts/ksail-operator/README.md

Lines changed: 7 additions & 0 deletions
@@ -12,6 +12,7 @@ The operator reconciles `Cluster` resources (`ksail.io/v1alpha1`) so you can pro
 - **REST API** — served by the operator and consumed by the UI (toggle with `api.bindPort`).
 - **Web UI** _(optional)_ — a dashboard that talks to the REST API (`ui.enabled`).
 - **OIDC auth** _(optional)_ — app-driven OIDC login that protects the REST API and UI (`auth.oidc.enabled`).
+- **Host cluster registration** — the operator self-registers the cluster it runs on as a `Cluster` resource named `host` (labelled `ksail.io/host-cluster`) in the release namespace, so the hub itself appears in the cluster list and its workloads can be browsed in the UI — like Rancher's `local` cluster or Argo CD's `in-cluster` destination. The operator never provisions, updates, or deletes the underlying cluster for this entry, and the API rejects lifecycle mutations on it. Disable with `hostCluster.enabled=false`.
 
 > **Note:** The REST API is unauthenticated by default. Enable OIDC (`auth.oidc.enabled=true`) to require sign-in, or set `api.bindPort=0` to disable the API entirely when you don't need the UI.
 
@@ -135,6 +136,12 @@ Register the redirect URL with your provider, and point `ksail.local` at your In
 |----------------|--------------------------------------------------------------------------------------------|---------|
 | `api.bindPort` | Port the operator REST API listens on (consumed by the UI). Set to `0` to disable the API. | `8080` |
 
+### Host cluster
+
+| Key | Description | Default |
+|-----------------------|------------------------------------------------------------------------------------------------------------------------|---------|
+| `hostCluster.enabled` | Self-register the cluster the operator runs on as a `Cluster` resource named `host` so it appears in the cluster list. | `true` |
+
 ### Web UI
 
 | Key | Description | Default |
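The README states that the API rejects lifecycle mutations on the host entry, and the commit message gives the status as 403. A guard of that kind might look like the sketch below. The `/clusters/{name}` route shape, the `guardHost` middleware, and the `statusFor` helper are hypothetical. The sketch only covers requests that name the cluster in the path, so a create request carrying the name in its body would need a separate check.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// hostName is the registration name from the README.
const hostName = "host"

// guardHost rejects mutating requests aimed at the host registration with
// 403 and passes everything else through. The /clusters/ prefix is an
// assumed route, not taken from the operator.
func guardHost(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		name := strings.TrimPrefix(r.URL.Path, "/clusters/")
		mutating := r.Method != http.MethodGet && r.Method != http.MethodHead

		if mutating && name == hostName {
			http.Error(w, "the host cluster registration is read-only", http.StatusForbidden)
			return
		}

		next.ServeHTTP(w, r)
	})
}

// statusFor runs one request through the guard and returns the status code.
func statusFor(method, path string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusNoContent)
	})

	rec := httptest.NewRecorder()
	guardHost(ok).ServeHTTP(rec, httptest.NewRequest(method, path, nil))

	return rec.Code
}

func main() {
	fmt.Println(statusFor(http.MethodGet, "/clusters/host"))    // reads are allowed
	fmt.Println(statusFor(http.MethodDelete, "/clusters/host")) // mutation is rejected
	fmt.Println(statusFor(http.MethodDelete, "/clusters/dev"))  // other clusters are unaffected
}
```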

charts/ksail-operator/templates/deployment.yaml

Lines changed: 8 additions & 2 deletions
@@ -39,14 +39,20 @@ spec:
             {{- if .Values.ui.readOnly }}
             - --read-only
             {{- end }}
+            - --host-cluster={{ .Values.hostCluster.enabled }}
             {{- if .Values.auth.oidc.enabled }}
             - --oidc-issuer-url={{ .Values.auth.oidc.issuerURL }}
             - --oidc-client-id={{ .Values.auth.oidc.clientID }}
             - --oidc-redirect-url={{ include "ksail-operator.oidc.redirectURL" . }}
             - --oidc-scopes={{ .Values.auth.oidc.scopes }}
             {{- end }}
-          {{- if .Values.auth.oidc.enabled }}
           env:
+            # POD_NAMESPACE tells the operator which namespace to register the host cluster in.
+            - name: POD_NAMESPACE
+              valueFrom:
+                fieldRef:
+                  fieldPath: metadata.namespace
+            {{- if .Values.auth.oidc.enabled }}
             - name: KSAIL_OPERATOR_OIDC_CLIENT_SECRET
               valueFrom:
                 secretKeyRef:
@@ -57,7 +63,7 @@ spec:
                 secretKeyRef:
                   name: {{ include "ksail-operator.oidc.secretName" . }}
                   key: session-secret
-          {{- end }}
+            {{- end }}
           ports:
             {{- if .Values.api.bindPort }}
             - name: api
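The template now always passes `--host-cluster=<bool>` and always sets `POD_NAMESPACE`. The sketch below shows one way a binary could consume both. It assumes the standard library `flag` package (the operator's real flag library is not shown in this diff), and the `options` type, `parseOptions` function, and the error on a missing namespace are hypothetical. The `=` form matters because Go boolean flags treat a separate `false` argument as a positional argument rather than as the flag's value.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the two inputs the chart wires into the container.
type options struct {
	hostCluster bool
	namespace   string
}

// parseOptions reads --host-cluster from args and POD_NAMESPACE through
// getenv (injected so the example does not depend on the real environment).
func parseOptions(args []string, getenv func(string) string) (options, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	hostCluster := fs.Bool("host-cluster", true, "register the cluster the operator runs on")

	if err := fs.Parse(args); err != nil {
		return options{}, err
	}

	opts := options{hostCluster: *hostCluster, namespace: getenv("POD_NAMESPACE")}

	// Assumption: registration needs a namespace, so an empty one is an error
	// only when the feature is enabled.
	if opts.hostCluster && opts.namespace == "" {
		return options{}, errors.New("POD_NAMESPACE must be set when --host-cluster is enabled")
	}

	return opts, nil
}

func main() {
	getenv := func(key string) string {
		return map[string]string{"POD_NAMESPACE": "ksail-system"}[key]
	}

	opts, err := parseOptions([]string{"--host-cluster=false"}, getenv)
	fmt.Printf("%+v %v\n", opts, err)

	opts, err = parseOptions(nil, getenv)
	fmt.Printf("%+v %v\n", opts, err)
}
```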

charts/ksail-operator/templates/rbac.yaml

Lines changed: 24 additions & 0 deletions
@@ -68,6 +68,30 @@ rules:
     resources: ["roles", "rolebindings", "clusterroles", "clusterrolebindings"]
     verbs:
       ["get", "list", "watch", "create", "update", "patch", "delete", "deletecollection", "bind", "escalate"]
+  {{- if .Values.hostCluster.enabled }}
+  # Host cluster browsing (hostCluster.enabled): the dashboard reads the hub itself through the
+  # operator's ServiceAccount. The workload kinds above already carry read/write verbs; these add
+  # the read-only kinds the resource browser needs (nodes, events), the metrics API powering the
+  # Overview's usage gauges, and the GitOps CRs (patched to trigger reconciles).
+  - apiGroups: [""]
+    resources: ["nodes", "events"]
+    verbs: ["get", "list", "watch"]
+  - apiGroups: ["metrics.k8s.io"]
+    resources: ["nodes", "pods"]
+    verbs: ["get", "list"]
+  - apiGroups:
+      - kustomize.toolkit.fluxcd.io
+      - helm.toolkit.fluxcd.io
+      - source.toolkit.fluxcd.io
+      - argoproj.io
+    resources:
+      - kustomizations
+      - helmreleases
+      - gitrepositories
+      - ocirepositories
+      - applications
+    verbs: ["get", "list", "watch", "patch"]
+  {{- end }}
 {{- end }}
 ---
 apiVersion: rbac.authorization.k8s.io/v1

charts/ksail-operator/values.yaml

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,15 @@ api:
   # Port the operator REST API listens on (consumed by the UI). Set to 0 to disable the API.
   bindPort: 8080
 
+# hostCluster self-registers the cluster the operator runs on as a Cluster resource (named "host",
+# labelled ksail.io/host-cluster, in the release namespace), so the hub itself appears in the
+# cluster list and its workloads can be browsed through the operator's ServiceAccount — like
+# Rancher's "local" cluster or Argo CD's "in-cluster" destination. The operator never provisions,
+# updates, or deletes the underlying cluster for this entry, and the API rejects lifecycle
+# mutations on it (kubectl-deleting the resource merely deregisters it until the next restart).
+hostCluster:
+  enabled: true
+
 # ui is the web dashboard. It is embedded in the operator binary and served by the operator itself
 # on the API port (api.bindPort) — same origin as the REST API, so there is no separate UI
 # container and no reverse proxy. Reach it via the ingress below or by port-forwarding the operator

docs/src/content/docs/cli-flags/operator/operator-root.mdx

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ Flags:
       --api-bind-address string            Address the REST API binds to (empty disables it, e.g. ":8080")
       --dev-logging                        Emit human-readable console logs instead of structured JSON (for local development)
       --health-probe-bind-address string   Address the health and readiness probes bind to (default ":8081")
+      --host-cluster                       Register the cluster the operator runs on as a Cluster resource (named "host") so it appears in the cluster list (default true)
       --leader-elect                       Enable leader election to ensure only one active operator instance
       --metrics-bind-address string        Address the metrics endpoint binds to ("0" disables it) (default "0")
       --oidc-client-id string              OIDC client ID (the client secret is read from KSAIL_OPERATOR_OIDC_CLIENT_SECRET)

internal/controller/cluster_controller.go

Lines changed: 73 additions & 3 deletions
@@ -135,6 +135,11 @@ type ClusterReconciler struct {
 	// disables runtime status reporting (endpoint/nodes stay empty).
 	ObserveStatus StatusObserver
 
+	// ObserveHostStatus gathers runtime status for the self-registered host cluster (the cluster the
+	// operator runs on) through the operator's own credentials. Optional; nil disables runtime status
+	// reporting for the host cluster.
+	ObserveHostStatus StatusObserver
+
 	// InstallComponents installs the cluster's components into the provisioned child cluster.
 	// Optional; nil disables component installation.
 	InstallComponents ComponentInstaller
@@ -173,6 +178,14 @@ func (r *ClusterReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ct
 		return r.reconcileDelete(ctx, &cluster)
 	}
 
+	// The host cluster (the operator's self-registration of the cluster it runs on) is never
+	// provisioned or torn down, so it gets no finalizer and a status-only reconcile.
+	if cluster.IsHostCluster() {
+		log.Info("reconciling host cluster")
+
+		return r.reconcileHost(ctx, &cluster)
+	}
+
 	if controllerutil.AddFinalizer(&cluster, FinalizerName) {
 		updateErr := r.Update(ctx, &cluster)
 		if updateErr != nil {
@@ -264,6 +277,37 @@ func (r *ClusterReconciler) reconcileNormal(
 	return ctrl.Result{RequeueAfter: r.readyRequeue()}, nil
 }
 
+// reconcileHost reconciles the self-registered host cluster. The underlying cluster — the one the
+// operator runs on — already exists and is owned by whoever provisioned it, so there is nothing to
+// create, update, or delete: reconciliation only observes runtime status (node readiness, endpoint)
+// through the operator's own credentials and reports it. Component installation is intentionally
+// skipped; the host cluster's components are not the operator's to manage.
+func (r *ClusterReconciler) reconcileHost(
+	ctx context.Context,
+	cluster *v1alpha1.Cluster,
+) (ctrl.Result, error) {
+	before := cluster.Status.DeepCopy()
+
+	r.observeStatusWith(ctx, r.ObserveHostStatus, cluster)
+
+	apimeta.SetStatusCondition(&cluster.Status.Conditions, metav1.Condition{
+		Type:               v1alpha1.ConditionComponentsReady,
+		Status:             metav1.ConditionUnknown,
+		ObservedGeneration: cluster.Generation,
+		Reason:             "HostCluster",
+		Message:            "components on the host cluster are not managed by the operator",
+	})
+
+	r.markReady(cluster)
+
+	statusErr := r.updateStatusIfChanged(ctx, cluster, before)
+	if statusErr != nil {
+		return ctrl.Result{}, statusErr
+	}
+
+	return ctrl.Result{RequeueAfter: r.readyRequeue()}, nil
+}
+
 // reconcileComponents installs the cluster's components when they are not already reconciled for the
 // current generation, recording the outcome in the ComponentsReady condition. Best-effort: failures
 // are reported via the condition (not the reconcile error) and return false so the reconcile
@@ -335,14 +379,25 @@ func componentsUpToDate(cluster *v1alpha1.Cluster) bool {
 // optional StatusObserver. It is best-effort: observation errors are logged and partial results
 // applied, since a not-yet-reachable child cluster is expected shortly after provisioning.
 func (r *ClusterReconciler) observeStatus(ctx context.Context, cluster *v1alpha1.Cluster) {
-	if r.ObserveStatus == nil {
+	r.observeStatusWith(ctx, r.ObserveStatus, cluster)
+}
+
+// observeStatusWith applies the given observer's results to the cluster status. Shared by the
+// child-cluster path (ObserveStatus) and the host-cluster path (ObserveHostStatus); a nil observer
+// disables observation.
+func (r *ClusterReconciler) observeStatusWith(
+	ctx context.Context,
+	observer StatusObserver,
+	cluster *v1alpha1.Cluster,
+) {
+	if observer == nil {
 		return
 	}
 
-	observed, err := r.ObserveStatus(ctx, r.reader(), cluster)
+	observed, err := observer(ctx, r.reader(), cluster)
 	if err != nil {
 		logf.FromContext(ctx).
-			Info("observe child cluster status (best-effort)", "error", err.Error())
+			Info("observe cluster status (best-effort)", "error", err.Error())
 	}
 
 	if observed.Endpoint != "" {
@@ -368,6 +423,21 @@ func (r *ClusterReconciler) reconcileDelete(
 		return ctrl.Result{}, nil
 	}
 
+	// Deleting the host registration must never destroy anything: the underlying cluster is the one
+	// the operator runs on. The host path adds no finalizer, but a user may have labelled a cluster
+	// that already carried one — remove it without invoking the provisioner (conservative: the
+	// underlying cluster is orphaned, not destroyed).
+	if cluster.IsHostCluster() {
+		controllerutil.RemoveFinalizer(cluster, FinalizerName)
+
+		err := r.Update(ctx, cluster)
+		if err != nil {
+			return ctrl.Result{}, fmt.Errorf("remove finalizer: %w", err)
+		}
+
+		return ctrl.Result{}, nil
+	}
+
 	r.markProgressing(cluster, v1alpha1.ClusterPhaseDeleting, "Deleting", "Deleting cluster")
 	// Status update is best-effort during deletion; ignore conflicts on a terminating object.
 	_ = r.updateStatus(ctx, cluster)
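The `observeStatusWith` refactor treats the observer as an optional function value: nil disables observation, and an error is logged while whatever was observed is still applied. A stripped-down sketch of that pattern follows. The observer signature is simplified (the real one also takes a context, a reader, and the cluster), `clusterStatus` is a hypothetical stand-in, and gating the node counts on `NodesObserved` is an assumption, since the diff cuts off before that part of the function.

```go
package main

import (
	"errors"
	"fmt"
)

// observedStatus mirrors the fields the commit's test sets on controller.ObservedStatus.
type observedStatus struct {
	Endpoint      string
	NodesReady    int32
	NodesTotal    int32
	NodesObserved bool
}

// clusterStatus is a hypothetical, pared-down stand-in for the Cluster status.
type clusterStatus struct {
	Endpoint   string
	NodesReady int32
	NodesTotal int32
}

// statusObserver is a simplified observer signature for this sketch.
type statusObserver func() (observedStatus, error)

// observeStatusWith applies an optional observer's results to status.
func observeStatusWith(observer statusObserver, status *clusterStatus) {
	if observer == nil {
		return // nil observer: observation is disabled
	}

	observed, err := observer()
	if err != nil {
		// Best-effort: report the error but keep going with partial results.
		fmt.Println("observe cluster status (best-effort):", err)
	}

	if observed.Endpoint != "" {
		status.Endpoint = observed.Endpoint
	}

	// Assumption: node counts are copied only when nodes were actually observed.
	if observed.NodesObserved {
		status.NodesReady = observed.NodesReady
		status.NodesTotal = observed.NodesTotal
	}
}

func main() {
	var status clusterStatus

	observeStatusWith(nil, &status) // leaves status untouched

	partial := func() (observedStatus, error) {
		return observedStatus{Endpoint: "https://10.96.0.1:443"}, errors.New("nodes unreachable")
	}
	observeStatusWith(partial, &status) // endpoint is applied despite the error

	fmt.Printf("%+v\n", status)
}
```

Passing the observer as a parameter lets the child-cluster and host-cluster paths share one code path while each supplies its own credentials.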
Lines changed: 143 additions & 0 deletions
@@ -0,0 +1,143 @@
+package controller_test
+
+import (
+	"context"
+	"testing"
+
+	"github.com/devantler-tech/ksail/v7/internal/controller"
+	"github.com/devantler-tech/ksail/v7/pkg/apis/cluster/v1alpha1"
+	clusterprovisioner "github.com/devantler-tech/ksail/v7/pkg/svc/provisioner/cluster"
+	"github.com/stretchr/testify/assert"
+	"github.com/stretchr/testify/require"
+	apierrors "k8s.io/apimachinery/pkg/api/errors"
+	apimeta "k8s.io/apimachinery/pkg/api/meta"
+	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
+	"k8s.io/apimachinery/pkg/types"
+	ctrl "sigs.k8s.io/controller-runtime"
+	"sigs.k8s.io/controller-runtime/pkg/client"
+)
+
+const (
+	hostName      = "host"
+	hostNamespace = "default"
+)
+
+// newHostCluster returns a host-labelled Cluster, mirroring the operator's self-registration of the
+// cluster it runs on (empty spec, host-cluster label).
+func newHostCluster(withFinalizer bool) *v1alpha1.Cluster {
+	cluster := &v1alpha1.Cluster{
+		ObjectMeta: metav1.ObjectMeta{
+			Name:       hostName,
+			Namespace:  hostNamespace,
+			Generation: 1,
+			Labels:     map[string]string{v1alpha1.HostClusterLabel: "true"},
+		},
+	}
+	if withFinalizer {
+		cluster.Finalizers = []string{controller.FinalizerName}
+	}
+
+	return cluster
+}
+
+func hostRequest() ctrl.Request {
+	return ctrl.Request{
+		NamespacedName: types.NamespacedName{Name: hostName, Namespace: hostNamespace},
+	}
+}
+
+// newHostReconciler builds a reconciler whose provisioner builder fails the reconcile if invoked —
+// the host path must never touch a provisioner — and whose host observer reports fixed status.
+func newHostReconciler(
+	t *testing.T,
+	fakeClient client.Client,
+) *controller.ClusterReconciler {
+	t.Helper()
+
+	return &controller.ClusterReconciler{
+		Client: fakeClient,
+		Scheme: newScheme(t),
+		NewProvisioner: func(
+			_ context.Context,
+			_ *v1alpha1.Cluster,
+		) (clusterprovisioner.Provisioner, error) {
+			return nil, errBoom
+		},
+		ObserveHostStatus: func(
+			_ context.Context,
+			_ client.Reader,
+			_ *v1alpha1.Cluster,
+		) (controller.ObservedStatus, error) {
+			return controller.ObservedStatus{
+				Endpoint:      "https://10.96.0.1:443",
+				NodesReady:    2,
+				NodesTotal:    3,
+				NodesObserved: true,
+			}, nil
+		},
+	}
+}
+
+func TestReconcile_HostClusterReportsReadyWithoutProvisioner(t *testing.T) {
+	t.Parallel()
+
+	fakeClient := newFakeClient(newScheme(t), newHostCluster(false))
+	reconciler := newHostReconciler(t, fakeClient)
+
+	res, err := reconciler.Reconcile(context.Background(), hostRequest())
+	require.NoError(t, err, "the host path must not build a provisioner")
+	assert.Positive(t, res.RequeueAfter, "the host cluster should be re-observed periodically")
+
+	var got v1alpha1.Cluster
+
+	require.NoError(t, fakeClient.Get(context.Background(), hostRequest().NamespacedName, &got))
+	assert.Equal(t, v1alpha1.ClusterPhaseReady, got.Status.Phase)
+	assert.Empty(t, got.Finalizers, "the host cluster must not get the teardown finalizer")
+	assert.Equal(t, "https://10.96.0.1:443", got.Status.Endpoint)
+	assert.Equal(t, int32(2), got.Status.NodesReady)
+	assert.Equal(t, int32(3), got.Status.NodesTotal)
+
+	ready := apimeta.FindStatusCondition(got.Status.Conditions, v1alpha1.ConditionReady)
+	require.NotNil(t, ready)
+	assert.Equal(t, metav1.ConditionTrue, ready.Status)
+
+	components := apimeta.FindStatusCondition(
+		got.Status.Conditions,
+		v1alpha1.ConditionComponentsReady,
+	)
+	require.NotNil(t, components)
+	assert.Equal(t, metav1.ConditionUnknown, components.Status)
+	assert.Equal(t, "HostCluster", components.Reason)
+}
+
+func TestReconcile_HostClusterDeleteSkipsProvisioner(t *testing.T) {
+	t.Parallel()
+
+	scheme := newScheme(t)
+	// A user may have labelled an existing cluster that already carried the finalizer; deletion must
+	// remove the finalizer without ever invoking the provisioner.
+	cluster := newHostCluster(true)
+	fakeClient := newFakeClient(scheme, cluster)
+	prov := &fakeProvisioner{exists: true}
+	reconciler := newReconciler(scheme, fakeClient, prov)
+
+	require.NoError(t, fakeClient.Delete(context.Background(), cluster))
+
+	_, err := reconciler.Reconcile(context.Background(), hostRequest())
+	require.NoError(t, err)
+	assert.Equal(
+		t,
+		0,
+		prov.deleteCalls,
+		"deleting the host registration must not tear anything down",
+	)
+
+	var got v1alpha1.Cluster
+
+	getErr := fakeClient.Get(context.Background(), hostRequest().NamespacedName, &got)
+	assert.True(
+		t,
+		apierrors.IsNotFound(getErr),
+		"the registration should be gone after finalizer removal",
+	)
+}

pkg/apis/cluster/v1alpha1/labels.go

Lines changed: 14 additions & 0 deletions
@@ -4,3 +4,17 @@ package v1alpha1
 // Cluster resource. The operator only ever deletes namespaces carrying this label, so namespaces
 // that already existed (e.g. "default" or user-managed namespaces) are never removed.
 const ManagedNamespaceLabel = "ksail.io/managed-namespace"
+
+// HostClusterLabel marks the Cluster resource the operator self-registers to represent the cluster
+// it runs ON (the hub), following the pattern of Rancher's "local" cluster and Argo CD's
+// "in-cluster" destination. The label is reserved: the operator never provisions, updates, or
+// deletes the underlying cluster for a resource carrying it — it only observes status and serves
+// resource browsing through its own in-cluster credentials — and the REST API rejects lifecycle
+// mutations on it.
+const HostClusterLabel = "ksail.io/host-cluster"
+
+// IsHostCluster reports whether this Cluster resource is the operator's self-registration of the
+// cluster it runs on (see HostClusterLabel).
+func (c *Cluster) IsHostCluster() bool {
+	return c.Labels[HostClusterLabel] == "true"
+}
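Two properties of `IsHostCluster` are worth spelling out: it is safe on a resource with no labels at all, because indexing a nil map in Go yields the zero value, and it matches only the exact string `"true"`. The sketch below reproduces the method on a pared-down `Cluster` type (an assumption standing in for the real `v1alpha1.Cluster`) to show both.

```go
package main

import "fmt"

// HostClusterLabel is the label key introduced in labels.go.
const HostClusterLabel = "ksail.io/host-cluster"

// Cluster is a pared-down stand-in for v1alpha1.Cluster with only the labels.
type Cluster struct {
	Labels map[string]string
}

// IsHostCluster has the same body as the method added in this commit.
func (c *Cluster) IsHostCluster() bool {
	return c.Labels[HostClusterLabel] == "true"
}

func main() {
	unlabelled := &Cluster{} // Labels is nil; the lookup returns "" without panicking
	host := &Cluster{Labels: map[string]string{HostClusterLabel: "true"}}
	loose := &Cluster{Labels: map[string]string{HostClusterLabel: "True"}}
	empty := &Cluster{Labels: map[string]string{HostClusterLabel: ""}}

	fmt.Println(unlabelled.IsHostCluster()) // false
	fmt.Println(host.IsHostCluster())       // true
	fmt.Println(loose.IsHostCluster())      // false: the comparison is case-sensitive
	fmt.Println(empty.IsHostCluster())      // false: the key alone is not enough
}
```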
