This document describes the internal design of the Clowder Kubernetes operator: its components, data flows, design tradeoffs, and implementation details. It is intended for contributors modifying or extending the operator. Installation, usage, and development workflow belong in other documents.
- Overview
- Custom Resource Definitions
- Controller Architecture
- Provider System
- Resource Cache
- Watch and Filter System
- Application Configuration Generation
- Dependency Endpoint Resolution
- Web Provider System
- Observability
- Code Generation Pipeline
Clowder is a single Kubernetes operator (Go, controller-runtime/Kubebuilder) that manages the common infrastructure concerns shared by cloud.redhat.com applications. Rather than each application team building their own operator, Clowder provides a single opinionated control plane that handles databases, messaging, object storage, in-memory caches, web services, metrics, logging, and inter-application dependencies.
The central design choice is that application teams declare what they need (via ClowdApp CRs)
rather than how to provision it. The operator resolves this against an environment's provider
configuration (ClowdEnvironment) and assembles a complete runtime configuration document
(cdappconfig.json) that is injected into application containers as a mounted Kubernetes Secret.
The operator is structured around three CRDs, three controllers, and a pluggable provider system where each capability area is implemented as an independently registered, ordered provider.
Language: Go 1.25
Module: github.com/RedHatInsights/clowder
All CRD types live in apis/cloud.redhat.com/v1alpha1/.
Cluster-scoped. Represents a single deployment environment (e.g., stage, production, ephemeral).
Each environment selects a mode and configuration for every provider type. No ClowdApp can exist
without a referenced ClowdEnvironment.
Key spec fields (ClowdEnvironmentSpec):
| Field | Purpose |
|---|---|
targetNamespace |
Namespace where environment-level resources (Kafka, Minio, etc.) are created |
providers |
ProvidersConfig — one config block per provider type (db, kafka, web, etc.) |
resourceDefaults |
Default pod resource requests/limits applied to any ClowdApp that omits them |
serviceConfig.type |
ClusterIP or NodePort for generated Services |
disabled |
Pauses reconciliation when true |
Provider mode fields within providers: Each provider sub-spec carries a mode enum that
selects the implementation. See Provider System for the full mode list per
provider.
Status tracks targetNamespace, ready, managed/ready deployment counts, managed/ready topic
counts, the list of apps in the env (apps[]), and a prometheus.serverAddress.
Namespace-scoped. Represents a single application composed of one or more Deployment or Job/CronJob specs plus declarations of the infrastructure it needs.
Key spec fields (ClowdAppSpec):
| Field | Purpose |
|---|---|
envName |
References the owning ClowdEnvironment (cluster-scoped lookup by name) |
deployments[] |
List of Deployment objects; each produces a k8s Deployment + optional Services |
jobs[] |
List of Job objects; each produces a k8s Job or CronJob depending on schedule |
kafkaTopics[] |
Topic declarations; processed by the Kafka provider |
database |
Single DB spec (name, version, sharedDbAppName, t-shirt sizes) |
objectStore[] |
List of bucket names |
inMemoryDb / sharedInMemoryDbAppName |
In-memory DB request and optional sharing |
featureFlags |
Opt-in to feature flag provider config |
dependencies[] |
Required ClowdApp names; their endpoints are included in cdappconfig.json |
optionalDependencies[] |
Same as dependencies but do not block readiness if absent |
cyndi |
Cyndi pipeline configuration for database syndication (Kafka operator mode only) |
disabled |
Pauses reconciliation when true |
Each Deployment within a ClowdApp supports autoscaling via either a simple HPA
(autoScalerSimple) or a full KEDA ScaledObject (autoScaler). The naming pattern for all
generated resources is {app-name}-{deployment-name}.
Status tracks ready, managed/ready deployment counts, and conditions (including
ReconciliationSuccessful and DeploymentsReady).
Namespace-scoped. Triggers ad-hoc execution of one or more Job specs defined in a ClowdApp.
Also used to launch IQE (integration quality engineering) test jobs.
Key spec fields (ClowdJobInvocationSpec):
| Field | Purpose |
|---|---|
appName |
The ClowdApp that owns the jobs to invoke |
jobs[] |
Names of jobs (from ClowdApp.spec.jobs) to run |
testing.iqe |
IQE test job spec (plugin, image tag, UI test config) |
runOnNotReady |
Run jobs even if the app's deployments are not ready |
disabled |
Prevents the CJI from running |
Status tracks completed, and a jobMap of job name → state (Invoked, Complete, Failed).
ClowdEnvironment (cluster-scoped)
└── ClowdApp (namespace-scoped, spec.envName → env name)
└── ClowdJobInvocation (namespace-scoped, spec.appName → app name)
ClowdApp and ClowdJobInvocation resources set owner references pointing to their respective
parents, enabling garbage collection of generated resources.
The three controllers live in controllers/cloud.redhat.com/:
clowdenvironment_controller.go— watchesClowdEnvironmentclowdapp_controller.go— watchesClowdAppclowdjobinvocation_controller.go— watchesClowdJobInvocation
Each controller is registered with the manager in run.go.
Each controller delegates to a dedicated reconciliation struct that executes a sequential list of
steps. For ClowdApp, the steps in ClowdAppReconciliation.steps() are:
- Fetch the
ClowdAppresource. - Update the set of "present and managed" apps.
- Start Prometheus reconciliation duration metrics.
- Handle deletion finalizers.
- Check if the app is locked, disabled, or its namespace is being deleted.
- Fetch the associated
ClowdEnvironment. - Verify the environment's namespace is not being deleted.
- Verify the environment has been reconciled and is ready.
- Create the in-memory resource cache.
- Run all providers in registration order.
- Apply the cache atomically to Kubernetes.
- Update deployment status on the
ClowdApp. - Delete unused (orphaned) resources.
- Set
ReconciliationSuccessfulcondition and emit a success event. - Stop metrics.
ClowdEnvironment reconciliation follows the same step pattern but runs environment-level provider
functions (EnvProvide()) rather than app-level ones. It also creates and manages the
targetNamespace if it does not exist.
When a ClowdApp is reconciled, the controller first calls EnvProvide() on every registered
provider before calling Provide(app). This is because environment-level providers populate data
into the resource cache and the AppConfig struct (e.g., Kafka broker addresses, object store
endpoints) that app-level providers then read. Without re-running EnvProvide() during app
reconciliation, an app provider would see stale or missing infrastructure data.
This means that environment provider logic runs on every app reconciliation. Providers that perform expensive or mutating environment setup must be idempotent and cache-aware to avoid side effects from this double execution.
All providers live under controllers/cloud.redhat.com/providers/. Each provider
package implements the ClowderProvider interface and registers itself using an init() function.
type ClowderProvider interface {
EnvProvide() error // called during environment reconciliation and before Provide()
Provide(app *crd.ClowdApp) error // called for each ClowdApp reconciliation
GetConfig() *config.AppConfig
}Providers register via:
providers.ProvidersRegistration.Register(setupFn, order, name)The registry sorts by order integer. During reconciliation the registry is iterated in sorted
order. Providers with lower numbers run first. The ordering is critical: providers that produce
resources or config data must run before providers that consume that data.
Registered providers and their execution order:
| Order | Package | Name | Purpose |
|---|---|---|---|
| 0 | deployment |
deployment | Base Deployment/PodSpec construction |
| 0 | cronjob |
cronjob | CronJob/Job construction |
| 1 | web |
web | Web services, Services, Caddy config |
| 2 | metrics |
metrics | Metrics Services, ServiceMonitors, Prometheus |
| 4 | dependencies |
dependencies | Inter-app endpoint resolution |
| 5 | database |
db | Database provisioning or credential pass-through |
| 5 | featureflags |
featureflags | Unleash/feature flag provisioning |
| 5 | inmemorydb |
inmemorydb | Redis or Elasticache |
| 5 | logging |
logging | CloudWatch credentials or null logging |
| 5 | objectstore |
objectstore | Minio or S3 credentials |
| 6 | kafka |
kafka | Strimzi, MSK, local, managed Kafka |
| 7 | reverseproxy |
reverseproxy | Frontend reverse proxy (ephemeral mode) |
| 10 | autoscaler |
autoscaler | KEDA ScaledObject or simple HPA |
| 50 | namespace |
namespace | Namespace-level resources |
| 97 | serviceaccount |
serviceaccount | ServiceAccount creation |
| 98 | pullsecrets |
pullsecrets | Pull secret attachment |
| 98 | servicemesh |
servicemesh | Service mesh annotations |
| 98 | sidecar |
sidecar | OTel collector and token refresher sidecars |
| 99 | confighash |
confighash | Config Secret + hash annotation on Deployments |
Each provider type supports multiple modes configured on ClowdEnvironment.spec.providers.*:
| Provider | Modes |
|---|---|
db |
local (deploy PostgreSQL), app-interface (external secret lookup), shared (schema isolation on shared DB), none |
kafka |
operator (Strimzi), local (ephemeral), app-interface (pass-through), managed (MSK via secret), ephem-msk (ephemeral MSK), none |
objectstore |
minio (deploy Minio), app-interface (S3 credentials from secret), none |
inmemorydb |
redis (deploy Redis), elasticache (credential secret lookup), none |
featureflags |
local (deploy Unleash), app-interface (credential pass-through), none |
web |
none/operator (Services only), local (full Keycloak + BOP stack) |
metrics |
operator (ServiceMonitors + Prometheus), app-interface, none |
logging |
app-interface (CloudWatch credentials), null, none |
reverseproxy |
ephemeral, none |
autoscaler |
enabled (KEDA), none |
The provider init() function selects the correct implementation struct based on the mode value at
setup time.
Clowder uses an in-memory ObjectCache (from github.com/RedHatInsights/rhc-osdk-utils/resourceCache)
rather than issuing Kubernetes API calls directly during provider execution.
Without the cache, each provider would independently Create or Update Kubernetes objects. This
causes redundant reconciliation triggers (each API write fires watch events) and risks race
conditions between providers operating on the same object. The cache batches all writes and applies
them in a single controlled pass after all providers complete.
- At the start of reconciliation, a fresh
ObjectCacheis created. - Providers call
cache.Create(ident, nn, obj)to register a new object, orcache.Update(ident, obj)to update an existing one.cache.Getandcache.Listretrieve objects from the cache without hitting the API. - Objects are keyed by a
ResourceIdent(a typed identifier scoped to a provider and resource name) plus aNamespacedName. - After all providers have run,
cache.ApplyAll()issues the actual KubernetesCreateorUpdatecalls in a defined apply order:*(all other types), thenService,Secret,Deployment,Job,CronJob,ScaledObject. cache.Reconcile(uid, opts)lists existing resources matching owner labels and deletes any that were not written into the cache during this reconciliation pass, removing orphaned resources.
Each resource type is declared as a ResourceIdent at package level in its provider:
var CoreConfigSecret = rc.NewSingleResourceIdent(ProvName, "config_secret", &core.Secret{})The StrictGVK option on the cache configuration causes it to reject registration of resource
types that were not declared in AddPossibleGVKFromIdent at provider setup time, preventing
accidental writes.
All three controllers use a shared custom event handler: enqueueRequestForObjectCustom
(defined in controllers/cloud.redhat.com/handlers.go). This handler is created per
controller via createNewHandler and implements the handler.EventHandler interface.
The handler is responsible for:
- Determining the owning
ClowdApporClowdEnvironmentof an event-generating object via owner references. - Deciding whether to enqueue a reconciliation request based on filtering logic.
- Maintaining the hash cache for ConfigMaps and Secrets.
| Event | Filter logic |
|---|---|
| Create | If the object is a tracked ConfigMap/Secret, update its hash cache entry and enqueue all apps using it. If the object has an owner of the watched kind, call CreateFunc handler to decide if a reconcile is needed. |
| Update | If the object carries the restarter annotation, update hash and re-enqueue owning apps. If the object is already in the hash cache (tracked), call doUpdateToHash. For generation-based resources, UpdateFunc is gated on generation change. |
| Delete | Remove object from hash cache. Enqueue the owning ClowdApp/ClowdEnvironment via DeleteFunc. |
| Generic | Enqueue owning resource via GenericFunc if present. |
The HandlerFuncs struct (with CreateFunc, UpdateFunc, DeleteFunc, GenericFunc) is
constructed per controller and encodes controller-specific rules — for example, gating Deployment
updates on generation change to avoid reconciling on status subresource writes.
The HashCache (controllers/cloud.redhat.com/hashcache/) is a thread-safe
in-memory structure that maps Kubernetes object identities to computed content hashes. Two separate
instances are created — one for ClowdApp and one for ClowdEnvironment — in run.go.
Purpose: When a Secret or ConfigMap that a pod depends on changes content, the hash cache
detects the change and enqueues reconciliation for all ClowdApp or ClowdEnvironment objects
that reference that object. During reconciliation, the confighash provider annotates Deployment
pods with the aggregate hash of all their dependent Secrets/ConfigMaps, causing Kubernetes to
roll out new pods when configuration changes.
Key operations:
CreateOrUpdateObject(obj, alwaysUpdate)— computes a SHA256 hash of the object's content and stores it. Returns true if the hash changed.AddClowdObjectToObject(clowdObj, obj)— records that aClowdApporClowdEnvironmentdepends on the given object.GetSuperHashForClowdObject(clowdObj)— returns an aggregate hash of all objects associated with a givenClowdApporClowdEnvironment.RemoveClowdObjectFromObjects(obj)— called at the start of each reconciliation to clear stale associations.Delete(obj)— removes an object from the cache on delete events.
Objects that carry the restarter annotation (configurable via operator config) are always tracked, regardless of whether a provider explicitly registered them.
The canonical configuration schema lives at schema/schema.json. Go types are
generated from this schema into controllers/cloud.redhat.com/config/types.go via:
make genconfig
This runs gojsonschema against controllers/cloud.redhat.com/config/schema.json (the schema is
also copied there during the build process). The generated AppConfig struct and its sub-types are
the shared data model that all providers write into during reconciliation.
Each provider populates fields on the config.AppConfig struct held in Provider.Config. Because
providers run in order, later providers can read data written by earlier ones. The confighash
provider (order 99, last to run) serializes the completed AppConfig to JSON and writes it as the
value of cdappconfig.json in a Kubernetes Secret:
Secret: {app-name}-cdappconfig (in the app namespace)
Key: cdappconfig.json
Value: <JSON-serialized AppConfig>
The confighash provider also computes the aggregate hash of all dependent Secrets and ConfigMaps
and annotates all managed Deployments with this hash, triggering rolling updates on config changes.
The generated cdappconfig.json Secret is mounted into every pod managed by the ClowdApp. The
mount path is determined by the ACG_CONFIG environment variable convention. Applications use a
language-specific client library (e.g., app-common-go) to parse this file at startup.
A ClowdApp declares dependencies on other apps via spec.dependencies (required) and
spec.optionalDependencies (present when available). Both are lists of ClowdApp names within
the same ClowdEnvironment.
The dependencies provider (order 4) resolves endpoints as follows:
- Lists all
ClowdAppresources in the same environment viaenv.GetAppsInEnv(). - Also lists
ClowdAppRefresources (lightweight references for apps not fully managed by Clowder). - For each declared dependency, finds the matching app and extracts its public and private web service endpoints.
- Populates
config.AppConfig.Endpoints(public) andconfig.AppConfig.PrivateEndpointsfor the app being reconciled.
Resolution always includes the app's own endpoints first (self-registration), then external dependencies.
Each resolved endpoint carries:
hostname— the Kubernetes Service DNS nameport— the public web porttlsPort— the TLS port, if TLS is enabled for that deploymentapp— the source app namename— the deployment name within the source app
Private endpoints (PrivateDependencyEndpoint) similarly carry hostname, port, tlsPort, and
app/name identifiers.
TLS configuration is per-endpoint, not global. Each Deployment within a ClowdApp can
individually enable or disable TLS for its public and private services via
webServices.public.tls and webServices.private.tls, overriding the environment-level TLS
default. Dependent apps receive the CA certificate path for each individual endpoint in
cdappconfig.json, enabling mutual TLS with granular trust boundaries.
The web provider is selected based on ClowdEnvironment.spec.providers.web.mode:
| Mode | Implementation | Description |
|---|---|---|
none / operator |
webProvider (default.go) |
Creates Kubernetes Services for each enabled web service. No gateway deployed. |
local |
localWebProvider (local.go) |
Deploys a full local auth stack: Keycloak, mock BOP, mock entitlements, and a Caddy gateway. |
In local mode a Caddy-based reverse proxy gateway is deployed in the environment's
targetNamespace. It reads routing configuration from a generated ConfigMap and terminates TLS
at the gateway boundary. Gateway certificate modes are configurable:
self-signed— Caddy generates a self-signed certificate.acme— Caddy uses Let's Encrypt ACME with a configured email address.none— no gateway TLS.
The gateway ConfigMap is assembled from all registered ClowdApp API paths and updated on every relevant reconciliation.
HTTP/2 cleartext (H2C) is supported at the per-deployment level via webServices.public.h2cEnabled
and webServices.private.h2cEnabled. When enabled, the web provider creates an additional Service
port on the environment's h2cPort / h2cPrivatePort. The target port for H2C services can be
overridden per deployment via h2cTargetPort.
Generated Services follow a deterministic naming pattern:
- Public service:
{app-name}-{deployment-name} - Private service:
{app-name}-{deployment-name}-private
This pattern is also used by the dependency resolution provider when constructing endpoint hostnames.
The metrics provider manages Prometheus integration. In operator mode it creates:
- A
Prometheuscustom resource (via the Prometheus Operator) in the environment's target namespace. - A
ServiceAccount,Role, andRoleBindingfor the Prometheus instance. - A
ServiceMonitorfor each app that has metrics enabled. - Optionally, a Prometheus Pushgateway deployment and associated
ServiceMonitor.
In app-interface mode, ServiceMonitor resources are still created but the Prometheus instance
itself is not deployed (it is assumed to exist externally).
The Prometheus server address is written to ClowdEnvironment.status.prometheus.serverAddress for
consumption by other components.
The sidecar provider (order 98) can inject an OpenTelemetry collector sidecar into every pod
managed by a ClowdApp. This is configured environment-wide via
ClowdEnvironment.spec.providers.sidecars.otelCollector.enabled, with per-app override capability
via ClowdApp.spec.deployments[].podSpec.sidecars[].
The OTel collector sidecar:
- Uses a configurable image (operator config → environment spec → app spec override, in priority order).
- Reads its pipeline configuration from a per-app ConfigMap (default name:
{app-name}-otel-config), which can be overridden at environment or app level. - Supports per-app environment variable injection and memory resource limits.
A token refresher sidecar (tokenRefresher) is also available under the same provider, used for
service account token renewal in environments requiring short-lived credentials.
The operator itself exposes Prometheus metrics (registered in the controller setup code) tracking:
- Per-provider reconciliation duration (
providerMetrics), labeled by provider name and source (clowdapporclowdenvironment). - Reconciliation counts and durations per controller.
Three separate generation steps produce different artifacts. They are independent and must be run in the correct sequence when both API types and configuration schema change.
Runs: controller-gen object:headerFile=...
Produces: apis/cloud.redhat.com/v1alpha1/zz_generated.deepcopy.go
Why: Kubebuilder/controller-runtime requires all types stored in the API server to implement
runtime.Object, which in practice means implementing DeepCopyObject(). The code generator reads
+kubebuilder:object:root=true markers on type definitions and generates the full DeepCopy method
tree. This must be re-run whenever a field is added, removed, or changed in any CRD type file.
Runs: controller-gen crd rbac:roleName=... webhook
Produces:
config/crd/bases/— CRD YAML manifests (one per CRD)config/rbac/— ClusterRole YAML derived from+kubebuilder:rbacmarkers in controller files
Why: The CRD YAML must match the Go type definitions exactly. Kubebuilder markers embedded in
the type files (e.g., +kubebuilder:validation:Enum, +kubebuilder:printcolumn) control
validation, printer columns, and scope. The RBAC manifests ensure the operator's service account
has exactly the permissions its controllers declare via +kubebuilder:rbac comments.
Runs: gojsonschema -p config -o types.go schema.json
Source: controllers/cloud.redhat.com/config/schema.json (canonical source: schema/schema.json)
Produces: controllers/cloud.redhat.com/config/types.go
Why: The cdappconfig.json document consumed by application containers has a versioned JSON
Schema. Generating Go types from the schema ensures the operator's internal AppConfig struct
stays in sync with the published schema without manual maintenance. The schema is the source of
truth; the Go types are derived artifacts. This must be re-run whenever the config schema changes.
When making changes that span both CRD types and config schema:
- Edit type files in
apis/cloud.redhat.com/v1alpha1/. - Run
make generateto regenerate DeepCopy. - Run
make manifeststo regenerate CRD YAML and RBAC. - Edit
schema/schema.jsonif configuration fields changed. - Run
make genconfigto regenerateconfig/types.go.
Running make manifests before make generate will produce inconsistent CRD manifests if new
types lack DeepCopy implementations.