feat: add perses as optional dashboarding component#115

Draft
MichaelSp wants to merge 15 commits into main from feat/add-perses

MichaelSp commented Jun 24, 2026

Contributor

What this PR does / why we need it:

This PR adds Perses (a CNCF dashboarding tool) as an optional, standalone component:

  • New Perses kro ResourceGraphDefinition — not wired into ObservabilityStack, deploy independently via a Perses CR
  • Connects to Prometheus and VictoriaLogs via in-cluster ClusterIP (no mTLS needed inside cluster)
  • Exposed via the existing observability Envoy Gateway at dashboards.<obs-gateway-ns>.<base-domain>:8443 with mTLS

Which issue(s) this PR fixes:

Special notes for your reviewer:

Release note:

Adds Perses (https://perses.dev), a CNCF dashboarding tool, as an optional, standalone component.

Test plan

  • PR build pushes perses kustomization artifact to ghcr.io
  • Apply Perses CR on a dev cluster, confirm perses-system namespace and deployment become ready
  • Hit https://dashboards.<ns>.<domain>:8443 with client cert — Perses UI loads
  • Verify Prometheus datasource query returns data in Perses UI
  • Existing e2e tests pass

Caveat: kro includeWhen limitation

The enabled flag is pushed down into the Perses RGD rather than using includeWhen on the parent resource. This is a workaround for a kro bug where resources excluded via includeWhen are removed from the CEL activation context entirely, making it impossible to reference their status in parent status expressions or the ready gate. The Perses CR is therefore always instantiated; all workload resources inside it are gated via includeWhen: schema.spec.enabled == true.

See: kubernetes-sigs/kro#926 (comment)
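
To make the caveat concrete, here is a minimal sketch of how the gating described above might look inside the Perses RGD. It is an illustration under assumptions, not the manifest from this PR: the `enabled` flag, the `includeWhen` condition, and the `persesKustomization` resource id come from the description and commit messages, while the metadata names, namespace, interval, path, and source reference are hypothetical.

```yaml
# Hypothetical sketch of the Perses RGD; only `enabled`, `includeWhen`,
# and the `persesKustomization` id are taken from the PR text.
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: perses
spec:
  schema:
    apiVersion: v1alpha1
    kind: Perses
    spec:
      # The flag lives on the Perses RGD itself, not on the parent resource.
      enabled: boolean | default=false
  resources:
    - id: persesKustomization
      # Workload resources are gated here instead of gating the Perses CR.
      includeWhen:
        - ${schema.spec.enabled == true}
      template:
        apiVersion: kustomize.toolkit.fluxcd.io/v1
        kind: Kustomization
        metadata:
          name: perses              # assumed name
          namespace: flux-system    # assumed namespace
        spec:
          interval: 10m             # assumed interval
          path: ./                  # assumed path
          prune: true
          sourceRef:
            kind: OCIRepository     # assumed source kind
            name: perses            # assumed source name
```

With this shape the Perses CR always exists, so the parent can reference it, and only the resources inside it disappear when `enabled` is false.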

MichaelSp force-pushed the feat/add-perses branch 2 times, most recently from 40f2613 to 6682222 (June 24, 2026 14:29)
Adds Perses (CNCF dashboard tool) as an opt-in component alongside the
existing Prometheus and VictoriaLogs stack. Deployed via a new `Perses`
kro ResourceGraphDefinition; not part of ObservabilityStack.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>

reshnm commented Jun 26, 2026

Contributor

@MichaelSp Could you explain how the connection between the Perses dashboard and the other components of the observability stack is made?
Currently it looks like the dashboard is deployed without any configuration that would ingest OTel data points.

Never mind, I found it. 👍

MichaelSp and others added 6 commits June 26, 2026 16:15
Update the PERSES_IMAGE_VERSION in component-settings.yaml to
v0.53.1 for compatibility with the latest features and fixes.
Also, adjust the image path in component-constructor.yaml to
remove the version prefix for proper image retrieval.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
Integrate Perses as a standalone dashboarding solution for Kubernetes,
providing a UI on top of Prometheus and VictoriaLogs. This includes
configuration for data sources, provisioning, and sample dashboards
to enhance observability capabilities.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
…-gateway RGD

Listener index 3 (perses) was added to the gateway kustomization, but the
RGD never patched its hostname or port. The <perses-dnsname> placeholder
therefore stayed in place, Envoy Gateway failed to program the listener,
and ObservabilityStack readiness timed out in e2e.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
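
For orientation, a patched listener could look roughly like the Gateway API fragment below. This is a sketch under assumptions, not the manifest from this repository: the hostname pattern and port 8443 come from the PR description, while the Gateway name, gateway class, and certificate secret are hypothetical, and the mTLS client-certificate validation configured on the gateway is not shown.

```yaml
# Hypothetical Gateway fragment; only the hostname pattern and the port
# are taken from the PR description.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: observability-gateway        # assumed name
spec:
  gatewayClassName: envoy-gateway    # assumed class
  listeners:
    - name: perses
      protocol: HTTPS
      port: 8443
      # Rendered by the RGD in place of the <perses-dnsname> placeholder.
      hostname: dashboards.<obs-gateway-ns>.<base-domain>
      tls:
        mode: Terminate
        certificateRefs:
          - name: perses-server-cert # assumed secret name
```

If the hostname is left as an unresolved placeholder, the listener is not a valid DNS name, which matches the programming failure the commit describes.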
MichaelSp and others added 8 commits June 30, 2026 14:31
Federated metrics (e.g. opencontrolplane_controlplane) scrape from the
onboarding cluster and reliably miss the 5-minute window, causing flaky
failures unrelated to code changes.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
Perses is now part of the ObservabilityStack RGD rather than a standalone
CR. It is disabled by default (spec.perses.enabled: false).

The enabled flag lives on the Perses RGD itself (not as an includeWhen on
the parent resource) — a workaround for a kro limitation where resources
excluded via includeWhen are removed from the CEL context entirely, making
their status unreferenceable in parent status expressions or the ready gate.
See: kubernetes-sigs/kro#926 (comment)

README section 9 updated to show kubectl patch one-liner to enable.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
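
As a sketch of what enabling the component might look like after this commit, the instance below sets the flag on an ObservabilityStack. The `spec.perses.enabled` field and its default of `false` come from the commit message; the `apiVersion` (kro's default group) and the instance name are assumptions.

```yaml
# Hypothetical ObservabilityStack instance; apiVersion and name are assumed.
apiVersion: kro.run/v1alpha1
kind: ObservabilityStack
metadata:
  name: observability
spec:
  perses:
    enabled: true   # defaults to false, so Perses is opt-in
```

The kubectl patch one-liner mentioned for the README presumably sets the same field on an existing instance.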
The build script merges RGD files in alphabetical order. obs-stack
references kind: Perses, so it must come after perses.yaml. Renaming
to zz-observability-stack.yaml ensures correct ordering.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
…ependencies"

This reverts commit e8cf497.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
… in status

The Perses RGD's status expressions referenced schema.spec.enabled and
persesKustomization directly:

  status:
    ready: ${!schema.spec.enabled || persesKustomization.status.conditions...}
    lastAttemptedRevisionDigest: ${persesKustomization.status.lastAttemptedRevision}

Both are rejected by kro: status expressions cannot reference schema.* and
cannot reference resources gated by includeWhen (kro strips them from the
status CEL activation context at compile time). The perses RGD therefore
failed to compile, its Perses CRD had no status.ready field, and obs-stack
in turn failed with 'references unknown identifiers: [perses]'.

Fix: introduce an always-present sentinel ConfigMap 'perses-status' whose
data fields hold the readiness signal. Its template expressions may
reference schema.* and includeWhen-gated resources (only status block
expressions are restricted), so it can compute readiness from
persesKustomization when enabled=true and short-circuit to 'true' when
enabled=false. The status block then only reads the ConfigMap data,
which is always resolvable.

Verified in-cluster: both 'perses' and 'obs-stack' RGDs report
Ready=True after this change.

See: kubernetes-sigs/kro#926 (comment)
Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
…imitation

kro v0.9.2 forbids status expressions from referencing resources gated by
includeWhen — they are stripped from the CEL activation context at compile
time. The sentinel ConfigMap workaround also fails because CEL evaluates
both branches of a ternary, so any reference to an excluded resource
causes a runtime nil-dereference.

Instead, status.ready is set to the static literal true. Instance
readiness is still correctly gated: persesKustomization has a readyWhen
expression that blocks until the Flux Kustomization is ready when
enabled=true. When enabled=false all resources are excluded and kro
treats the instance as vacuously ready — no workloads are deployed.

obs-stack.ready no longer gates on perses.status.ready because
perses.status.ready is always true and adds no signal. status.perses.ready
is kept in the status schema (removing it would be a breaking CRD change).

Verified locally: both perses and obs-stack RGDs compile and report
Ready=True in kro v0.9.2.

See: kubernetes-sigs/kro#926 (comment)
Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
kro v0.9.2 strips includeWhen-gated resources from the CEL context at
compile time, making it impossible to propagate their readiness into
status expressions. All workarounds tried (sentinel ConfigMap, ternary
expressions) either fail compilation or produce unreliable results.

status.ready is set to the static literal true in the Perses RGD.
A comment marks where real propagation would live and links to the
upstream issue. obs-stack.status.perses.ready is kept (removing it
would be a breaking CRD change) but is excluded from the ready gate.

See: kubernetes-sigs/kro#926 (comment)
Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
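
The final shape described in the last two commits can be sketched as follows. Treat it as an illustration under assumptions: the static `ready` value, the `readyWhen` gate on `persesKustomization`, and the `includeWhen` condition come from the commit messages, while the exact spelling of the literal and the condition check in `readyWhen` are guesses, and the resource template is omitted.

```yaml
# Hypothetical excerpt of the Perses RGD spec after the final commit.
spec:
  schema:
    apiVersion: v1alpha1
    kind: Perses
    spec:
      enabled: boolean | default=false
    status:
      # Static literal. Real readiness propagation would live here once
      # kubernetes-sigs/kro#926 is resolved.
      ready: ${true}                 # assumed spelling of the literal
  resources:
    - id: persesKustomization
      includeWhen:
        - ${schema.spec.enabled == true}
      # Blocks instance readiness until the Flux Kustomization is ready.
      readyWhen:
        - ${persesKustomization.status.conditions.exists(c, c.type == "Ready" && c.status == "True")}
      # template omitted; it would hold the Flux Kustomization for Perses
```

The status block references neither `schema.*` nor an `includeWhen`-gated resource, which is what lets the RGD compile, while `readyWhen` still holds the instance back when Perses is enabled.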