feat: add perses as optional dashboarding component #115
Draft
MichaelSp wants to merge 15 commits into
Branch updated from 40f2613 to 6682222.
Adds Perses (a CNCF dashboarding tool) as an opt-in component alongside the existing Prometheus and VictoriaLogs stack. It is deployed via a new `Perses` kro ResourceGraphDefinition and is not part of ObservabilityStack.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
Branch updated from 6682222 to 10d701b.
Contributor: Never mind, I found it. 👍
Update PERSES_IMAGE_VERSION in component-settings.yaml to v0.53.1 to pick up the latest upstream features and fixes. Also adjust the image path in component-constructor.yaml to remove the version prefix so the image is retrieved correctly.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
Integrate Perses as a standalone dashboarding solution for Kubernetes, providing a UI on top of Prometheus and VictoriaLogs. This includes data source configuration, provisioning, and sample dashboards.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
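For readers unfamiliar with Perses provisioning, a data source definition for the Prometheus backend might look roughly like the sketch below. It assumes Perses' `GlobalDatasource` resource format with the `PrometheusDatasource` plugin; the resource name and the in-cluster Prometheus URL are hypothetical, not the values used in this PR.

```yaml
# Sketch of a provisioned Perses data source (assumed format, hypothetical values).
kind: GlobalDatasource
metadata:
  name: prometheus                 # hypothetical name
spec:
  default: true                    # marks this as the default Prometheus data source
  plugin:
    kind: PrometheusDatasource
    spec:
      proxy:
        kind: HTTPProxy            # the Perses server proxies queries to Prometheus
        spec:
          url: http://prometheus-operated.monitoring.svc:9090   # hypothetical in-cluster URL
```

Using the proxy form keeps the browser talking only to Perses, which fits a setup where Prometheus itself is not exposed outside the cluster.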
…-gateway RGD

Listener index 3 (perses) was added to the gateway kustomization, but the RGD never patched its hostname or port. This left the `<perses-dnsname>` placeholder in place, so Envoy Gateway failed to program the listener and ObservabilityStack readiness timed out in e2e.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
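A minimal sketch of the missing patch, assuming the gateway RGD deploys the gateway kustomization through a Flux `Kustomization` and patches the `Gateway` with JSON6902 operations. The resource id, the Kustomization name, and the `schema.spec.gatewayNamespace` / `schema.spec.baseDomain` fields are hypothetical; the hostname pattern and port follow the `dashboards.<obs-gateway-ns>.<base-domain>:8443` endpoint from the PR description.

```yaml
# Hypothetical excerpt from the obs-gateway RGD's spec.resources list.
- id: gatewayKustomization              # hypothetical id
  template:
    apiVersion: kustomize.toolkit.fluxcd.io/v1
    kind: Kustomization
    metadata:
      name: obs-gateway                 # hypothetical name
    spec:
      # interval, sourceRef, path, prune, etc. omitted for brevity
      patches:
        # Listener index 3 is the perses listener: replace the
        # <perses-dnsname> placeholder and set its port.
        - target:
            kind: Gateway
          patch: |
            - op: replace
              path: /spec/listeners/3/hostname
              value: dashboards.${schema.spec.gatewayNamespace}.${schema.spec.baseDomain}
            - op: replace
              path: /spec/listeners/3/port
              value: 8443
```

Patching by index is brittle if listeners are reordered; patching both fields in one place at least keeps the placeholder and its replacement visibly paired.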
Branch updated from e432f37 to ae0c38d.
Federated metrics (e.g. opencontrolplane_controlplane) are scraped from the onboarding cluster and often miss the 5-minute window, causing flaky failures unrelated to code changes.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
Perses is now part of the ObservabilityStack RGD rather than a standalone CR. It is disabled by default (spec.perses.enabled: false).

The enabled flag lives on the Perses RGD itself rather than as an includeWhen on the parent resource. This works around a kro limitation: resources excluded via includeWhen are removed from the CEL context entirely, so their status cannot be referenced in parent status expressions or the ready gate. See: kubernetes-sigs/kro#926 (comment)

Section 9 of the README is updated with a kubectl patch one-liner to enable Perses.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
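The push-down could look roughly like this in the ObservabilityStack RGD, assuming kro's `resources[].template` syntax and that the `Perses` kind is served as `kro.run/v1alpha1`. The resource id and instance naming are hypothetical. The parent always creates the `Perses` instance and forwards the flag instead of gating the child with `includeWhen`.

```yaml
# Hypothetical excerpt from the ObservabilityStack RGD.
resources:
  - id: perses
    # No includeWhen here: the Perses instance always exists, so
    # perses.status stays resolvable in the parent's status expressions.
    template:
      apiVersion: kro.run/v1alpha1
      kind: Perses
      metadata:
        name: ${schema.metadata.name}-perses   # hypothetical naming
      spec:
        # Forward the opt-in flag; the Perses RGD gates its own
        # workload resources on this value.
        enabled: ${schema.spec.perses.enabled}
```

Enabling Perses on an existing instance then only requires a merge patch that sets `spec.perses.enabled` to `true`, for example `kubectl patch observabilitystack <name> --type merge -p '{"spec":{"perses":{"enabled":true}}}'` (the resource and instance names here are placeholders).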
The build script merges RGD files in alphabetical order. The obs-stack RGD references kind: Perses, so its file must come after perses.yaml. Renaming it to zz-observability-stack.yaml ensures the correct ordering.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
…ependencies"

This reverts commit e8cf497.

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
… in status
The Perses RGD's status expressions referenced schema.spec.enabled and
persesKustomization directly:

    status:
      ready: ${!schema.spec.enabled || persesKustomization.status.conditions...}
      lastAttemptedRevisionDigest: ${persesKustomization.status.lastAttemptedRevision}

Both are rejected by kro: status expressions cannot reference schema.* and
cannot reference resources gated by includeWhen (kro strips them from the
status CEL activation context at compile time). The perses RGD therefore
failed to compile, its Perses CRD had no status.ready field, and obs-stack
in turn failed with 'references unknown identifiers: [perses]'.
Fix: introduce an always-present sentinel ConfigMap 'perses-status' whose
data fields hold the readiness signal. Its template expressions may
reference schema.* and includeWhen-gated resources (only status block
expressions are restricted), so it can compute readiness from
persesKustomization when enabled=true and short-circuit to 'true' when
enabled=false. The status block then only reads the ConfigMap data,
which is always resolvable.
Verified in-cluster: both 'perses' and 'obs-stack' RGDs report
Ready=True after this change.
See: kubernetes-sigs/kro#926 (comment)
Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
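The sentinel approach described in this commit might look roughly like the following in the Perses RGD. The `Ready`-condition check is an illustrative assumption, not the expression from this commit, and the resource id `persesStatus` is hypothetical. A later commit in this PR reports that this workaround still fails at runtime, so this only illustrates the idea.

```yaml
# Hypothetical excerpt from the Perses RGD (sentinel-ConfigMap workaround).
spec:
  schema:
    status:
      # The status block only reads the always-present ConfigMap.
      ready: ${persesStatus.data.ready == "true"}
  resources:
    - id: persesStatus                 # hypothetical id; no includeWhen
      template:
        apiVersion: v1
        kind: ConfigMap
        metadata:
          name: perses-status
        data:
          # Template expressions may reference schema.* and gated
          # resources; ConfigMap data values must be strings.
          ready: '${(!schema.spec.enabled || persesKustomization.status.conditions.exists(c, c.type == "Ready" && c.status == "True")) ? "true" : "false"}'
```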
…imitation

kro v0.9.2 forbids status expressions from referencing resources gated by includeWhen: they are stripped from the CEL activation context at compile time. The sentinel ConfigMap workaround also fails, because CEL evaluates both branches of a ternary, so any reference to an excluded resource causes a runtime nil dereference. Instead, status.ready is set to the static literal true.

Instance readiness is still correctly gated: persesKustomization has a readyWhen expression that blocks until the Flux Kustomization is ready when enabled=true. When enabled=false, all resources are excluded and kro treats the instance as vacuously ready; no workloads are deployed.

obs-stack.ready no longer gates on perses.status.ready, because perses.status.ready is always true and adds no signal. status.perses.ready is kept in the status schema (removing it would be a breaking CRD change).

Verified locally: both the perses and obs-stack RGDs compile and report Ready=True in kro v0.9.2.

See: kubernetes-sigs/kro#926 (comment)

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
kro v0.9.2 strips includeWhen-gated resources from the CEL context at compile time, making it impossible to propagate their readiness into status expressions. All workarounds tried (sentinel ConfigMap, ternary expressions) either fail compilation or produce unreliable results.

status.ready is set to the static literal true in the Perses RGD. A comment marks where real propagation would live and links to the upstream issue.

obs-stack.status.perses.ready is kept (removing it would be a breaking CRD change) but is excluded from the ready gate.

See: kubernetes-sigs/kro#926 (comment)

Signed-off-by: Michael Sprauer <Michael.Sprauer@sap.com>
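Under the final approach, the relevant parts of the Perses RGD might look roughly like this. It assumes kro's `includeWhen`/`readyWhen` fields, kro's simple-schema syntax for the `enabled` field, and a Flux `Kustomization` whose `Ready` condition signals success. The default value, the `Ready`-condition expression, and the `${true}` literal syntax are assumptions, not copied from the PR.

```yaml
# Hypothetical excerpt from the Perses RGD (final approach).
spec:
  schema:
    spec:
      enabled: boolean | default=false    # assumed default
    status:
      # Static literal: real readiness propagation would live here
      # once kubernetes-sigs/kro#926 is resolved.
      ready: ${true}
  resources:
    - id: persesKustomization
      includeWhen:
        - ${schema.spec.enabled == true}
      readyWhen:
        # Instance readiness still waits for the Flux Kustomization
        # whenever the component is enabled.
        - ${persesKustomization.status.conditions.exists(c, c.type == "Ready" && c.status == "True")}
      template:
        apiVersion: kustomize.toolkit.fluxcd.io/v1
        kind: Kustomization
        metadata:
          name: perses
        # spec (sourceRef, path, interval, prune, ...) omitted
```

The real readiness signal therefore lives in `readyWhen` on the gated resource, which kro evaluates per resource, rather than in a status expression that would need the resource to be present in the CEL context.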
Branch updated from d79633e to 3f5b00a.
What this PR does / why we need it:
This PR adds Perses (CNCF dashboarding tool) as an optional, standalone component:

- `Perses` kro `ResourceGraphDefinition`, not wired into `ObservabilityStack`; deploy independently via a `Perses` CR
- `dashboards.<obs-gateway-ns>.<base-domain>:8443` with mTLS

Which issue(s) this PR fixes:
Special notes for your reviewer:
Release note:
Test plan
- `perses` kustomization artifact to ghcr.io
- `Perses` CR on a dev cluster, confirm `perses-system` namespace and deployment become ready
- `https://dashboards.<ns>.<domain>:8443` with client cert: Perses UI loads

Caveat: kro `includeWhen` limitation
The `enabled` flag is pushed down into the Perses RGD rather than using `includeWhen` on the parent resource. This is a workaround for a kro bug where resources excluded via `includeWhen` are removed from the CEL activation context entirely, making it impossible to reference their status in parent status expressions or the `ready` gate. The Perses CR is therefore always instantiated; all workload resources inside it are gated via `includeWhen: schema.spec.enabled == true`.

See: kubernetes-sigs/kro#926 (comment)
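For the test-plan step that applies a `Perses` CR on a dev cluster, an instance might look roughly like the following. The `apiVersion` assumes kro's default `kro.run/v1alpha1` group, the metadata name is hypothetical, and the sketch assumes the CRD has no other required spec fields; `spec.enabled` is the flag referenced by the `includeWhen` expression above.

```yaml
# Hypothetical Perses instance. With enabled: true the gated workload
# resources are created; with enabled: false only the empty instance exists.
apiVersion: kro.run/v1alpha1
kind: Perses
metadata:
  name: perses              # hypothetical name
spec:
  enabled: true
```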