feat: add Loki-based admin usage dashboard with Perses datasource #995
tgitelman wants to merge 2 commits into
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: tgitelman. It still needs approval from an approver in each of the affected files.
    app.kubernetes.io/managed-by: maas-observability
    app.kubernetes.io/component: perses
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"
Did you check whether we can enable unencrypted communication from within the cluster? I saw that we have unencrypted communication with the ODH Prometheus instance, so I think it makes sense here as well, if possible.
LokiStack gateway in openshift-logging mode enforces HTTPS exclusively (PR #6288). There's no HTTP port to connect to. This matches the Prometheus/Thanos Querier pattern — the ODH operator's own cluster-prometheus-datasource also uses HTTPS + service CA + bearer token to Thanos Querier (same pattern we follow here for Loki). The CA ConfigMap + TLS config are required.
> This matches the Prometheus/Thanos Querier pattern — the ODH operator's own cluster-prometheus-datasource also uses HTTPS + service CA + bearer token to Thanos Querier (same pattern we follow here for Loki)

The Prometheus instance I referred to is not the cluster-prometheus-datasource but the one that is specific to ODH.
proxy:
  kind: HTTPProxy
  spec:
    url: https://lokistack-sample-gateway-http.logging4.svc.cluster.local:8080/api/logs/v1/application
This should not rely on the namespace being logging4.
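One way to avoid baking a namespace into the base manifest is to leave the URL overridable per environment. The following is only a sketch of a Kustomize overlay that replaces the gateway URL with a JSON patch. The overlay layout, the resource name `loki-datasource`, the base path, and the JSON path into the PersesDatasource spec are assumptions, not taken from this PR; the URL value follows the same `<lokistack>-gateway-http.<namespace>` pattern as the line under review, using the `logging-loki`/`openshift-logging` names mentioned in the follow-up commit.

```yaml
# Hypothetical overlay kustomization.yaml; names and paths are illustrative.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base                      # assumed location of the base manifests
patches:
  - target:
      group: perses.dev
      version: v1alpha1
      kind: PersesDatasource
      name: loki-datasource         # hypothetical resource name
    patch: |-
      # Assumed field path; check it against the actual datasource manifest.
      - op: replace
        path: /spec/config/plugin/spec/proxy/spec/url
        value: https://logging-loki-gateway-http.openshift-logging.svc.cluster.local:8080/api/logs/v1/application
```

With this shape, each cluster that runs LokiStack under a different name or namespace only needs its own overlay value; the base stays unchanged.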
@@ -0,0 +1,18 @@
# Grant Perses SA read access to application logs via LokiStack gateway.
This probably needs to move to the opendatahub-operator instead.
    app.kubernetes.io/managed-by: maas-observability
    app.kubernetes.io/component: perses
  annotations:
    kubernetes.io/service-account.name: loki-query-proxy
I guess we don't need this here, right?
@@ -0,0 +1,36 @@
# Admin Loki datasource — direct to LokiStack, no user filtering.
The datasource needs to be placed in the observability/dashboards folder, next to the dashboard.
kind: StaticListVariable
spec:
  values:
    - "dev-subscription-a"
Version 1.5 of the Cluster Observability Operator is expected later this week; we need to test their fix for dynamic values in those lists.
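For context, the static form under discussion sits inside a Perses list variable roughly as sketched below. The variable name `subscription` and the `allowAllValue` setting are assumptions for illustration; only the `StaticListVariable` kind and the `dev-subscription-a` value come from the diff.

```yaml
# Sketch of a dashboard variable with hard-coded values (names are illustrative).
variables:
  - kind: ListVariable
    spec:
      name: subscription            # hypothetical variable name
      allowAllValue: true           # assumption: let the admin view span all subscriptions
      plugin:
        kind: StaticListVariable
        spec:
          values:
            - "dev-subscription-a"  # value shown in the diff; each new subscription must be added by hand
```

A query-backed variable would remove the need to maintain this list manually, which is why the pending fix for dynamic values matters here.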
…io#2 — move datasource, fix LokiStack URL
- Move loki-datasource.yaml to observability/dashboards/ next to its dashboard (follows Prometheus datasource co-location convention)
- Change hardcoded LokiStack URL from lokistack-sample/logging4 to logging-loki/openshift-logging (standard namespace)
- Update loki/ and dashboards/ kustomization references accordingly
Add admin usage dashboard querying Loki structured logs via LogQL, with supporting Loki datasource infrastructure (CA, RBAC, SA token secret).
17576cd to 2d63a11
@tgitelman: The following test has Failed:
OCI Artifact Browser URL
Inspecting Test Artifacts Manually
To inspect your test artifacts manually, follow these steps:
mkdir -p oras-artifacts
cd oras-artifacts
oras pull quay.io/opendatahub/odh-ci-artifacts:maas-group-test-758wt
Summary
- Add admin usage dashboard (usage-admin-loki-dashboard) that queries Loki structured logs via LogQL, showing token consumption, request counts, rate limiting, success rate, and active users
- Add Loki datasource infrastructure: PersesDatasource CR, TLS CA ConfigMap, RBAC ClusterRoleBinding, and SA token Secret
- Wire into existing observability/dashboards kustomization alongside the Prometheus-based dashboards
Details
The dashboard provides an admin-level view of MaaS API usage by querying structured metadata from Envoy access logs stored in Loki (via OTel Collector). It complements the existing Prometheus-based dashboards with per-request granularity including user_id, tokens_total, response_code, and model.
Dashboard panels:
- Total Tokens, Total Requests, Total Rate Limited, Success Rate, Active Users (stat panels)
- Token consumption over time (time series chart with "View by" model/subscription selector)
- Usage breakdown table (model × subscription with token and request counts)
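As a rough illustration of how the panels above could map to LogQL over structured metadata, here is a sketch. It is not the dashboard's actual queries: the stream selector `{service_name="envoy-access"}`, the fixed range windows, and the use of response code 429 for rate limiting are all assumptions. The YAML keys are only labels for readability and are not a Perses panel schema; the field names `tokens_total`, `response_code`, `user_id`, and `model` come from the description above.

```yaml
# Illustrative mapping: panel -> LogQL expression (not a Perses schema).
total_tokens: |
  sum(sum_over_time({service_name="envoy-access"} | unwrap tokens_total [1h]))
total_requests: |
  sum(count_over_time({service_name="envoy-access"}[1h]))
total_rate_limited: |
  sum(count_over_time({service_name="envoy-access"} | response_code="429" [1h]))
success_rate: |
  sum(count_over_time({service_name="envoy-access"} | response_code=~"2.." [1h]))
  /
  sum(count_over_time({service_name="envoy-access"}[1h]))
active_users: |
  count(sum by (user_id) (count_over_time({service_name="envoy-access"}[1h])))
tokens_over_time_by_model: |
  sum by (model) (sum_over_time({service_name="envoy-access"} | unwrap tokens_total [5m]))
```

Grouping the last expression by a subscription label instead of `model` would give the other side of the "View by" selector, assuming such a label exists in the log metadata.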
Loki infra (deployment/components/observability/loki/):
- loki-datasource.yaml: PersesDatasource pointing to LokiStack gateway with SA token + TLS
- loki-ca.yaml: ConfigMap with service CA injection for TLS
- loki-rbac.yaml: ClusterRoleBinding granting perses-sa the cluster-logging-application-view role
- loki-secret.yaml: SA token secret for LokiStack gateway authentication
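The three supporting resources listed above might look roughly like the following. This is a minimal sketch, not the manifests in this PR: every `metadata.name`, the namespace `example-namespace`, and the choice of `perses-sa` as the token's service account are assumptions. The annotation keys, the Secret type, and the `cluster-logging-application-view` role name are the ones mentioned in this PR.

```yaml
# CA bundle: the OpenShift service CA operator fills in the service-ca.crt key.
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-ca                               # hypothetical name
  namespace: example-namespace                # hypothetical namespace
  annotations:
    service.beta.openshift.io/inject-cabundle: "true"
---
# RBAC: let the Perses service account read application logs through the gateway.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: perses-loki-application-view          # hypothetical name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-logging-application-view
subjects:
  - kind: ServiceAccount
    name: perses-sa
    namespace: example-namespace              # hypothetical namespace
---
# Long-lived token: Kubernetes populates the token for the annotated service account.
apiVersion: v1
kind: Secret
metadata:
  name: loki-sa-token                         # hypothetical name
  namespace: example-namespace                # hypothetical namespace
  annotations:
    kubernetes.io/service-account.name: perses-sa   # assumption: same SA as the binding
type: kubernetes.io/service-account-token
```

The token only grants log access if it belongs to the same service account that the ClusterRoleBinding names, so the annotation and the binding subject need to agree.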
Prerequisites
- LokiStack deployed and receiving structured logs from the Envoy OTel access logger
- Cluster Observability Operator (COO) with Perses enabled
- OTel Collector forwarding Envoy access logs to Loki via OTLP
Test plan
- Deploy with kubectl apply -k deployment/components/observability/observability/dashboards/
- Verify dashboard appears in OpenShift Console under Observe > Dashboards
- Confirm all panels render data (Total Tokens, Requests, Rate Limited, Success Rate, Active Users)
- Verify "View by" selector switches token chart grouping between model and subscription
- Confirm usage breakdown table shows model × subscription detail