feat: Ship optional default Grafana dashboard via Helm#3445
feat: Ship optional default Grafana dashboard via Helm#3445sameerdattav wants to merge 1 commit intokubeflow:masterfrom
Conversation
Signed-off-by: Surya Sameer Datta Vaddadi <f20220373@goa.bits-pilani.ac.in>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
There was a problem hiding this comment.
Pull request overview
Adds an optional, Helm-gated Grafana dashboard for Kubeflow Trainer by packaging a dashboard JSON into a ConfigMap (disabled by default).
Changes:
- Introduces
grafanaDashboard.*Helm values to gate/label/annotate a dashboard ConfigMap. - Adds a
grafana-dashboard-configmap.yamltemplate that embeds a dashboard JSON via.Files.Get. - Updates chart docs to describe enabling the dashboard.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| charts/kubeflow-trainer/values.yaml | Adds grafanaDashboard values to control/label/annotate optional dashboard installation. |
| charts/kubeflow-trainer/templates/grafana-dashboard-configmap.yaml | New conditional ConfigMap template that loads the dashboard JSON from the chart files. |
| charts/kubeflow-trainer/dashboards/kubeflow-trainer-dashboard.json | Adds the default Grafana dashboard definition (PromQL panels + variables). |
| charts/kubeflow-trainer/README.md.gotmpl | Documents the optional dashboard and how to enable it. |
| charts/kubeflow-trainer/README.md | Generated README update reflecting the new values and enablement docs. |
| helm install kubeflow-trainer oci://ghcr.io/kubeflow/charts/kubeflow-trainer \ | ||
| --version 2.1.0 \ | ||
| --set grafanaDashboard.enabled=true | ||
| ``` |
There was a problem hiding this comment.
The Helm install snippet pins --version 2.1.0, but the chart version in this repo is 2.2.0; this will mislead users and should either omit --version or be regenerated from the templated README source.
| {{- if .Values.grafanaDashboard.enabled }} | ||
| apiVersion: v1 | ||
| kind: ConfigMap | ||
| metadata: | ||
| name: {{ include "trainer.fullname" . }}-grafana-dashboard |
There was a problem hiding this comment.
Add helm-unittest coverage for this new conditional template (at least: not rendered when grafanaDashboard.enabled=false, rendered when true, and that custom labels/annotations are applied) since the chart already uses helm-unittest tests for other templates.
|
|
||
| ```bash | ||
| helm install kubeflow-trainer oci://ghcr.io/kubeflow/charts/kubeflow-trainer \ | ||
| --version 2.1.0 \ |
There was a problem hiding this comment.
The Helm install snippet pins --version 2.1.0, but the chart version in this repo is 2.2.0; this will mislead users and should either omit --version or template it from the chart version and regenerate README.md.
| --version 2.1.0 \ | |
| --version {{ .Version }} \ |
Summary
Adds an optional Grafana dashboard for Kubeflow Trainer, delivered via Helm as a ConfigMap. Disabled by default, so clusters without Grafana remain unaffected.
Motivation
Provide quick, standardized visibility into controller health and TrainJob activity without requiring custom dashboards.
Changes
Added dashboard JSON (
kubeflow-trainer-dashboard.json)Added gated ConfigMap template using
.Files.GetIntroduced Helm values:
grafanaDashboard.enabled(default:false)grafanaDashboard.labelsgrafanaDashboard.annotationsUpdated chart documentation
Enable
Coverage
Limitations
Uses only existing metrics. Native TrainJob lifecycle metrics can be added in a follow-up.
Testing
Gated via Helm values. Rendering expected to be validated in CI.
Fixes: #3430
cc: @andreyvelich , @abhijeet-dhumal