[rancher-monitoring 69.8-rancher.27] Fix issues with metricRelabelings in exporters#156

Open
belgaied2 wants to merge 6 commits into rancher:main from belgaied2:main

@belgaied2

Issue:

rancher/rancher#43392

Solution

The metricRelabelings attribute in ServiceMonitor takes a list of maps coming from the values.yaml.
To template that in Helm, the previous implementation did the following:

  {{ tpl (toYaml .Values.XXX.metricRelabelings ) . | indent 4 }}

This does not work with multiline relabelings, such as:

  - action: keep
    sourceLabels: [__name__]
    regex: <some_regex>

which would generate:

  metricRelabelings:
      - action: keep
    sourceLabels: [__name__]
    regex: <some_regex>

and throws an error!

We need to trim the leading whitespace using {{- tpl ... and do the multiline indentation using nindent instead of indent.

That's what this PR does for each location of metricRelabelings in versions 106 and 107 of the rancher-monitoring chart.
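
For illustration, here is a minimal Go sketch of the difference. It does not use Helm: indent and nindent below are simplified stand-ins for the helpers Helm exposes, and the two template strings and the relabeling text are hypothetical. The point it shows is that with indent the first line picks up the template line's own leading spaces on top of the requested indentation, while {{- plus nindent starts from a clean newline.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// indent prefixes every line of v with n spaces
// (simplified stand-in for the helper of the same name in Helm).
func indent(n int, v string) string {
	pad := strings.Repeat(" ", n)
	return pad + strings.ReplaceAll(v, "\n", "\n"+pad)
}

// nindent is indent preceded by a newline.
func nindent(n int, v string) string {
	return "\n" + indent(n, v)
}

// render executes tmpl with the relabeling YAML as its data.
func render(tmpl, relabelings string) string {
	funcs := template.FuncMap{"indent": indent, "nindent": nindent}
	t := template.Must(template.New("sm").Funcs(funcs).Parse(tmpl))
	var out strings.Builder
	if err := t.Execute(&out, relabelings); err != nil {
		panic(err)
	}
	return out.String()
}

// Hypothetical input, already serialized as YAML.
const relabelings = "- action: keep\n  sourceLabels: [__name__]\n  regex: <some_regex>"

// Old form: the action sits on a line that already has two leading spaces.
const oldTemplate = "metricRelabelings:\n  {{ . | indent 4 }}\n"

// New form: {{- trims the whitespace before the action, nindent adds the newline back.
const newTemplate = "metricRelabelings:\n  {{- . | nindent 4 }}\n"

func main() {
	fmt.Print(render(oldTemplate, relabelings))
	fmt.Println("---")
	fmt.Print(render(newTemplate, relabelings))
}
```

In this sketch the old form puts the list dash in the same column as the keys that follow it, which is not a valid list item, while the new form renders the dash at 4 spaces and the keys at 6.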

Commit messages:

  • Fixed issue with metricRelabelings in ingress-nginx exporter overlay
  • Fixed metricRelabelings in patches for multiple exporters, including core-dns, kube-controller-manager, etc.
  • make charts

@susesamu susesamu self-requested a review October 15, 2025 15:42
@susesamu susesamu changed the title [rancher-monitoring 69.8] Fix issues with metricRelabelings in exporters [rancher-monitoring 69.8-rancher.26] Fix issues with metricRelabelings in exporters Oct 15, 2025
belgaied2 and others added 4 commits October 24, 2025 14:58
Signed-off-by: Mohamed Belgaied Hassine <belgaied2@hotmail.com>
…core-dns, kube-controller-manager, etc.

Signed-off-by: Mohamed Belgaied Hassine <belgaied2@hotmail.com>
@susesamu susesamu changed the title [rancher-monitoring 69.8-rancher.26] Fix issues with metricRelabelings in exporters [rancher-monitoring 69.8-rancher.27] Fix issues with metricRelabelings in exporters Oct 24, 2025
@susesamu

Contributor

holding merge until 2.12.4 - this is going to be a backport since it's already fixed on 2.13

@deepakpunia-suse

deepakpunia-suse commented Oct 29, 2025

I tried to verify the fix for this bug on the specified chart version. Observed that chart installation is failing with the error Invalid value: "object". This occurs when the metricRelabelings parameter is used and ingressNginx is set to true.

Environment:

  • Upstream Cluster:
    • Rancher: v2.12-head
    • RKE2: v1.32.9+rke2r1
    • rancher-monitoring: 69.8.2-rancher.27

ob-team-chart repo:
URL : https://github.com/belgaied2/ob-team-charts
branch: main

  1. Tried to install the above-mentioned monitoring chart via the UI with the following ingressNginx configuration:
ingressNginx:
  enabled: true
  namespace: ingress-nginx
  service:
    port: 9913
    targetPort: 10254
  serviceMonitor:
    interval: 30s
    metricRelabelings:
      - action: drop
        regex: kube_(daemonset|deployment|pod|namespace|node|statefulset).+
        sourceLabels:
          - __name__
    proxyUrl: ''
    relabelings: []
  2. The installation fails with the error: Error: failed to create resource: ServiceMonitor.monitoring.coreos.com "rancher-monitoring-ingress-nginx" is invalid: [spec.endpoints[0].metricRelabelings[0].sourceLabels[1]: Invalid value: "object"

Attached below is a screenshot.

Image

@skanakal

Contributor

@susesamu Can we merge this?

@susesamu

Contributor

@skanakal no.. we still need to fix the bug that Deepak found here rancher/rancher#43392 (comment). @belgaied2 is taking a look.

@belgaied2

Author

@deepakpunia-suse I am a bit confused about your findings. I just did the following:

  • I have Rancher v2.12.2
  • Created an RKE2 Cluster on AWS (v1.33.5+rke2r1) using the Node Driver
  • Took the KUBECONFIG from Rancher
  • Downloaded the following helm chart files in TGZ format: CRD chart and Main Chart, both in version 69.8.2-rancher.27.
  • Created the file values-qa.yaml with the following content (same as you mentioned above):
ingressNginx:
  enabled: true
  namespace: ingress-nginx
  service:
    port: 9913
    targetPort: 10254
  serviceMonitor:
    interval: 30s
    metricRelabelings:
      - action: drop
        regex: kube_(daemonset|deployment|pod|namespace|node|statefulset).+
        sourceLabels:
          - __name__
    proxyUrl: ''
    relabelings: []
  • First installed the CRD chart:
$ helm install -f values-qa.yaml rancher-monitoring-crd  rancher-monitoring-crd-69.8.2-rancher.27.tgz
NAME: rancher-monitoring-crd
LAST DEPLOYED: Tue Dec  2 13:44:09 2025
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
  • Created the namespaces ingress-nginx and cattle-monitoring-system
  • Deployed the main chart:
$ helm upgrade --install -f values-qa.yaml rancher-monitoring --create-namespace -n cattle-monitoring-system  rancher-monitoring-69.8.2-rancher.27.tgz
Release "rancher-monitoring" has been upgraded. Happy Helming!
NAME: rancher-monitoring
LAST DEPLOYED: Tue Dec  2 13:48:56 2025
NAMESPACE: cattle-monitoring-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
rancher-monitoring has been installed. Check its status by running:
  kubectl --namespace cattle-monitoring-system get pods -l "release=rancher-monitoring"

Get Grafana 'admin' user password by running:

  kubectl --namespace cattle-monitoring-system get secrets rancher-monitoring-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo

Access Grafana local instance:

  export POD_NAME=$(kubectl --namespace cattle-monitoring-system get pod -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=rancher-monitoring" -oname)
  kubectl --namespace cattle-monitoring-system port-forward $POD_NAME 3000

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.
  • Checked the ServiceMonitor resource:
$ k get smon -n ingress-nginx rancher-monitoring-ingress-nginx -oyaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  annotations:
    meta.helm.sh/release-name: rancher-monitoring
    meta.helm.sh/release-namespace: cattle-monitoring-system
  creationTimestamp: "2025-12-02T12:51:57Z"
  generation: 1
  labels:
    app: rancher-monitoring-ingress-nginx
    app.kubernetes.io/instance: rancher-monitoring
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/part-of: rancher-monitoring
    app.kubernetes.io/version: v0.80.1
    chart: rancher-monitoring-69.8.2-rancher.27
    helm.sh/chart: rancher-monitoring-69.8.2-rancher.27
    heritage: Helm
    release: rancher-monitoring
  name: rancher-monitoring-ingress-nginx
  namespace: ingress-nginx
  resourceVersion: "24653"
  uid: 4ec941de-09b3-474f-a742-59891ba80c66
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    metricRelabelings:
    - action: drop
      regex: kube_(daemonset|deployment|pod|namespace|node|statefulset).+
      sourceLabels:
      - __name__
    port: http-metrics
  jobLabel: jobLabel
  namespaceSelector:
    matchNames:
    - ingress-nginx
  selector:
    matchLabels:
      app: rancher-monitoring-ingress-nginx
      release: rancher-monitoring

So, everything seems to work for me. Can you please check?

@deepakpunia-suse

Details:
Cluster: v1.32.10+rke2r1
Rancher: v2.12-head
Monitoring: 69.8.2-rancher.27

@belgaied2 It is still failing for me with the same error mentioned above. I noticed that you installed it using Helm, but it is failing specifically through the UI installation.

Below are the exact steps and a screenshot of the process I followed using the UI. If you give this a try, you will likely face the same error.

Screenshot_20251205_125343 Screenshot_20251205_125030 Screenshot_20251205_125148 Screenshot_20251205_125309

…ding 2 spaces

Signed-off-by: Mohamed Belgaied Hassine <belgaied2@hotmail.com>
Signed-off-by: Mohamed Belgaied Hassine <belgaied2@hotmail.com>
@belgaied2

Author

After further investigation, there is indeed a problem when the chart is installed with Rancher, because of the following block:

    {{ if .Values.global.cattle.clusterId }}
      - sourceLabels: [__address__]
        targetLabel: cluster_id
        replacement: {{ .Values.global.cattle.clusterId }}
    {{- end }}
    {{ if .Values.global.cattle.clusterName}}
      - sourceLabels: [__address__]
        targetLabel: cluster_name
        replacement: {{ .Values.global.cattle.clusterName }}
    {{- end }}

where the sourceLabels entries have 6 spaces of indentation and not 4, which only works if there is no clusterId.

The fix is to change the indentation to 6 instead of 4, and with that it works.
I updated the templates accordingly.
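
To make the mismatch concrete, here is a rough Go sketch, not the chart's actual template. The template text in render, the values struct, and the cluster ID c-m-example are hypothetical, and nindent is a simplified stand-in for the Helm helper. It renders the user-supplied relabelings at a given indentation, followed by a hard-coded entry at 6 spaces, and reports the column of each list item.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// values is a hypothetical stand-in for the chart values used here.
type values struct {
	Relabelings string // user-supplied metricRelabelings, already serialized as YAML
	ClusterID   string // stands in for .Values.global.cattle.clusterId
}

// nindent: a newline, then every line of v prefixed with n spaces
// (simplified stand-in for the helper of the same name in Helm).
func nindent(n int, v string) string {
	pad := strings.Repeat(" ", n)
	return "\n" + pad + strings.ReplaceAll(v, "\n", "\n"+pad)
}

// render builds a simplified fragment: user relabelings indented by n,
// then a hard-coded cluster_id entry at 6 spaces when ClusterID is set.
func render(n int, clusterID string) string {
	src := fmt.Sprintf("    metricRelabelings:\n"+
		"    {{- .Relabelings | nindent %d }}\n"+
		"    {{- if .ClusterID }}\n"+
		"      - sourceLabels: [__address__]\n"+
		"        targetLabel: cluster_id\n"+
		"        replacement: {{ .ClusterID }}\n"+
		"    {{- end }}\n", n)
	funcs := template.FuncMap{"nindent": nindent}
	t := template.Must(template.New("sm").Funcs(funcs).Parse(src))
	v := values{
		Relabelings: "- action: drop\n  sourceLabels: [__name__]",
		ClusterID:   clusterID,
	}
	var out strings.Builder
	if err := t.Execute(&out, v); err != nil {
		panic(err)
	}
	return out.String()
}

// itemIndents returns the number of leading spaces of every "- " list item line.
func itemIndents(rendered string) []int {
	var cols []int
	for _, line := range strings.Split(rendered, "\n") {
		trimmed := strings.TrimLeft(line, " ")
		if strings.HasPrefix(trimmed, "- ") {
			cols = append(cols, len(line)-len(trimmed))
		}
	}
	return cols
}

func main() {
	fmt.Println("nindent 4, cluster ID set:  ", itemIndents(render(4, "c-m-example")))
	fmt.Println("nindent 6, cluster ID set:  ", itemIndents(render(6, "c-m-example")))
	fmt.Println("nindent 4, no cluster ID:   ", itemIndents(render(4, "")))
}
```

In this sketch, nindent 4 with a cluster ID set puts the two list items at different columns (4 and 6), nindent 6 puts both at 6, and without a cluster ID only the user-supplied item is rendered, which would be consistent with the plain Helm install succeeding while the install through Rancher failed.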

@deepakpunia-suse please check again against the latest version.

Thanks!

@deepakpunia-suse

@belgaied2 We can see that the first installation attempt is failing with the error: Failed to create resources : namespace "Ingress-nginx" not found. However, after creating the namespace manually, it installed successfully.

Ideally, the UI installation should not require the customer to create a namespace manually. We need to add logic to create the namespace automatically if it does not already exist.

image
