Describe the bug
Summary
When `controller.metricsConfig.enabled` is set to `false`, the argo-workflows Helm chart does not add a `metricsConfig` block to the workflow controller ConfigMap. The controller then falls back to its binary defaults, which include the Prometheus metrics server enabled (and often with TLS). This causes problems when using OTLP-only metrics with a default PodMonitor that scrapes pods with the sidecar label.
Environment
- Chart: argo-workflows (e.g. 0.47.x)
- Controller: Argo Workflows v3.7.x
- Setup: Controller configured for OTLP push to an injected OpenTelemetry sidecar; Prometheus scrapes via a PodMonitor that selects pods with `sidecar.opentelemetry.io/injected: Exists`
Current Chart Behavior
From `workflow-controller-config-map.yaml`:

```yaml
{{- if .Values.controller.metricsConfig.enabled }}
metricsConfig:
  enabled: {{ .Values.controller.metricsConfig.enabled }}
  path: {{ .Values.controller.metricsConfig.path }}
  port: {{ .Values.controller.metricsConfig.port }}
  # ... secure, etc.
{{- end }}
```
When `metricsConfig.enabled` is `false`:
- The `{{- if }}` condition is false, so the entire `metricsConfig` block is omitted from the ConfigMap.
- The workflow controller reads the ConfigMap and finds no `metricsConfig` key.
- The controller falls back to its built-in defaults, which typically include:
  - `enabled: true` (Prometheus metrics server on)
  - `secure: true` (TLS on port 9090)
  - `port: 9090`

So setting `metricsConfig.enabled: false` in values does not result in the controller actually disabling its Prometheus server.
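In other words, with the key absent the controller behaves as if the following had been configured — a sketch reconstructed from the fallback defaults listed above:

```yaml
# Effective controller configuration when the ConfigMap has no metricsConfig key
# (reconstructed from the binary defaults described above)
metricsConfig:
  enabled: true   # Prometheus metrics server on
  secure: true    # served over TLS
  port: 9090
```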
Problem When Using OTLP + Default PodMonitor
Desired flow: Controller pushes metrics via OTLP only → sidecar → gateway → metric upstreams. No direct scrape of the controller.
What happens instead:
- User sets `controller.metricsConfig.enabled: false` expecting OTLP-only metrics.
- Chart omits `metricsConfig` from the ConfigMap; the controller uses its defaults and still runs the Prometheus server on 9090 (with TLS).
- A default PodMonitor (e.g. one created by the OpenTelemetry Operator when PodMonitor is enabled) selects pods with `sidecar.opentelemetry.io/injected: Exists` and scrapes the port named `metrics`.
- The workflow controller pod has two containers, each with a port named `metrics`:
  - Controller: 9090 (Prometheus server, HTTPS by default)
  - Sidecar: 8888 (collector metrics, HTTP)
- Prometheus discovers both targets and scrapes them over HTTP.
- Scraping the controller's 9090 over HTTP fails with:

```
http: TLS handshake error from <prometheus-ip>: client sent an HTTP request to an HTTPS server
```
Result: Continuous TLS handshake errors in controller logs, and the user cannot cleanly achieve an OTLP-only metrics flow without workarounds (e.g. enabling metrics with secure: false just to silence errors, or custom PodMonitors).
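A minimal sketch of the kind of PodMonitor involved; the metadata is hypothetical, but the selector and port name reflect the setup described above:

```yaml
# Illustrative PodMonitor; name is hypothetical, selector/port as described above
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: otel-sidecar-metrics   # hypothetical name
spec:
  selector:
    matchExpressions:
      - key: sidecar.opentelemetry.io/injected
        operator: Exists
  podMetricsEndpoints:
    - port: metrics   # matches both controller (9090, HTTPS) and sidecar (8888, HTTP)
```

Because both containers expose a port named `metrics`, this single endpoint definition yields two scrape targets, and the HTTPS one fails.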
Proposed Fix
When `metricsConfig.enabled` is `false`, the chart should still emit a `metricsConfig` block so the controller explicitly disables its Prometheus server:
```yaml
{{- if ne (index .Values.controller "metricsConfig") nil }}
metricsConfig:
  enabled: {{ .Values.controller.metricsConfig.enabled }}
  {{- if .Values.controller.metricsConfig.enabled }}
  path: {{ .Values.controller.metricsConfig.path }}
  port: {{ .Values.controller.metricsConfig.port }}
  secure: {{ .Values.controller.metricsConfig.secure }}
  # ... other fields
  {{- end }}
{{- end }}
```
Or, more simply, always include the block when `metricsConfig` is defined in values, and let `enabled` control behavior:
```yaml
{{- with .Values.controller.metricsConfig }}
metricsConfig:
  enabled: {{ .enabled }}
  path: {{ .path }}
  port: {{ .port }}
  secure: {{ .secure }}
  {{- /* ... other fields when enabled */ -}}
{{- end }}
```
The important change: emit `enabled: false` under `metricsConfig` when the user sets `enabled: false`, so the controller does not fall back to defaults that keep the server on.
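With either template, setting `controller.metricsConfig.enabled: false` in values would render a fragment like the following into the controller config (surrounding ConfigMap structure omitted):

```yaml
# Expected rendered fragment when enabled: false
metricsConfig:
  enabled: false
```

That explicit `enabled: false` is what keeps the controller from starting its Prometheus server instead of falling back to defaults.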
Workaround (Current)
To avoid TLS errors while keeping the default PodMonitor, we must set `metricsConfig.enabled: true` and `metricsConfig.secure: false`, so the chart injects the block and the controller serves plain HTTP on 9090. This is a workaround, not the desired OTLP-only setup.
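In values.yaml terms, the current workaround looks like this (a sketch of the settings named above):

```yaml
# values.yaml workaround: keep the metrics block rendered, but drop TLS
controller:
  metricsConfig:
    enabled: true
    secure: false   # controller serves plain HTTP on 9090, so the default PodMonitor scrape succeeds
```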
References
- Argo Workflows metrics docs – OTLP is the recommended approach
- Workflow controller ConfigMap template: `templates/controller/workflow-controller-config-map.yaml`
- Controller defaults when the ConfigMap lacks `metricsConfig`: binary fallback (metrics on, secure on)
Related helm chart
argo-workflows
Helm chart version
0.47.3
To Reproduce
- Set `metricsConfig.enabled: false`
- Observe in the Argo Workflows controller logs the message `Starting Prometheus metrics exporter`, which is the default binary behaviour when `metricsConfig` does not exist.
Expected behavior
`metricsConfig.enabled == false` should result in the Prometheus metrics exporter not starting at all.
Screenshots
No response
Additional context
No response