If your Alertmanager configuration does not work properly, you can compare the alertmanager-main
secret with the running Alertmanager configuration to identify possible errors. You can also test your alert routing configuration by creating a test alert.
Prerequisites

- You have access to the cluster as a user with the cluster-admin cluster role.
- You have installed the {oc-first}.

Procedure
- Compare the alertmanager-main secret with the running Alertmanager configuration:
  - Extract the Alertmanager configuration from the alertmanager-main secret into the alertmanager.yaml file:

    $ oc -n openshift-monitoring get secret alertmanager-main --template='{{ index .data "alertmanager.yaml" }}' | base64 --decode > alertmanager.yaml
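
    Optionally, if you have the amtool binary installed on your workstation, you can validate the syntax of the extracted file before comparing it. Note that this only checks that the file parses as a valid Alertmanager configuration, not that your routing logic is correct:

    $ amtool check-config alertmanager.yaml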
  - Pull the running Alertmanager configuration from the API:

    $ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool config show --alertmanager.url http://localhost:9093

    Example output

    global:
      resolve_timeout: 5m
      http_config:
        follow_redirects: true
        enable_http2: true
        proxy_from_environment: true
    ...
    route:
      receiver: default
      group_by:
      - namespace
      continue: false
      routes:
      ...
      - matchers: # (1)
        - service="example-app"
        continue: false
        routes:
        - receiver: team-frontend-page
          matchers:
          - severity="critical"
          continue: false
      ...
    receivers:
    ...
    - name: team-frontend-page # (2)
      pagerduty_configs:
      - send_resolved: true
        http_config:
          authorization:
            type: Bearer
            credentials: <secret>
          follow_redirects: true
          enable_http2: true
          proxy_from_environment: true
        service_key: <secret>
        url: https://events.pagerduty.com/v2/enqueue
    ...
    templates: []
    (1) The example shows the route to the team-frontend-page receiver. Alertmanager routes alerts with service="example-app" and severity="critical" labels to this receiver.
    (2) The team-frontend-page receiver configuration. The example shows PagerDuty as a receiver.
  - Compare the contents of the route and receiver fields of the alertmanager.yaml file with the fields in the running Alertmanager configuration. Look for any discrepancies.
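
    One way to surface discrepancies is to dump the running configuration to a file and diff it against the extracted secret. Treat this as a rough check: the running configuration expands defaults and masks credentials as <secret>, so expect benign differences outside the route and receivers sections:

    $ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool config show --alertmanager.url http://localhost:9093 > running.yaml
    $ diff -u alertmanager.yaml running.yaml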
  - If you used an AlertmanagerConfig object to configure alert routing for user-defined projects, you can use the alertmanager.yaml file to see the configuration before the AlertmanagerConfig object was applied. The running Alertmanager configuration shows the changes after the object was applied:

    Example running configuration with AlertmanagerConfig applied

    ...
    route:
      ...
      routes:
      - receiver: ns1/example-routing/UWM-receiver # (1)
        group_by:
        - job
        matchers:
        - namespace="ns1"
        continue: true
    ...
    receivers:
    ...
    - name: ns1/example-routing/UWM-receiver # (1)
      webhook_configs:
      - send_resolved: true
        http_config:
          follow_redirects: true
          enable_http2: true
          proxy_from_environment: true
        url: <secret>
        url_file: ""
        max_alerts: 0
        timeout: 0s
    ...
    templates: []

    (1) The routing configuration from the example-routing AlertmanagerConfig object in the ns1 project for the UWM-receiver receiver.
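
    For reference, a minimal AlertmanagerConfig object that would generate a receiver such as ns1/example-routing/UWM-receiver might look like the following sketch. The apiVersion can be v1alpha1 or v1beta1 depending on your cluster version, and the webhook URL is a placeholder; the Prometheus Operator adds the namespace="ns1" matcher and the namespace/name prefix on the receiver automatically:

    apiVersion: monitoring.coreos.com/v1beta1
    kind: AlertmanagerConfig
    metadata:
      name: example-routing
      namespace: ns1
    spec:
      route:
        receiver: UWM-receiver
        groupBy:
        - job
      receivers:
      - name: UWM-receiver
        webhookConfigs:
        - url: https://example.com/webhook  # placeholder endpoint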
- Check Alertmanager pod logs to see if there are any errors:

  $ oc -n openshift-monitoring logs -c alertmanager <alertmanager_pod>

  Note: For multi-node clusters, ensure that you check all Alertmanager pods and their logs.

  Example command

  $ oc -n openshift-monitoring logs -c alertmanager alertmanager-main-0
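
  In a multi-node cluster there is typically more than one Alertmanager pod (alertmanager-main-0, alertmanager-main-1, and so on). A sketch for listing them and filtering a pod's logs for problems, assuming the pods carry the app.kubernetes.io/name=alertmanager label:

  $ oc -n openshift-monitoring get pods -l app.kubernetes.io/name=alertmanager
  $ oc -n openshift-monitoring logs -c alertmanager alertmanager-main-1 | grep -iE 'error|warn'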
- Verify that your receiver is configured correctly by creating a test alert:
  - Get a list of the configured routes:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool config routes show --alertmanager.url http://localhost:9093

    Example output

    Routing tree:
    .
    └── default-route  receiver: default
        ├── {alertname="Watchdog"}  receiver: Watchdog
        └── {service="example-app"}  receiver: default
            └── {severity="critical"}  receiver: team-frontend-page
  - Print the route to your chosen receiver. The following example shows the receiver used for alerts with service=example-app and severity=critical matchers:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool config routes test service=example-app severity=critical --alertmanager.url http://localhost:9093

    Example output

    team-frontend-page
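
    You can also make this check self-verifying: amtool config routes test accepts a --verify.receivers flag and exits with a non-zero status if the matched receivers differ from the expected ones, which is useful in scripts:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool config routes test --verify.receivers=team-frontend-page service=example-app severity=critical --alertmanager.url http://localhost:9093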
  - Create a test alert and add it to the Alertmanager. The following example creates an alert with service=example-app and severity=critical to test the team-frontend-page receiver:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool alert add --alertmanager.url http://localhost:9093 alertname=myalarm --start="2025-03-31T00:00:00-00:00" service=example-app severity=critical --annotation="summary=\"This is a test alert with a custom summary\""
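
    When you finish testing, you can clear the test alert by re-sending it with the same labels and an end time in the past; Alertmanager then treats the alert as resolved. A sketch using the --end flag of amtool alert add:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool alert add --alertmanager.url http://localhost:9093 alertname=myalarm --end="2025-03-31T00:05:00-00:00" service=example-app severity=critical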
  - Verify that the alert was generated:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool alert --alertmanager.url http://localhost:9093

    Example output

    Alertname  Starts At                Summary                                                                                   State
    myalarm    2025-03-31 00:00:00 UTC  This is a test alert with a custom summary                                                active
    Watchdog   2025-04-07 10:07:16 UTC  An alert that should always be firing to certify that Alertmanager is working properly.  active
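
    To show only the test alert instead of the full list, you can pass a matcher to amtool alert query:

    $ oc exec alertmanager-main-0 -n openshift-monitoring -- amtool alert query alertname=myalarm --alertmanager.url http://localhost:9093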
  - Verify that the receiver was notified with the myalarm alert.