Description
What did you do?
Initially, I created a long-lived silence (intended to be effectively permanent), like this:
Matchers:
- `environment=~.*uat.*|.*_preprod$`
- `alertsource!=monitor-zabbix`
- `cluster!=cloud-es`

Duration: 999w

or

Matchers:
- `environment="aws_sg_xxx_sdb"`

Duration: 999w
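In case it helps with reproducing, the first silence is roughly equivalent to creating one with amtool like this (a sketch only; the `--alertmanager.url` and `--comment` values below are placeholders, not what I actually used):

```sh
# Hedged sketch: create a long-lived silence equivalent to the first one above.
# The URL and comment are placeholders; the matchers and duration come from the
# description above.
amtool silence add \
  'environment=~".*uat.*|.*_preprod$"' \
  'alertsource!="monitor-zabbix"' \
  'cluster!="cloud-es"' \
  --duration=999w \
  --comment="long-term silence for UAT/preprod" \
  --alertmanager.url=http://localhost:9093
```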
A few months later, I manually deleted (expired) the silence.
But a few months after that, the silence came back like a ghost: it was strangely recreated. (I swear I did not recreate it manually!) This has happened several times over the past year.
As a result, some alerts (such as those with uat-related labels) were not sent, and it took us a long time to figure out that the silence was the cause.
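The next time the ghost silence appears, I can capture its details so recurrences can be compared (for example, whether it is the same silence ID coming back or a new one). A sketch, with the URL as a placeholder for our Alertmanager address:

```sh
# Hedged sketch: dump all silences, expired ones included, and keep the id,
# createdBy and updatedAt fields for comparison between occurrences.
curl -s http://localhost:9093/api/v2/silences > silences-$(date +%F).json

# amtool can show expired silences as well:
amtool silence query --expired --alertmanager.url=http://localhost:9093
```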
What did you expect to see?
Once a silence is manually expired, it should stay expired and not be recreated.
What did you see instead? Under which circumstances?
See above.
Environment
- System information:
Linux 3.10.0-1160.59.1.el7.x86_64 x86_64
- Alertmanager version:
alertmanager, version 0.24.0 (branch: HEAD, revision: [`f484b17`](https://github.com/prometheus/alertmanager/commit/f484b17fa3c583ed1b2c8bbcec20ba1db2aa5f11)) (Recently upgraded to v0.25.0 because of another issue)
build user: root@265f14f5c6fc
build date: 20220325-09:31:33
go version: go1.17.8
platform: linux/amd64
- Prometheus version:
v2.39.1
- Alertmanager configuration file:
- Alertmanager CLI: (installed with Helm Chart)
```sh
alertmanager --storage.path=/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.advertise-address=[10.244.14.187]:9094 \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=monitor-alertmanager-0.monitor-alertmanager-headless:9094 \
  --cluster.peer=monitor-alertmanager-1.monitor-alertmanager-headless:9094 \
  --cluster.peer=monitor-alertmanager-2.monitor-alertmanager-headless:9094 \
  --data.retention=169h \
  --web.route-prefix=/ \
  --log.level=info
```
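Since this is a three-replica cluster (see the `--cluster.peer` flags above) and silences are replicated between peers, I can also query each replica separately the next time the silence reappears, to see where it shows up first. A sketch, assuming the web API listens on the default port 9093 and the pod DNS names from the flags above resolve in-cluster:

```sh
# Hedged sketch: list silences on each replica individually so they can be
# compared when the ghost silence comes back.
for pod in monitor-alertmanager-0 monitor-alertmanager-1 monitor-alertmanager-2; do
  echo "== ${pod} =="
  curl -s "http://${pod}.monitor-alertmanager-headless:9093/api/v2/silences"
done
```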
Finally, if there is any information I can provide, please let me know.
In addition, since the problem only recurs after a long time, it may not be practical to keep debug logging enabled and collect logs continuously.
Thanks!