Ghost Silence: Some silences have been manually expired, but they will be recreated strangely #3457

Open
@east4ming

What did you do?

Initially, I created a long-lived silence (intended to last effectively forever), like this:

Matchers:
  environment=~.*uat.*|.*_preprod$
  alertsource!=monitor-zabbix
  cluster!=cloud-es
Duration: 999w

or

Matchers:
  environment="aws_sg_xxx_sdb"
Duration: 999w
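To make clear which alerts these silences swallow, here is a minimal Python sketch of how the matchers above would be evaluated. It assumes the usual Alertmanager semantics: all matchers must match, regex matchers are fully anchored, and a label missing from the alert is treated as an empty string. The function names and the sample label values are hypothetical, and Python's `re` is only an approximation of the Go regex engine that Alertmanager actually uses.

```python
import re


def matcher_matches(op, pattern, value):
    """Evaluate one matcher against a label value ('' stands for a missing label)."""
    if op == "=":
        return value == pattern
    if op == "!=":
        return value != pattern
    # Regex matchers are treated as fully anchored, so use fullmatch, not search.
    hit = re.fullmatch(pattern, value) is not None
    if op == "=~":
        return hit
    if op == "!~":
        return not hit
    raise ValueError(f"unknown operator: {op}")


def silence_matches(matchers, labels):
    """A silence applies only if every one of its matchers matches the alert."""
    return all(
        matcher_matches(op, pattern, labels.get(name, ""))
        for name, op, pattern in matchers
    )


# The first silence from this report, written as (label, operator, value) tuples.
FIRST_SILENCE = [
    ("environment", "=~", r".*uat.*|.*_preprod$"),
    ("alertsource", "!=", "monitor-zabbix"),
    ("cluster", "!=", "cloud-es"),
]

# Hypothetical alert label sets, for illustration only.
print(silence_matches(FIRST_SILENCE, {"environment": "aws_uat_1"}))
print(silence_matches(FIRST_SILENCE, {"environment": "prod"}))
```

Under these assumptions, an alert with no `alertsource` or `cluster` label at all still satisfies the two `!=` matchers, so the first silence covers any alert whose `environment` matches the regex, unless it comes from `monitor-zabbix` or `cloud-es`.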

A few months later, I manually expired (deleted) the silence.

But a few months after that, the silence came back like a ghost: it was recreated without any action on my part (I swear I did not recreate it manually!). This has happened several times over the past year.

As a result, some alerts (for example, those with uat labels) were not sent, and it took us a long time to discover that the silence was the cause.
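Since the reappearing silence went unnoticed for a long time, a periodic audit might help catch it sooner. The following is only a sketch: it assumes the list of silences has already been fetched as JSON from the Alertmanager `/api/v2/silences` endpoint and parsed into Python dicts, and it flags active silences that end far in the future. The function names, the one-year threshold, and the sample records are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts):
    """Parse an RFC 3339 timestamp such as '2023-01-01T00:00:00.000Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def long_lived_active_silences(silences, now, horizon=timedelta(days=365)):
    """Return (id, createdBy, updatedAt, endsAt) for active silences ending beyond the horizon."""
    flagged = []
    for s in silences:
        if s["status"]["state"] != "active":
            continue
        if parse_ts(s["endsAt"]) - now > horizon:
            flagged.append((s["id"], s["createdBy"], s["updatedAt"], s["endsAt"]))
    return flagged


# Hypothetical records shaped like entries of the /api/v2/silences response.
SAMPLE = [
    {"id": "aaa", "status": {"state": "expired"}, "createdBy": "ops",
     "updatedAt": "2023-03-01T00:00:00.000Z", "endsAt": "2023-03-01T00:00:00.000Z"},
    {"id": "bbb", "status": {"state": "active"}, "createdBy": "ops",
     "updatedAt": "2023-06-01T00:00:00.000Z", "endsAt": "2042-07-24T00:00:00.000Z"},
]

now = datetime(2023, 8, 1, tzinfo=timezone.utc)
for row in long_lived_active_silences(SAMPLE, now):
    print(row)
```

Running such a check on a schedule and comparing the flagged IDs and `updatedAt` values against the silences that were deliberately expired could show when a ghost silence reappears, although it would not explain why.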

What did you expect to see?

Fix the problem.

What did you see instead? Under which circumstances?

See above.

Environment

  • System information:

    Linux 3.10.0-1160.59.1.el7.x86_64 x86_64

  • Alertmanager version:

alertmanager, version 0.24.0 (branch: HEAD, revision: [`f484b17`](https://github.com/prometheus/alertmanager/commit/f484b17fa3c583ed1b2c8bbcec20ba1db2aa5f11)) (Recently upgraded to v0.25.0 because of another issue)
build user: root@265f14f5c6fc
build date: 20220325-09:31:33
go version: go1.17.8
platform: linux/amd64
  • Prometheus version:

    v2.39.1

  • Alertmanager configuration file:

  • Alertmanager CLI: (installed with Helm Chart)
alertmanager --storage.path=/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --cluster.advertise-address=[10.244.14.187]:9094 --cluster.listen-address=0.0.0.0:9094 --cluster.peer=monitor-alertmanager-0.monitor-alertmanager-headless:9094 --cluster.peer=monitor-alertmanager-1.monitor-alertmanager-headless:9094 --cluster.peer=monitor-alertmanager-2.monitor-alertmanager-headless:9094 --data.retention=169h --web.route-prefix=/ --log.level=info
  • Prometheus configuration file:
insert configuration here (if relevant to the issue)

Finally, if there is any other information I can provide, please let me know.
Note that the problem recurs only after long intervals, so it may not be practical to keep debug logging enabled and collect logs continuously.

Thanks!
