SUMMARY
Be able to place Grafana-managed silences over alert rules and manage their lifecycle regardless of their duration.
ISSUE TYPE
- Feature Idea
I would like to define persistent or long-running Grafana-managed silences as code, using YAML or some other data structure, so that my Ansible playbook can fully manage silences in Grafana. Ideally, the module would skip expired (non-active) silences and would ignore the startsAt and endsAt attributes when comparing silences.
Why is this relevant?
In an environment where the number of alert rules is kept low by leveraging dynamic alert rules (Grafana 9.3+), and labels from your datasource (hostname, interface, status, etc.) identify infrastructure components, provisioning silences that are defined in the codebase or in any other data structure is important for improving your signal-to-noise ratio.
COMPONENT NAME
Re-purposing the existing grafana_silence module may work, but I'm open to adding a separate module that takes care of this.
ADDITIONAL INFORMATION
I've solved this by quickly forking and modifying your existing module.
I think we could also add support for querying all active silences and filtering them by some criteria. This would hide some of the complexity of having to query the Grafana API beforehand in the Ansible role/playbook and compare the results to build a valid variable to iterate over.
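As a rough illustration of the kind of filtering I mean, here is a minimal Python sketch. The function name filter_silences and its parameters are hypothetical and are not part of the current module; the field names (createdBy, status.state) are the ones returned by the silences endpoint used in the playbook in this issue.

```python
from typing import Any, Dict, List

Silence = Dict[str, Any]


def filter_silences(
    silences: List[Silence], created_by: str, state: str = "active"
) -> List[Silence]:
    """Keep only silences created by `created_by` whose status.state equals `state`.

    Hypothetical helper: it mirrors the two selectattr filters a playbook
    would otherwise have to apply to the API response.
    """
    return [
        s
        for s in silences
        if s.get("createdBy") == created_by
        and s.get("status", {}).get("state") == state
    ]
```

Today the same filtering has to be done in the playbook itself: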
- name: Retrieve all silences placed by persistent-silences
  uri:
    url: "{{ grafana_url }}/api/alertmanager/grafana/api/v2/silences"
    method: GET
    headers:
      Cookie: "{{ grafana_login.cookies_string }}"
    return_content: true
  check_mode: false
  register: grafana_provisioning_existing_silences

- name: Load existing silences created by persistent-silences
  set_fact:
    filtered_existing_silences: >-
      {{ grafana_provisioning_existing_silences.json
      | selectattr('createdBy', 'equalto', 'persistent-silences')
      | selectattr('status.state', 'equalto', 'active')
      | list }}

- name: Set silences var (This is an example, we got them defined as YAML)
  set_fact:
    silences:
      - comment: "My Persistent Silence sample"
        state: present
        startsAt: "2000-01-01T00:00:00Z"
        endsAt: "2100-01-01T00:00:00Z"
        matchers:
          - isEqual: true
            isRegex: true
            name: hostname
            value: "node-1.*"
          - isEqual: true
            isRegex: true
            name: interface
            value: "^eth(1|2)$"
          - isEqual: true
            isRegex: false
            name: __alert_rule_uid__
            value: abcdefg123
      - comment: "My Persistent sample 2"
        state: present
        startsAt: "2000-01-01T00:00:00Z"
        endsAt: "2100-01-01T00:00:00Z"
        matchers:
          - isEqual: true
            isRegex: true
            name: hostname
            value: "node-2.*"
          - isEqual: true
            isRegex: false
            name: env
            value: "eu-west-1"
          - isEqual: true
            isRegex: false
            name: __alert_rule_uid__
            value: abcdefg123

- name: Set silences to remove
  set_fact:
    silences: >-
      {{
        (silences | list) +
        (filtered_existing_silences | rejectattr('comment', 'in', silences | map(attribute='comment') | list) | map('combine', {'state': 'absent'}) | list)
      }}

- name: Create or delete persistent silences
  community.grafana.grafana_silence:
    created_by: "persistent-silences"
    comment: "{{ silence.comment }}"
    grafana_url: "{{ grafana_url }}"
    starts_at: "{{ silence.startsAt }}"
    ends_at: "{{ silence.endsAt }}"
    matchers: "{{ silence.matchers }}"
    state: "{{ silence.state | default('present') }}"
    grafana_api_key: "{{ grafana_api_key }}"
  loop: "{{ silences }}"
  loop_control:
    loop_var: silence
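The reconciliation done by the tasks above could, in principle, live inside the module. The sketch below shows one way the comparison might work while ignoring startsAt and endsAt. All function names here are hypothetical, and this is not the code from my fork. Note one assumption that differs from the playbook: the playbook identifies a silence by its comment alone, whereas this sketch uses the comment plus the set of matchers.

```python
from typing import Any, Dict, FrozenSet, List, Tuple

Silence = Dict[str, Any]


def matcher_key(matchers: List[Dict[str, Any]]) -> FrozenSet[Tuple[str, str, bool, bool]]:
    """Order-independent fingerprint of a silence's matchers."""
    return frozenset(
        (m["name"], m["value"], bool(m.get("isRegex", False)), bool(m.get("isEqual", True)))
        for m in matchers
    )


def silence_key(silence: Silence) -> Tuple[str, FrozenSet[Tuple[str, str, bool, bool]]]:
    """Identity of a silence for comparison; startsAt and endsAt are deliberately left out."""
    return (silence["comment"], matcher_key(silence.get("matchers", [])))


def plan_silences(
    desired: List[Silence], existing: List[Silence]
) -> Tuple[List[Silence], List[Silence]]:
    """Return (to_create, to_delete).

    `existing` is assumed to be already filtered down to active silences
    owned by this automation, so expired silences never take part.
    """
    desired_keys = {silence_key(s) for s in desired}
    existing_keys = {silence_key(s) for s in existing}
    to_create = [s for s in desired if silence_key(s) not in existing_keys]
    to_delete = [s for s in existing if silence_key(s) not in desired_keys]
    return to_create, to_delete
```

With this approach, a silence whose comment and matchers already exist is left untouched even if its time window differs, which is what makes the run idempotent for long-running silences.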
Below are the changes I made to make the grafana_silence module work. It would still need a few other bits and pieces, such as making sure the tests pass.
main...kitos9112:community.grafana:main