
Improvements and fixes to our SLO rules #2519

Open
4 tasks
majamassarini opened this issue Sep 6, 2024 · 0 comments
Labels
complexity/epic Lots of work ahead, planning/design required.


majamassarini commented Sep 6, 2024

  1. Fix slowness in giving feedback: ideally there should always be 0 events/minute here. This could be related to issue
  2. After point 1 is solved, we can also fix the SLO1 rule: if the above graph shows more than 1 event/min of `no_status_after_25_s`, then an alert should be fired. Currently it isn't (unless we have a burst of events), because the considered time range is too broad and we have too few events (see issue).
  3. Fix the `copr_builds_queued_total` metric: it is totally out of range, about 10,000 times greater than `copr_builds_started_total` and `copr_builds_finished_total`, which seem to be OK. A card exists for this issue.
  4. After point 3 is solved, add new Prometheus alerts for Testing Farm and Copr jobs stuck at the external-service level; you can use the definitions of the following graphs:
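The window-size problem described in point 2 can be illustrated with a small Python sketch. This is not the actual SLO1 Prometheus rule: the per-minute counts are made-up data, and `rate_per_minute`, `should_alert` and `THRESHOLD_PER_MIN` are hypothetical names. It only shows how averaging over a broad window dilutes a short breach of the "more than 1 event/min" threshold.

```python
from collections.abc import Sequence

# Hypothetical threshold taken from point 2: alert above 1 event/min
# of no_status_after_25_s.
THRESHOLD_PER_MIN = 1.0


def rate_per_minute(counts: Sequence[int], window_min: int) -> float:
    """Average events/minute over the last `window_min` one-minute buckets."""
    if window_min <= 0 or window_min > len(counts):
        raise ValueError("window must be between 1 and len(counts)")
    recent = counts[-window_min:]
    return sum(recent) / window_min


def should_alert(counts: Sequence[int], window_min: int) -> bool:
    """Fire only if the averaged rate exceeds the threshold."""
    return rate_per_minute(counts, window_min) > THRESHOLD_PER_MIN


if __name__ == "__main__":
    # Made-up data: 55 quiet minutes, then 5 minutes with 3 late events each.
    counts = [0] * 55 + [3] * 5
    for window in (60, 5):
        print(window, rate_per_minute(counts, window), should_alert(counts, window))
```

With a 60-minute window the 15 late events average to 0.25 events/min and no alert fires; with a 5-minute window the same events give 3.0 events/min and the alert fires. The trade-off is that a narrow window also reacts to very short bursts, so the window used in the real rule still needs tuning against the actual event volume.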
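The out-of-range `copr_builds_queued_total` value from point 3 could be caught by comparing how much each counter grew over the same time window. The sketch below is a hypothetical sanity check in plain Python, not part of Packit or Prometheus: the function names, the sample values and the `max_ratio=10.0` tolerance are all assumptions. It treats a drop in a counter as a reset, similar in spirit to how Prometheus handles counter resets, but without Prometheus's extrapolation.

```python
from collections.abc import Sequence


def increase(samples: Sequence[float]) -> float:
    """Total growth of a monotonically increasing counter, tolerating resets."""
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        # A drop means the counter restarted from zero (e.g. a process restart).
        total += cur - prev if cur >= prev else cur
    return total


def queued_looks_broken(
    queued: Sequence[float],
    started: Sequence[float],
    max_ratio: float = 10.0,  # hypothetical tolerance, not from the issue
) -> bool:
    """Flag the queued counter if it grew far more than the started counter.

    Every started build should have been queued first, so over the same
    window the two increases should be of a similar magnitude.
    """
    return increase(queued) > max_ratio * increase(started)


if __name__ == "__main__":
    broken = queued_looks_broken([0, 50_000, 100_000], [0, 5, 10])
    healthy = queued_looks_broken([0, 6, 12], [0, 5, 10])
    print(broken, healthy)
```

In the first call the queued counter grows 10,000 times faster than the started one and is flagged; in the second the two grow at a similar pace and it is not. If no build started in the window, any queued increase is flagged, which may also point at builds stuck before starting rather than at a broken metric.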
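The graph definitions referenced in point 4 are not included here, so the following is only one possible reading of "stuck on the external service level": a job submitted to Testing Farm or Copr that has not finished within some maximum age. The `ExternalJob` record, the service labels, `stuck_jobs`, `count_by_service` and the 2-hour limit are all hypothetical; a real alert would be expressed over the existing Prometheus metrics instead.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ExternalJob:
    job_id: str
    service: str  # hypothetical labels, e.g. "copr" or "testing-farm"
    submitted_at: datetime
    finished_at: Optional[datetime] = None


def stuck_jobs(jobs: list[ExternalJob], now: datetime, max_age: timedelta) -> list[ExternalJob]:
    """Jobs still unfinished more than `max_age` after submission."""
    return [j for j in jobs if j.finished_at is None and now - j.submitted_at > max_age]


def count_by_service(jobs: list[ExternalJob]) -> dict[str, int]:
    """Number of jobs per external service, e.g. to drive one alert per service."""
    return dict(Counter(j.service for j in jobs))


if __name__ == "__main__":
    now = datetime(2024, 9, 6, 12, 0)
    jobs = [
        ExternalJob("a", "copr", now - timedelta(hours=5)),
        ExternalJob("b", "testing-farm", now - timedelta(minutes=10)),
        ExternalJob("c", "copr", now - timedelta(hours=6), finished_at=now - timedelta(hours=1)),
    ]
    stuck = stuck_jobs(jobs, now, max_age=timedelta(hours=2))
    print([j.job_id for j in stuck], count_by_service(stuck))
```

Only job `a` is reported: `b` is still within the allowed age and `c` already finished. Grouping by service keeps Testing Farm and Copr alerts separate, which matches the two alert families mentioned in point 4.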
Projects
Status: new