fix: revert failure metric, NaN/Inf guard, and validator mutation (#277)
* fix: revert failure metric, NaN/Inf guard, and validator mutation
- Add attune_revert_failures_total metric so failed revert attempts
are visible to monitoring and alerting. Previously, a failed revert
(the operator's worst failure mode) was invisible to Prometheus.
- Filter NaN/Inf samples in QueryRangeGrouped to prevent non-finite
values from Prometheus flowing into the recommendation engine.
- Stop validator from mutating the input AttunePolicy when
UpdateStrategy is nil; use a local variable instead.
- Remove redundant header clone in headerTransport.RoundTrip
(req.Clone already deep-copies headers).
- Add FROM/TO resource values to the revert log for easier debugging.
- Add Grafana dashboard panel, PrometheusRule alert, docs, and
troubleshooting section for the new metric.
Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
* fix: add revert failures panel to source Grafana dashboard
Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
---------
Signed-off-by: Sebastien Tardif <sebtardif@ncf.ca>
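The validator change (use a local variable instead of writing a default back into the input) follows a common pattern, sketched here under stated assumptions. The struct shapes, the validateStrategy helper, and the "InPlace"/"Recreate" strategy names are all hypothetical; only AttunePolicy and UpdateStrategy are named in the commit message.

```go
package main

import "fmt"

// UpdateStrategy and AttunePolicySpec are hypothetical, simplified
// shapes used only to illustrate the pattern.
type UpdateStrategy struct {
	Type string
}

type AttunePolicySpec struct {
	UpdateStrategy *UpdateStrategy
}

// validateStrategy checks the update strategy without mutating spec.
// When UpdateStrategy is nil, the default lives in a local variable
// rather than being assigned back to spec.UpdateStrategy.
func validateStrategy(spec *AttunePolicySpec) error {
	strategy := spec.UpdateStrategy
	if strategy == nil {
		// Assumed default, chosen for illustration only.
		strategy = &UpdateStrategy{Type: "InPlace"}
	}
	switch strategy.Type {
	case "InPlace", "Recreate":
		return nil
	default:
		return fmt.Errorf("unsupported update strategy type %q", strategy.Type)
	}
}

func main() {
	spec := &AttunePolicySpec{}
	err := validateStrategy(spec)
	// The input is still nil after validation.
	fmt.Println(err, spec.UpdateStrategy == nil) // prints "<nil> true"
}
```

Keeping validation free of side effects means a validating webhook or dry-run check cannot silently change the object that is later persisted or compared.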
| maxConcurrentReconciles | string |`""`| Maximum number of AttunePolicy reconciles running in parallel. Increase for large clusters with many policies (e.g. 4 for 200+ policies). |