Description
Currently:
"Leveraging Cerberus to monitor the cluster under test and consuming the aggregated go/no-go signal to determine pass/fail post chaos. It is highly recommended to turn on the Cerberus health check feature available in Kraken. Instructions on installing and setting up Cerberus can be found here or can be installed from Kraken using the instructions. Once Cerberus is up and running, set cerberus_enabled to True and cerberus_url to the url where Cerberus publishes go/no-go signal in the Kraken config file. Cerberus can monitor application routes during the chaos and fails the run if it encounters downtime as it is a potential downtime in a customers, or users environment as well. It is especially important during the control plane chaos scenarios including the API server, Etcd, Ingress etc. It can be enabled by setting check_applicaton_routes: True in the Kraken config provided application routes are being monitored in the cerberus config.”
Instead:
Leveraging Cerberus to monitor the cluster under test and consuming the aggregated go/no-go signal to determine pass/fail post chaos.
- It is highly recommended to turn on the Cerberus health check feature available in Kraken. Instructions for installing and setting up Cerberus can be found here; alternatively, Cerberus can be installed from Kraken using the instructions.
- Once Cerberus is up and running, set cerberus_enabled to True and cerberus_url to the URL where Cerberus publishes the go/no-go signal in the Kraken config file.
- Cerberus can monitor application routes during the chaos and fail the run if it encounters downtime, since that indicates potential downtime in a customer’s or user’s environment.
- This is especially important during control plane chaos scenarios, including the API server, Etcd, Ingress, etc.
- Route monitoring can be enabled by setting check_applicaton_routes: True in the Kraken config, provided application routes are being monitored in the Cerberus config.
Leveraging the built-in alert collection feature to fail runs in case of critical alerts.
- See also: SLOs validation for more details on metrics and alerts
Fail test if certain metrics aren’t met at the end of the run
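To illustrate the Cerberus go/no-go flow described above, here is a minimal Python sketch of how a run could be marked pass or fail from the signal. It is not Kraken's actual implementation: the function names `parse_signal` and `run_passed` are hypothetical, and the sketch assumes that Cerberus publishes the plain text `True` or `False` at `cerberus_url`. Only the keys `cerberus_enabled` and `cerberus_url` come from the text above.

```python
from urllib.request import urlopen


def parse_signal(body: str) -> bool:
    """Interpret the published text as go (True) or no-go (False).

    Assumption: the signal is the literal text "True" or "False".
    Anything other than "True" is treated as no-go.
    """
    return body.strip().lower() == "true"


def run_passed(config: dict, fetch=None) -> bool:
    """Decide pass/fail post chaos from the Cerberus-related settings.

    `config` is a hypothetical dict holding the Cerberus keys from the
    Kraken config; `fetch` can be injected to avoid a real HTTP call.
    """
    if not config.get("cerberus_enabled", False):
        # Health check is turned off, so there is no signal to consult.
        return True
    if fetch is None:
        # Default: read the signal over HTTP from cerberus_url.
        fetch = lambda url: urlopen(url, timeout=10).read().decode()
    return parse_signal(fetch(config["cerberus_url"]))
```

Passing `fetch` explicitly keeps the decision logic separate from the network call, so it can be exercised without a running Cerberus instance.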