Description
Pod scenarios in Krkn inject failures directly into individual Kubernetes pods to test how the applications and the cluster itself handle the disruption. This helps verify that the applications can recover and that the cluster remains stable when pods are unexpectedly terminated.
Configuration structure:
To define a Krkn chaos test that involves pod disruptions, you reference a scenario file within your main Krkn configuration (often a kraken.yaml or similar).
kraken:
    chaos_scenarios:
        - pod_disruption_scenarios:
            - path/to/scenario.yaml
chaos_scenarios: This is a list where you specify the different types of chaos tests Krkn should run.
- pod_disruption_scenarios: This indicates you're configuring a set of chaos scenarios that target pods.
- path/to/scenario.yaml: This is where you point to the actual YAML file that defines the specific pod chaos you want to inject. You can list multiple such files.
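Since several files can be listed, a configuration that runs two pod scenarios might look like the sketch below. The second path is a hypothetical placeholder and does not refer to a file shipped with Krkn.

```yaml
kraken:
    chaos_scenarios:
        - pod_disruption_scenarios:
            - path/to/scenario.yaml
            # Hypothetical second scenario file, listed under the same key
            - path/to/another-scenario.yaml
```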
The scenario.yaml file defines the specifics of a single pod chaos test:
# yaml-language-server: $schema=../plugin.schema.json
- id: kill-pods
  config:
    namespace_pattern: ^kube-system$
    label_selector: k8s-app=kube-scheduler
    krkn_pod_recovery_time: 120
id: kill-pods: A unique identifier for this specific scenario.
config: This section holds the parameters for the pod disruption.
- namespace_pattern: ^kube-system$: This is a regular expression that tells Krkn which namespaces to target. In this example it will only affect pods in the kube-system namespace.
- label_selector: k8s-app=kube-scheduler: This selector further refines which pods within the matched namespaces will be affected. Here, it targets pods whose k8s-app label is set to kube-scheduler, so the scenario affects only the kube-scheduler pods in the kube-system namespace.
- krkn_pod_recovery_time: 120: This parameter defines the success criteria for the test. It specifies the maximum time (in seconds) Krkn will wait for the affected pods to recover after the chaos has been injected. If pods don't recover within this time, the test might be considered a failure.
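As an illustration of how the three parameters combine, the sketch below adapts the same scenario to an application workload. The namespace prefix, the label, and the 60-second limit are all assumed values chosen for the example, not Krkn defaults. Only the keys shown in the example above are used.

```yaml
# yaml-language-server: $schema=../plugin.schema.json
- id: kill-pods
  config:
    # Hypothetical pattern: any namespace whose name starts with "team-a-"
    namespace_pattern: ^team-a-.*$
    # Hypothetical label; replace it with one your workload actually carries
    label_selector: app=frontend
    # Wait up to 60 seconds for the affected pods to recover
    krkn_pod_recovery_time: 60
```

Because namespace_pattern is a regular expression, anchoring it with ^ and $ (as in both examples) is a reasonable precaution against matching namespaces that merely contain the given text.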