Caution
The tool is currently under active development; use it at your own risk.
An intelligent chaos engineering framework that uses genetic algorithms to optimize chaos scenarios for Kubernetes/OpenShift applications. Krkn-AI automatically evolves and discovers the most effective chaos experiments to test your system's resilience.
- Genetic Algorithm Optimization: Automatically evolves chaos scenarios to find optimal testing strategies
- Multi-Scenario Support: Pod failures, container scenarios, node resource exhaustion, and application outages
- Kubernetes/OpenShift Integration: Native support for both platforms
- Health Monitoring: Continuous monitoring of application health during chaos experiments
- Prometheus Integration: Metrics-driven fitness evaluation
- Configurable Fitness Functions: Point-based and range-based fitness evaluation
- Population Evolution: Maintains and evolves populations of chaos scenarios across generations
- krknctl
- Python 3.9+
- uv package manager (recommended) or pip
- podman
- helm
- Kubernetes cluster access (kubeconfig file)
# Install uv if you haven't already
pip install uv
# Create and activate virtual environment
uv venv --python 3.9
source .venv/bin/activate
# Install Krkn-AI in development mode
uv pip install -e .
# Check Installation
uv run krkn_ai --help
For demonstration purposes, deploy the robot-shop microservice:
export DEMO_NAMESPACE=robot-shop
export IS_OPENSHIFT=true
#set IS_OPENSHIFT=false for kubernetes cluster
./scripts/setup-demo-microservice.sh
# Set context to the demo namespace
oc config set-context --current --namespace=$DEMO_NAMESPACE
# or for kubectl:
# kubectl config set-context --current --namespace=$DEMO_NAMESPACE
# Setup NGINX reverse proxy for external access
./scripts/setup-nginx.sh
# Test application endpoints
./scripts/test-nginx-routes.sh
export HOST="http://$(kubectl get service rs -o json | jq -r '.status.loadBalancer.ingress[0].hostname')"
Krkn-AI uses YAML configuration files to define experiments. You can generate a sample config file dynamically by running the Krkn-AI discover command.
uv run krkn_ai discover -k ./tmp/kubeconfig.yaml \
-n "robot-shop" \
-pl "service" \
-nl "kubernetes.io/hostname" \
-o ./tmp/krkn-ai.yaml \
--skip-pod-name "nginx-proxy.*"
The `-n` (namespace), `-pl` (pod-label), `-nl` (node-label), and `--skip-pod-name` options support flexible pattern matching:
| Pattern | Description |
|---|---|
| `robot-shop` | Match exactly "robot-shop" |
| `robot-shop,default` | Match "robot-shop" OR "default" |
| `openshift-.*` | Regex: match namespaces starting with "openshift-" |
| `*` | Match all |
| `!kube-system` | Match all EXCEPT "kube-system" |
| `*,!kube-.*` | Match all except kube-* namespaces |
| `openshift-.*,!openshift-operators` | Match openshift-* but exclude operators |
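The matching rules in the table above can be sketched as follows. This is a hypothetical illustration of the documented semantics, not Krkn-AI's actual implementation:

```python
import re

def matches(pattern: str, name: str) -> bool:
    """Evaluate a comma-separated include/exclude pattern against a name.

    Sketch of the rules in the table above (assumed semantics, not the
    real Krkn-AI code): each token is a full-string regex, '!' marks an
    exclusion, and '*' matches everything.
    """
    includes, excludes = [], []
    for token in pattern.split(","):
        token = token.strip()
        if token.startswith("!"):
            excludes.append(token[1:])
        else:
            includes.append(token)

    def hit(tok: str) -> bool:
        # '*' matches everything; anything else is a full-string regex.
        return tok == "*" or re.fullmatch(tok, name) is not None

    if any(hit(tok) for tok in excludes):
        return False
    # A pattern made only of exclusions (e.g. "!kube-system") matches
    # everything that was not excluded.
    return not includes or any(hit(tok) for tok in includes)
```

For example, `matches("*,!kube-.*", "kube-proxy")` returns `False`, while `matches("!kube-system", "robot-shop")` returns `True`.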
Examples:
# Discover in all namespaces except kube-system and openshift-*
uv run krkn_ai discover -k ./tmp/kubeconfig.yaml \
-n "!kube-system,!openshift-.*" \
-o ./tmp/krkn-ai.yaml
# Discover in openshift namespaces but exclude operators
uv run krkn_ai discover -k ./tmp/kubeconfig.yaml \
-n "openshift-.*,!openshift-operators" \
-o ./tmp/krkn-ai.yaml
# Path to your kubeconfig file
kubeconfig_file_path: "./tmp/kubeconfig.yaml"

# Optional: Random seed for reproducible runs
# seed: 42

# Genetic algorithm parameters
generations: 5
population_size: 10
composition_rate: 0.3
population_injection_rate: 0.1

# Uncomment the line below to enable runs by duration instead of generation count
# duration: 600

# Duration to wait before running next scenario (seconds)
wait_duration: 30

# Elasticsearch configuration for storing run results (optional)
elastic:
  enable: false               # Set to true to enable Elasticsearch integration
  verify_certs: true          # Verify SSL certificates
  server: "http://localhost"  # Elasticsearch URL
  port: 9200                  # Elasticsearch port
  username: "$ES_USER"        # Elasticsearch username
  password: "$__ES_PASSWORD"  # Elasticsearch password (prefix the param with __ to treat it as private)
  index: "krkn-ai"            # Index prefix for storing Krkn-AI config and results

# Specify how result filenames are formatted
output:
  result_name_fmt: "scenario_%s.yaml"
  graph_name_fmt: "scenario_%s.png"
  log_name_fmt: "scenario_%s.log"

# Fitness function configuration
fitness_function:
  query: 'sum(kube_pod_container_status_restarts_total{namespace="robot-shop"})'
  type: point  # or 'range'
  include_krkn_failure: true

# Health endpoints to monitor
health_checks:
  stop_watcher_on_failure: false
  applications:
    - name: cart
      url: "$HOST/cart/add/1/Watson/1"
    - name: catalogue
      url: "$HOST/catalogue/categories"

# Chaos scenarios to evolve
scenario:
  pod-scenarios:
    enable: true
  application-outages:
    enable: false
  container-scenarios:
    enable: false
  node-cpu-hog:
    enable: false
  node-memory-hog:
    enable: false
  kubevirt-outage:
    enable: false

# Cluster components to consider for Krkn-AI testing
cluster_components:
  namespaces:
    - name: robot-shop
      pods:
        - containers:
            - name: cart
          labels:
            service: cart
          name: cart-7cd6c77dbf-j4gsv
        - containers:
            - name: catalogue
          labels:
            service: catalogue
          name: catalogue-94df6b9b-pjgsr
  nodes:
    - labels:
        kubernetes.io/hostname: node-1
      name: node-1
    - labels:
        kubernetes.io/hostname: node-2
      name: node-2
You can modify krkn-ai.yaml to include or exclude cluster components, scenarios, fitness function SLOs, or health check endpoints as needed for Krkn-AI testing.
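As a rough illustration of the two `fitness_function` types, a `point` query reads instant values while a `range` query aggregates samples over the experiment window. The helper below is a hypothetical sketch (the result shapes follow the Prometheus HTTP API; this is not Krkn-AI's actual code):

```python
def fitness_from_prometheus(result: list, query_type: str = "point") -> float:
    """Derive a fitness score from a Prometheus query result.

    Hypothetical helper, not Krkn-AI's implementation: 'point' sums the
    latest sample of each series from an instant query; 'range' averages
    all samples of a range query.
    """
    if query_type == "point":
        # Instant query shape: [{"metric": {...}, "value": [ts, "42"]}, ...]
        return sum(float(series["value"][1]) for series in result)
    # Range query shape: [{"metric": {...}, "values": [[ts, "1"], [ts, "3"]]}, ...]
    samples = [float(v) for series in result for _, v in series["values"]]
    return sum(samples) / len(samples) if samples else 0.0
```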
# Configure custom Prometheus Querier endpoint and token
export PROMETHEUS_URL='https://your-prometheus-url'
export PROMETHEUS_TOKEN='your-prometheus-token'
# Configure elastic search properties (optional)
export ES_USER="elasticsearch-username"
export __ES_PASSWORD="elasticsearch-password"
# Run Krkn-AI
uv run krkn_ai run \
-c ./tmp/krkn-ai.yaml \
-o ./tmp/results/ \
-p HOST=$HOST \
-p ES_USER=$ES_USER -p __ES_PASSWORD=$__ES_PASSWORD
$ uv run krkn_ai discover --help
Usage: krkn_ai discover [OPTIONS]
Discover components for Krkn-AI tests
Options:
-k, --kubeconfig TEXT Path to cluster kubeconfig file.
-o, --output TEXT Path to save config file.
-n, --namespace TEXT Namespace(s) to discover components in. Supports
Regex and comma separated values.
-pl, --pod-label TEXT Pod Label Keys(s) to filter. Supports Regex and
comma separated values.
-nl, --node-label TEXT Node Label Keys(s) to filter. Supports Regex and
comma separated values.
-v, --verbose Increase verbosity of output.
--skip-pod-name TEXT Pod name to skip. Supports comma separated values
with regex.
--help Show this message and exit.
$ uv run krkn_ai run --help
Usage: krkn_ai run [OPTIONS]
Run Krkn-AI tests
Options:
-c, --config TEXT Path to Krkn-AI config file.
-o, --output TEXT Directory to save results.
-f, --format [json|yaml] Format of the output file.
-r, --runner-type [krknctl|krknhub]
Type of krkn engine to use.
-p, --param TEXT Additional parameters for config file in
key=value format.
-v, --verbose Increase verbosity of output.
--help Show this message and exit.
Note: You can also run Krkn-AI as a container with Podman or on Kubernetes. See the container instructions.
Krkn-AI saves results in the specified output directory:
.
└── results/
    ├── reports/
    │   ├── health_check_report.csv
    │   └── graphs/
    │       ├── best_generation.png
    │       ├── scenario_1.png
    │       ├── scenario_2.png
    │       └── ...
    ├── yaml/
    │   ├── generation_0/
    │   │   ├── scenario_1.yaml
    │   │   ├── scenario_2.yaml
    │   │   └── ...
    │   └── generation_1/
    │       └── ...
    ├── log/
    │   ├── scenario_1.log
    │   ├── scenario_2.log
    │   └── ...
    ├── best_scenarios.json
    └── config.yaml
The current version of Krkn-AI leverages an evolutionary algorithm, an optimization technique that uses heuristics to identify chaos scenarios and components that impact the stability of your cluster and applications.
- Initial Population: Creates random chaos scenarios based on your configuration
- Fitness Evaluation: Runs each scenario and measures system response using Prometheus metrics
- Selection: Identifies the most effective scenarios based on fitness scores
- Evolution: Creates new scenarios through crossover and mutation
- Health Monitoring: Continuously monitors application health during experiments
- Iteration: Repeats the process across multiple generations to find optimal scenarios
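The loop above can be sketched in miniature. This is illustrative only: the gene encoding and parameter names are assumptions, not Krkn-AI's API, and real individuals encode chaos-scenario parameters such as target pods and durations rather than plain numbers:

```python
import random

def evolve(population, fitness, generations=5, mutation_rate=0.3):
    """One possible genetic-algorithm loop for the steps above.

    Illustrative sketch only: individuals here are plain lists of numbers,
    whereas Krkn-AI individuals encode chaos-scenario parameters.
    """
    for _ in range(generations):
        # Fitness evaluation: score and rank every scenario.
        ranked = sorted(population, key=fitness, reverse=True)
        # Selection: keep the top half as parents (elitism preserves the best).
        parents = ranked[: max(2, len(ranked) // 2)]
        children = []
        while len(parents) + len(children) < len(population):
            a, b = random.sample(parents, 2)
            # Crossover: splice two parents at a random cut point.
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]
            # Mutation: occasionally perturb one gene.
            if random.random() < mutation_rate:
                child[random.randrange(len(child))] += random.uniform(-1, 1)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Because the top-ranked parents survive each generation, the best individual found so far is never lost, while crossover and mutation keep exploring new scenarios.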
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes and run the static checks (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Developers should run the project's static checks locally before committing. Below are recommended commands and notes for common environments (PowerShell / Bash).
- Install tooling used for checks:
# Activate Virtual Environment
source .venv/bin/activate
# Install dev requirements
uv pip install -r requirements-dev.txt
- Install Git hooks (runs once per developer):
pre-commit install
pre-commit autoupdate
- Run all pre-commit hooks against the repository (fast, recommended):
pre-commit run --all-files
- Run individual tools directly:
# Ruff (linter/formatter)
ruff check .
ruff format .
# Mypy (type checking)
mypy --config-file mypy.ini krkn_ai
# Hadolint (Dockerfile/Containerfile linting) - Docker must be available
hadolint containers/Containerfile
Notes:
- The `pre-commit` configuration runs `ruff`, various file checks, and `hadolint` for container files. If `hadolint` fails with a Docker error, ensure the Docker Desktop/daemon is running on your machine (the hook needs to query Docker to validate the containerfile context).
- Use `pre-commit run --all-files` to validate changes before pushing. CI will also run these checks.