feat: Implement ML-based recommendation system for chaos scenarios #145
DhruvTotala wants to merge 1 commit into krkn-chaos:main from
Signed-off-by: dhruv <dhruvtotla30@gmail.com>
5283366 to b58120e
Training the model on randomly generated, rule-based data has little value from an ML perspective.
If the training data is synthetic and deterministic, the model effectively behaves like a hard-coded rule rather than learning from real behavior.
Additionally, the current feature set includes only three telemetry signals, which leaves the model largely blind.
Given the scope of this project, it would be more appropriate to train the recommender on real-time cluster telemetry, ideally as time series from a live or representative environment.
You should also refer to this comment by @rh-rahulshetty in a previously opened PR.
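To illustrate the concern about rule-labelled data, here is a minimal sketch, not the PR's code. It assumes a hypothetical generator that labels a sample "cpu-hog" whenever CPU usage exceeds a fixed threshold; the threshold, the label strings, and the one-feature decision stump standing in for the Random Forest are all assumptions made for illustration. The fitted stump simply recovers the generator's own threshold.

```python
import random

# Hypothetical labelling rule: not taken from the PR, chosen only for illustration.
CPU_THRESHOLD = 0.8


def rule_label(cpu):
    """Deterministic rule a synthetic generator might use to label a sample."""
    return "cpu-hog" if cpu > CPU_THRESHOLD else "memory-hog"


def predict(cpu, threshold):
    """Decision stump: a single split on the CPU feature."""
    return "cpu-hog" if cpu > threshold else "memory-hog"


def fit_stump(samples):
    """Pick the split threshold that misclassifies the fewest training samples."""
    values = sorted(cpu for cpu, _ in samples)
    # Candidate splits are midpoints between consecutive sorted feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum(predict(cpu, threshold) != label for cpu, label in samples)

    return min(candidates, key=errors)


rng = random.Random(0)
train = [(cpu, rule_label(cpu)) for cpu in (rng.random() for _ in range(500))]
learned_threshold = fit_stump(train)

# Agreement between the fitted stump and the generating rule on fresh samples.
fresh = [rng.random() for _ in range(1000)]
agreement = sum(
    predict(cpu, learned_threshold) == rule_label(cpu) for cpu in fresh
) / len(fresh)

print(f"rule threshold={CPU_THRESHOLD}, learned threshold={learned_threshold:.3f}")
print(f"agreement with the rule on fresh samples: {agreement:.1%}")
```

The learned split lands next to the rule's threshold, so the "model" carries no information beyond what the generator already encoded.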
User description
This pull request adds the first version of a machine learning–based recommendation system to Krkn AI.
The goal is to help users decide which chaos scenario to run by looking at the current state of the cluster. Based on telemetry data such as CPU usage, memory usage, and network behavior, the system recommends the most relevant chaos experiment (for example, whether a CPU hog or memory hog scenario would have more impact).
Since there is no historical labeled data available yet, this PR also includes a synthetic data generator. This allows us to create realistic fake telemetry data and train an initial model so the recommendation system can work from day one.
What’s Included in This PR
Recommendation Engine
Introduces a new ScenarioRecommender class.
Collects aggregated cluster metrics from Prometheus.
Uses a Random Forest machine learning model to predict which chaos scenario is most suitable for the current system conditions.
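The metric collection step could look roughly like the sketch below. It relies on the Prometheus HTTP API's instant-query endpoint (`/api/v1/query`) and its standard JSON response shape. The PromQL expressions, the feature names, and the helper functions `build_query_url` and `parse_scalar` are assumptions for illustration, since the PR description does not list the queries `ScenarioRecommender` actually runs. The network call itself is left out so the example stays runnable offline.

```python
import json
from urllib.parse import urlencode

# Hypothetical PromQL expressions (node_exporter metrics); the PR does not
# state which queries it uses.
QUERIES = {
    "cpu_usage": 'avg(rate(node_cpu_seconds_total{mode!="idle"}[5m]))',
    "memory_usage": "avg(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)",
    "network_rx": "sum(rate(node_network_receive_bytes_total[5m]))",
}


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_scalar(body):
    """Extract one float from an instant-query response, or None if it is empty."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"Prometheus query failed: {payload.get('error')}")
    result = payload["data"]["result"]
    if not result:
        return None
    # Each vector sample is [unix_timestamp, "value-as-string"].
    return float(result[0]["value"][1])


# Canned response in the documented format, used here instead of a live server.
sample_body = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [{"metric": {}, "value": [1700000000.0, "0.42"]}],
    },
})

url = build_query_url("http://localhost:9090/", QUERIES["cpu_usage"])
cpu_usage = parse_scalar(sample_body)
print(url)
print(cpu_usage)
```

In a real collector, each URL would be fetched with an HTTP client and the parsed values assembled into the feature vector passed to the model.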
Command-Line Support
Adds a new CLI command:
Users can optionally provide a Prometheus URL.
The trained model is loaded from a file (default: krkn_model.pkl).
Training Script
Adds a utility script to:
Generate synthetic telemetry data.
Train an initial machine learning model using that data.
This helps bootstrap the system until real-world data becomes available.
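A bootstrap training script along these lines might be sketched as follows, assuming scikit-learn's `RandomForestClassifier` and `pickle` for persistence. The three features, the labelling rule, the scenario label strings, and the function names are hypothetical; the actual contents of `scripts/train_model.py` are not shown in this PR description.

```python
import pickle
import random

from sklearn.ensemble import RandomForestClassifier


def generate_samples(n, seed=0):
    """Generate synthetic [cpu, memory, network] samples with rule-based labels."""
    rng = random.Random(seed)
    features, labels = [], []
    for _ in range(n):
        cpu, mem, net = rng.random(), rng.random(), rng.random()
        features.append([cpu, mem, net])
        # Hypothetical rule: recommend stressing whichever resource is busier.
        labels.append("cpu-hog" if cpu >= mem else "memory-hog")
    return features, labels


def train_and_save(path, n_samples=1000):
    """Train a Random Forest on synthetic data and pickle it to `path`."""
    features, labels = generate_samples(n_samples)
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(features, labels)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return model


def load_and_predict(path, sample):
    """Load a pickled model and return the recommended scenario for one sample."""
    with open(path, "rb") as fh:
        model = pickle.load(fh)
    return str(model.predict([sample])[0])


if __name__ == "__main__":
    train_and_save("krkn_model.pkl")
    print(load_and_predict("krkn_model.pkl", [0.95, 0.05, 0.50]))
```

Because unpickling can execute arbitrary code, a model file should only be loaded from a trusted location.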
PR Type
Enhancement
Description
Adds ML-based recommendation system for chaos scenarios
Implements ScenarioRecommender class with telemetry collection
Includes training script with synthetic data generator
Adds CLI command to recommend scenarios based on cluster metrics
File Walkthrough
cmd.py: Add recommend CLI command with ML integration (krkn_ai/cli/cmd.py)
recommend CLI command for scenario recommendations
__init__.py: Initialize recommendation module package (krkn_ai/recommendation/__init__.py)
recommender.py: Implement core recommendation engine logic (krkn_ai/recommendation/recommender.py)
train_model.py: Add model training script with synthetic data (scripts/train_model.py)
requirements.txt: Add scikit-learn dependency (requirements.txt)