# Fix Silent Failure in Composite Scenarios by Evaluating All Scenario Exit Codes by WHOIM1205 · Pull Request #121 · krkn-chaos/krkn-ai

WHOIM1205 · 2026-01-26T09:52:51Z

User description

## Summary

This PR fixes a critical silent failure bug in Krkn where composite scenario runs only evaluated the exit status of the first scenario, ignoring failures in subsequent scenarios.

As a result, chaos runs involving multiple scenarios could be incorrectly marked as successful, even when later scenarios failed due to SLO violations or misconfigurations.

This PR ensures that all scenario exit statuses are evaluated and the worst (non-zero) exit status is returned and logged.

## Problem Description

Krkn supports composite scenarios via krknctl graph run, where multiple chaos scenarios are executed sequentially and reported in telemetry as an array of results.

However, the return code extraction logic only inspected the first scenario:

`exit_status = scenarios[0].get("exit_status", default_returncode)`
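To make the failure mode concrete, here is a minimal sketch of first-element extraction applied to a two-scenario payload. Only the `scenarios[0].get(...)` expression comes from the PR; the function name `first_scenario_status`, the payload, and the chosen `default_returncode` value are illustrative assumptions, not code from the repository.

```python
from typing import Any


def first_scenario_status(telemetry: dict[str, Any], default_returncode: int) -> int:
    """Hypothetical stand-in for the pre-fix logic: reads scenarios[0] only."""
    scenarios = telemetry.get("telemetry", {}).get("scenarios", [])
    if not scenarios:
        return default_returncode
    return scenarios[0].get("exit_status", default_returncode)


# Illustrative payload: the first scenario passes, the second fails.
telemetry = {
    "telemetry": {
        "scenarios": [
            {"name": "pod_scenario", "exit_status": 0},
            {"name": "network_scenario", "exit_status": 2},
        ]
    }
}

# The failing second scenario is never inspected, so the run looks successful.
status = first_scenario_status(telemetry, default_returncode=1)
print(status)  # 0
```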

## Impacted Test Scenarios

The following test cases demonstrate the impact of this fix and prevent regressions in composite scenario handling.

### 1. Composite Scenario With Partial Failure (Primary Case)

**Input Telemetry**
```json
{
  "telemetry": {
    "scenarios": [
      {"name": "pod_scenario", "exit_status": 0},
      {"name": "network_scenario", "exit_status": 2}
    ]
  }
}
```



___

### **PR Type**
Bug fix


___

### **Description**
- Fix silent failure in composite scenarios by evaluating all exit codes

- Implement worst-case exit status logic prioritizing misconfiguration errors

- Add detailed logging for failed scenarios in composite runs

- Prevent incorrect success marking when subsequent scenarios fail


___

### Diagram Walkthrough


```mermaid
flowchart LR
  A["Extract scenarios from telemetry"] --> B["Iterate all scenarios"]
  B --> C["Collect exit statuses"]
  C --> D["Determine worst status"]
  D --> E["Log failures if found"]
  E --> F["Return worst exit status"]
```

### File Walkthrough

Relevant files

Bug fix

**krkn_runner.py** `Evaluate all scenario exit codes in composite runs` (`krkn_ai/chaos_engines/krkn_runner.py`, +29/-5)

- Changed exit status extraction from first scenario only to evaluating all scenarios
- Implemented worst-case exit status logic with prioritization: misconfiguration errors (!=0, !=2) > SLO failures (2) > success (0)
- Added collection and logging of failed scenarios with their names and exit codes
- Updated debug logging to reflect worst exit status from all scenarios instead of just first
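The prioritization listed above (misconfiguration > SLO failure > success) can be sketched as a single pass over the scenario list. This is an illustration under assumptions, not the code merged in the PR: the function name `worst_exit_status` and the constant `SLO_FAILURE` are hypothetical, a missing `exit_status` falls back to `default_returncode` (the PR as submitted uses `0`), and an empty list yields `0` here, which may not match how the real runner treats that case.

```python
from typing import Any

SLO_FAILURE = 2  # exit code the PR description associates with SLO failures


def worst_exit_status(scenarios: list[dict[str, Any]], default_returncode: int) -> int:
    """Return the highest-priority exit status across all scenarios.

    Priority: any status other than 0 or 2 (misconfiguration) beats 2
    (SLO failure), which beats 0 (success). The first status seen at the
    winning priority level is kept.
    """
    worst = 0
    for scenario in scenarios:
        # Assumption: fall back to the process return code when the key is missing.
        status = scenario.get("exit_status", default_returncode)
        if status == 0:
            continue
        if worst == 0 or (worst == SLO_FAILURE and status != SLO_FAILURE):
            worst = status
    return worst


composite = [
    {"name": "pod_scenario", "exit_status": 0},
    {"name": "network_scenario", "exit_status": 2},
]
print(worst_exit_status(composite, default_returncode=0))  # 2
```

Applied to the partial-failure telemetry from the PR description, the sketch returns 2 rather than 0, which is the behavior the fix is meant to guarantee.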

Check all scenario exit statuses instead of only the first one. For composite scenarios, return the worst exit status to prevent silent failures when subsequent scenarios fail.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>

qodo-code-review · 2026-01-26T09:53:16Z

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
⚪ **Log injection risk**

Description: Untrusted telemetry fields (`scenario_name`/`exit_status`) are logged in `failed_scenarios` without sanitization, which could enable log-forging/injection (e.g., scenario names containing newlines or control characters) if an attacker can influence the run log/telemetry content.

Referred code, krkn_runner.py [545-562]:

    for scenario in scenarios:
        exit_status = scenario.get("exit_status", 0)
        scenario_name = scenario.get("name", "unknown")

        if exit_status != 0:
            failed_scenarios.append((scenario_name, exit_status))
            # Prioritize misconfiguration errors over SLO failures
            if worst_exit_status == 0:
                worst_exit_status = exit_status
            elif exit_status != 2 and worst_exit_status == 2:
                # Misconfiguration error takes precedence over SLO failure
                worst_exit_status = exit_status

    if failed_scenarios:
        logger.warning(
            "Scenario failures detected in composite run: %s", failed_scenarios
        )

Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Comprehensive Audit Trails. Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Passed
	Generic: Meaningful Naming and Self-Documenting Code. Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting. Status: Passed
	Generic: Secure Error Handling. Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed
	Generic: Security-First Input Validation and Data Handling. Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities. Status: Passed
🔴 **Generic: Robust Error Handling and Edge Case Management**

Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation.

Status: Wrong default handling. When a scenario is missing `exit_status`, the new logic defaults it to `0` (success) instead of using `default_returncode`, which can incorrectly mark composite runs as successful.

Referred code:

    worst_exit_status = 0
    failed_scenarios = []

    for scenario in scenarios:
        exit_status = scenario.get("exit_status", 0)
        scenario_name = scenario.get("name", "unknown")

        if exit_status != 0:
            failed_scenarios.append((scenario_name, exit_status))
            # Prioritize misconfiguration errors over SLO failures
            if worst_exit_status == 0:
                worst_exit_status = exit_status
            elif exit_status != 2 and worst_exit_status == 2:
                # Misconfiguration error takes precedence over SLO failure
                worst_exit_status = exit_status

    if failed_scenarios:
        logger.warning(
            "Scenario failures detected in composite run: %s", failed_scenarios
        )
    ... (clipped 5 lines)

⚪ **Generic: Secure Logging Practices**

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data.

Status: Failure details logged. The warning log outputs `failed_scenarios` (scenario names and exit codes), which may unintentionally include sensitive identifiers depending on scenario naming conventions and should be reviewed/redacted as needed.

Referred code:

    if failed_scenarios:
        logger.warning(
            "Scenario failures detected in composite run: %s", failed_scenarios
        )

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label
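As a rough illustration of the log injection item in the compliance guide, the sketch below neutralizes control characters in a scenario name before it reaches a log line. The helper `sanitize_for_log`, the logger name, and the per-scenario message format are assumptions made for this example and are not part of the PR. Note that the PR logs the whole list with `%s`, so Python formats each name with `repr`, which already escapes newlines; the sketch is aimed at the case where names are interpolated individually.

```python
import logging

logger = logging.getLogger("composite_run_example")  # hypothetical logger name


def sanitize_for_log(value: object, max_len: int = 200) -> str:
    """Replace non-printable characters (newlines, control codes) and cap the length."""
    text = str(value)
    cleaned = "".join(ch if ch.isprintable() else "?" for ch in text)
    return cleaned[:max_len]


# Illustrative input: a scenario name that tries to forge a second log entry.
failed_scenarios = [("pod_scenario\nINFO forged entry", 2)]

for name, code in failed_scenarios:
    # Each failure stays on one line, whatever the name contains.
    logger.warning(
        "Scenario %s failed with exit status %s", sanitize_for_log(name), code
    )
```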

WHOIM1205 · 2026-01-26T09:53:18Z

hey @rh-rahulshetty
This fixes a silent failure in composite scenarios where only the first scenario’s exit status was evaluated. The runner now inspects all scenario results and propagates the worst failure, ensuring partial chaos failures are no longer reported as success.

qodo-code-review · 2026-01-26T09:54:10Z

PR Code Suggestions ✨

Explore these optional code suggestions:

**Possible issue** (impact: Medium): Use default_returncode fallback

Replace the hardcoded `0` with the `default_returncode` parameter in `scenario.get("exit_status", 0)` to handle missing exit codes correctly.

krkn_ai/chaos_engines/krkn_runner.py [546]

    -exit_status = scenario.get("exit_status", 0)
    +exit_status = scenario.get("exit_status", default_returncode)

Suggestion importance [1-10]: 7

Why: This is a valid point; using the passed `default_returncode` instead of a hardcoded `0` makes the function more robust by correctly handling cases where a scenario's exit status is missing.

**General** (impact: Low): Simplify exit status priority logic

Refactor the logic for determining the `worst_exit_status` to improve readability by separating the priority checks from the main loop.

krkn_ai/chaos_engines/krkn_runner.py [542-556]

     worst_exit_status = 0
     failed_scenarios = []

    +# Priority: misconfiguration > SLO failure (2) > success (0)
    +has_misconfig = any(s.get("exit_status", 0) not in [0, 2] for s in scenarios)
    +has_slo_failure = any(s.get("exit_status") == 2 for s in scenarios)
    +
    +if has_misconfig:
    +    worst_exit_status = next((s.get("exit_status") for s in scenarios if s.get("exit_status", 0) not in [0, 2]), 0)
    +elif has_slo_failure:
    +    worst_exit_status = 2
    +
     for scenario in scenarios:
         exit_status = scenario.get("exit_status", 0)
    -    scenario_name = scenario.get("name", "unknown")
    +    if exit_status != 0:
    +        scenario_name = scenario.get("name", "unknown")
    +        failed_scenarios.append((scenario_name, exit_status))

    -    if exit_status != 0:
    -        failed_scenarios.append((scenario_name, exit_status))
    -        # Prioritize misconfiguration errors over SLO failures
    -        if worst_exit_status == 0:
    -            worst_exit_status = exit_status
    -        elif exit_status != 2 and worst_exit_status == 2:
    -            # Misconfiguration error takes precedence over SLO failure
    -            worst_exit_status = exit_status
    -

Suggestion importance [1-10]: 2

Why: The suggested code is less efficient as it iterates over the scenarios multiple times, whereas the original code uses a single loop, and it is not clearly more readable.
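The difference between the two fallbacks in the first suggestion is easiest to see on a scenario entry that has no `exit_status` key. The snippet below is a small illustration with an assumed non-zero `default_returncode`; the scenario dictionary and variable names are made up for the example.

```python
# Illustrative entry with the "exit_status" key missing entirely.
scenario = {"name": "network_scenario"}

# Assumption for the example: the process itself exited non-zero.
default_returncode = 1

# Hardcoded fallback, as in the PR as submitted: the missing status reads as success.
hardcoded = scenario.get("exit_status", 0)

# Fallback suggested in the review: the missing status inherits the process return code.
fallback = scenario.get("exit_status", default_returncode)

print(hardcoded, fallback)  # 0 1
```

With the hardcoded `0`, a scenario that never reported a status is indistinguishable from one that succeeded, which is the same class of silent failure the PR sets out to remove.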

Fix composite scenario exit status extraction

59c88d2


WHOIM1205 requested a review from rh-rahulshetty as a code owner January 26, 2026 09:52

qodo-code-review bot added the Review effort 2/5 label Jan 26, 2026

WHOIM1205 wants to merge 1 commit into `krkn-chaos:main` from `WHOIM1205:fix/composite-scenario-exit-status`
