Skip to content

refactor: consolidate kubernetes_functions (#1120)#1121

Open
AR21SM wants to merge 1 commit intokrkn-chaos:mainfrom
AR21SM:refactor/consolidate-kubernetes-functions
Open

refactor: consolidate kubernetes_functions (#1120)#1121
AR21SM wants to merge 1 commit intokrkn-chaos:mainfrom
AR21SM:refactor/consolidate-kubernetes-functions

Conversation

@AR21SM
Copy link
Contributor

@AR21SM AR21SM commented Jan 21, 2026

Type of change

  • Refactor
  • New feature
  • Bug fix
  • Optimization

Description

Consolidate duplicate kubernetes_functions.py files by creating a shared module at krkn/scenario_plugins/native/common/kubernetes_utils.py.

Changes:

  • Created native/common/kubernetes_utils.py with 11 shared functions
  • Consolidated list_pods using superset signature (cli, namespace, label_selector=None, exclude_label=None)
  • Refactored network/kubernetes_functions.py (285 → 62 lines)
  • Refactored pod_network_outage/kubernetes_functions.py (275 → 35 lines)

Preserved unique functionality:

  • setup_kubernetes() - kept separate (different return types per scenario)
  • create_ifb() / delete_ifb() - kept in network module only

Related Tickets & Documents

Documentation

  • Is documentation needed for this update?

No documentation changes needed - internal refactor with no API changes.


PR Type

Enhancement


Description

  • Consolidate duplicate Kubernetes utility functions into shared module

  • Reduce code duplication across network and pod_network_outage plugins

  • Network module reduced from 285 to 62 lines via imports

  • Pod_network_outage module reduced from 275 to 35 lines via imports


Diagram Walkthrough

flowchart LR
  A["network/<br/>kubernetes_functions.py<br/>285 lines"] -->|extract common| B["common/<br/>kubernetes_utils.py<br/>271 lines"]
  C["pod_network_outage/<br/>kubernetes_functions.py<br/>275 lines"] -->|extract common| B
  B -->|import| A2["network/<br/>kubernetes_functions.py<br/>62 lines"]
  B -->|import| C2["pod_network_outage/<br/>kubernetes_functions.py<br/>35 lines"]
Loading

File Walkthrough

Relevant files
Enhancement
__init__.py
Create common package for native plugins                                 

krkn/scenario_plugins/native/common/init.py

  • Created new common package for native scenario plugins
  • Added module-level documentation comment
+1/-0     
kubernetes_utils.py
Create shared Kubernetes utility functions module               

krkn/scenario_plugins/native/common/kubernetes_utils.py

  • Created new shared module with 11 common Kubernetes utility functions
  • Functions include: create_job, delete_pod, create_pod,
    exec_cmd_in_pod, list_pods, get_job_status, get_pod_log, read_pod,
    delete_job, list_ready_nodes, get_node
  • Enhanced list_pods with exclude_label parameter support
  • Comprehensive error handling and logging for all operations
+271/-0 
kubernetes_functions.py
Refactor network module to use shared utilities                   

krkn/scenario_plugins/native/network/kubernetes_functions.py

  • Removed 11 duplicate functions now imported from kubernetes_utils
  • Added imports from common.kubernetes_utils module
  • Kept network-specific functions: create_ifb and delete_ifb
  • Kept scenario-specific setup_kubernetes returning tuple of clients
  • Reduced file from 285 to 62 lines
+26/-248
kubernetes_functions.py
Refactor pod_network_outage module to use shared utilities

krkn/scenario_plugins/native/pod_network_outage/kubernetes_functions.py

  • Removed 11 duplicate functions now imported from kubernetes_utils
  • Added imports from common.kubernetes_utils module
  • Kept scenario-specific setup_kubernetes returning ApiClient instance
  • Reduced file from 275 to 35 lines
+25/-264

Signed-off-by: AR21SM <mahajanashishar21sm@gmail.com>
@qodo-code-review
Copy link

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
🟢
No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
🟢
🎫 #1120
🟢 Remove the duplicated kubernetes_functions.py implementations in
krkn/scenario_plugins/native/network/ and krkn/scenario_plugins/native/pod_network_outage/
to reduce maintenance burden.
Consolidate identical/near-identical Kubernetes helper functions into a shared location so
fixes do not need to be duplicated.
Preserve scenario-specific differences, notably the differing setup_kubernetes() return
types between the two scenarios.
Keep network-only functionality (create_ifb() and delete_ifb()) in the network scenario
module.
Ensure list_pods() supports both previous variants (including the exclude_label capability
from pod_network_outage), avoiding loss of behavior.
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Abrupt process exit: The new code uses sys.exit(1) and broad exception handling (including swallowing
BaseException) which prevents graceful degradation and can terminate the whole process
without structured recovery.

Referred Code
def create_pod(cli, body, namespace, timeout=120):
    """
    Function used to create a pod from a YAML config
    """

    try:
        pod_stat = None
        pod_stat = cli.create_namespaced_pod(body=body, namespace=namespace)
        end_time = time.time() + timeout
        while True:
            pod_stat = cli.read_namespaced_pod(
                name=body["metadata"]["name"], namespace=namespace
            )
            if pod_stat.status.phase == "Running":
                break
            if time.time() > end_time:
                raise Exception("Starting pod failed")
            time.sleep(1)
    except Exception as e:
        logging.error("Pod creation failed %s" % e)
        if pod_stat:


 ... (clipped 38 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status:
Unstructured exception logs: New log entries are plain strings and include full exception text (e.g., % api / % e),
which is unstructured and may inadvertently log sensitive operational details.

Referred Code
        logging.warning(
            "Exception when calling \
                       BatchV1Api->create_job: %s"
            % api
        )
        if api.status == 409:
            logging.warning("Job already present")
    except Exception as e:
        logging.error(
            "Exception when calling \
                       BatchV1Api->create_namespaced_job: %s"
            % e
        )
        raise


def delete_pod(cli, name, namespace):
    """
    Function that deletes a pod and waits until deletion is complete
    """



 ... (clipped 176 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status:
Missing action context: The new utilities perform critical cluster actions (create/delete pods/jobs) but logs do
not include actor/user identity or an explicit outcome record, making audit reconstruction
incomplete.

Referred Code
def create_job(batch_cli, body, namespace="default"):
    """
    Function used to create a job from a YAML config
    """

    try:
        api_response = batch_cli.create_namespaced_job(body=body, namespace=namespace)
        return api_response
    except ApiException as api:
        logging.warning(
            "Exception when calling \
                       BatchV1Api->create_job: %s"
            % api
        )
        if api.status == 409:
            logging.warning("Job already present")
    except Exception as e:
        logging.error(
            "Exception when calling \
                       BatchV1Api->create_namespaced_job: %s"
            % e


 ... (clipped 186 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Exception details logged: The new logging statements interpolate raw exception objects (e.g., ApiException) which
may include internal cluster details, and it is unclear if these logs are user-facing or
restricted to secure internal logs.

Referred Code
    try:
        api_response = batch_cli.create_namespaced_job(body=body, namespace=namespace)
        return api_response
    except ApiException as api:
        logging.warning(
            "Exception when calling \
                       BatchV1Api->create_job: %s"
            % api
        )
        if api.status == 409:
            logging.warning("Job already present")
    except Exception as e:
        logging.error(
            "Exception when calling \
                       BatchV1Api->create_namespaced_job: %s"
            % e
        )
        raise


def delete_pod(cli, name, namespace):


 ... (clipped 180 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Missing input validation: Inputs such as exclude_label and body["metadata"]["name"] are assumed
to be well-formed without validation, which can cause runtime errors or unexpected
behavior when external configuration is malformed.

Referred Code
def create_pod(cli, body, namespace, timeout=120):
    """
    Function used to create a pod from a YAML config
    """

    try:
        pod_stat = None
        pod_stat = cli.create_namespaced_pod(body=body, namespace=namespace)
        end_time = time.time() + timeout
        while True:
            pod_stat = cli.read_namespaced_pod(
                name=body["metadata"]["name"], namespace=namespace
            )
            if pod_stat.status.phase == "Running":
                break
            if time.time() > end_time:
                raise Exception("Starting pod failed")
            time.sleep(1)
    except Exception as e:
        logging.error("Pod creation failed %s" % e)
        if pod_stat:


 ... (clipped 72 lines)

Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Copy link

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Add timeout to prevent infinite loop

In delete_pod, add a timeout to the while loop that waits for pod deletion to
prevent the process from hanging if the pod gets stuck in a Terminating state.

krkn/scenario_plugins/native/common/kubernetes_utils.py [42-56]

-def delete_pod(cli, name, namespace):
+def delete_pod(cli, name, namespace, timeout=120):
     """
     Function that deletes a pod and waits until deletion is complete
     """
 
     try:
         cli.delete_namespaced_pod(name=name, namespace=namespace)
-        while cli.read_namespaced_pod(name=name, namespace=namespace):
-            time.sleep(1)
+        end_time = time.time() + timeout
+        while time.time() < end_time:
+            try:
+                cli.read_namespaced_pod(name=name, namespace=namespace)
+                time.sleep(1)
+            except ApiException as e:
+                if e.status == 404:
+                    logging.info("Pod deleted")
+                    return
+        raise Exception(f"Pod {name} deletion timed out")
     except ApiException as e:
         if e.status == 404:
-            logging.info("Pod deleted")
+            logging.info("Pod already deleted")
         else:
             logging.error("Failed to delete pod %s" % e)
             raise e
  • Apply / Chat
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies that the delete_pod function can hang indefinitely and proposes adding a timeout, which is a critical improvement for robustness.

Medium
Prevent crash from invalid label format

In list_pods, add a check to validate that exclude_label contains an = before
splitting it to prevent a ValueError from malformed input.

krkn/scenario_plugins/native/common/kubernetes_utils.py [121-154]

 def list_pods(cli, namespace, label_selector=None, exclude_label=None):
     """
     Function used to list pods in a given namespace and having a certain label
     and excluding pods with exclude_label
     """
 ...
     for pod in ret.items:
         # Skip pods with the exclude label if specified
         if exclude_label and pod.metadata.labels:
+            if "=" not in exclude_label:
+                logging.warning(f"Invalid exclude_label format: {exclude_label}. Should be key=value.")
+                continue
             exclude_key, exclude_value = exclude_label.split("=", 1)
             if (
                 exclude_key in pod.metadata.labels
                 and pod.metadata.labels[exclude_key] == exclude_value
             ):
                 continue
         pods.append(pod.metadata.name)
 
     return pods

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential ValueError if exclude_label is malformed and provides a robust fix, preventing a crash from invalid user input.

Medium
Raise error instead of exiting process

In the create_pod exception handler, replace sys.exit(1) with raise to allow the
calling code to handle the failure instead of terminating the application.

krkn/scenario_plugins/native/common/kubernetes_utils.py [77-82]

 except Exception as e:
-    logging.error("Pod creation failed %s" % e)
+    logging.error("Pod creation failed: %s", e)
     if pod_stat:
         logging.error(pod_stat.status.container_statuses)
     delete_pod(cli, body["metadata"]["name"], namespace)
-    sys.exit(1)
+    raise
  • Apply / Chat
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that sys.exit(1) is inappropriate for a library function and proposes replacing it with raise, which is a significant improvement for reusability and error handling.

Medium
General
Fix delete_job log reference

In the delete_job function, correct the exception log message to reference
delete_namespaced_job instead of create_namespaced_job.

krkn/scenario_plugins/native/common/kubernetes_utils.py [209-214]

 except ApiException as api:
-    logging.warning(
-        "Exception when calling \
-                   BatchV1Api->create_namespaced_job: %s"
-        % api
-    )
+    logging.warning("Exception when calling BatchV1Api->delete_namespaced_job: %s", api)
  • Apply / Chat
Suggestion importance[1-10]: 4

__

Why: The suggestion corrects a copy-paste error in a log message, which improves the accuracy of logging and aids in debugging.

Low
  • More

@paigerube14
Copy link
Collaborator

@AR21SM thanks so much for this PR, we are currently working toward removing the "native" style scenarios in the future in favor of this work: #849
We could merge this for now, but longer term we will fully rely on krkn-lib for these functions. There are just current restrictions in the native folders that prevent us from using krkn-lib

@AR21SM
Copy link
Contributor Author

AR21SM commented Jan 27, 2026

@AR21SM thanks so much for this PR, we are currently working toward removing the "native" style scenarios in the future in favor of this work: #849 We could merge this for now, but longer term we will fully rely on krkn-lib for these functions. There are just current restrictions in the native folders that prevent us from using krkn-lib

Thank you for the context @paigerube14 , makes sense. Happy to help in the future!

@AR21SM
Copy link
Contributor Author

AR21SM commented Feb 9, 2026

@paigerube14

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[REFACTOR] Remove duplicate kubernetes_functions.py (2 files with ~90% identical code)

2 participants