
MessereN commented Jan 5, 2026

This PR adds support in the plugin for two new resource keys: node_role and node_taint.

node_role: Lets users schedule jobs on labeled nodes by setting this key in either default-resources or rule-specific resources.
cluster configuration, e.g.: node-labels=role=snakemake-exec
Snakemake resource usage, e.g.: node_role=snakemake-exec
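A minimal sketch of how node_role could map onto a pod's node selector, assuming (as the walkthrough below describes) that the value is injected under the hardcoded label key role. The helper name build_node_selector and the plain-dict inputs are hypothetical illustrations, not the plugin's actual code.

```python
def build_node_selector(resources, node_selector=None):
    """Merge an optional node_role resource into a node selector dict.

    Hypothetical helper: node_role=<value> becomes the selector entry
    role=<value>, so nodes must carry the label role=<value>.
    """
    # Copy so the caller's existing selector is not mutated.
    selector = dict(node_selector or {})
    if "node_role" in resources:
        # The label key "role" is hardcoded in this sketch.
        selector["role"] = str(resources["node_role"])
    return selector


# node_role=snakemake-exec merged into a selector that already has one entry
print(build_node_selector({"node_role": "snakemake-exec"}, {"disktype": "ssd"}))
# {'disktype': 'ssd', 'role': 'snakemake-exec'}
```

Copying the incoming selector keeps any existing entries intact while adding the role label alongside them.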

node_taint: A new resource key that lets users specify the custom taint on their nodes so that the plugin can build the matching toleration required for the pod to be scheduled on the tainted node(s). It can be set in either default-resources or rule-specific resources. The expected format is key=value:effect. Note that "=" (the Kubernetes Equal operator) is the only supported operator; support for other operators could be added in the future. The plugin raises a WorkflowError if the value is not properly formatted or if the effect is not one of the supported values: NoSchedule, PreferNoSchedule, or NoExecute.
cluster configuration, e.g.: node-taints=workload=snakemake:NoSchedule
Snakemake resource usage, e.g.: node_taint=workload=snakemake:NoSchedule
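Why the node_taint string has to mirror the node's taint exactly: with the Equal operator, Kubernetes treats a toleration as matching a taint only when key, value, and effect all agree (a toleration with an empty effect matches any effect). The sketch below models that rule with plain dicts; the tolerates helper is a hypothetical illustration, not part of the plugin or the Kubernetes client.

```python
def tolerates(toleration, taint):
    """Return True if an Equal-operator toleration matches a node taint.

    Only the Equal operator is modeled, since that is the only one this PR emits.
    """
    if toleration.get("operator", "Equal") != "Equal":
        raise ValueError("this sketch only models the Equal operator")
    same_key = toleration["key"] == taint["key"]
    same_value = toleration.get("value", "") == taint.get("value", "")
    # An empty toleration effect matches every taint effect.
    same_effect = not toleration.get("effect") or toleration["effect"] == taint["effect"]
    return same_key and same_value and same_effect


# Node pool tainted with workload=snakemake:NoSchedule
node_taint = {"key": "workload", "value": "snakemake", "effect": "NoSchedule"}
# Toleration built from node_taint=workload=snakemake:NoSchedule
toleration = {"key": "workload", "operator": "Equal", "value": "snakemake", "effect": "NoSchedule"}

print(tolerates(toleration, node_taint))  # True
print(tolerates({**toleration, "value": "other"}, node_taint))  # False
```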

These extra resources are useful on a GKE cluster that uses custom node taints and node selectors, so that jobs land on the intended nodes or node pools and carry the matching tolerations.

Summary by CodeRabbit

  • New Features
    • Support scheduling workflow jobs on labeled burst‑pool nodes via configurable role assignment.
    • Add toleration handling for node taints so jobs can run on tainted nodes when configured.
    • Validate toleration configuration and surface clear errors for invalid or unsupported formats.


coderabbitai bot commented Jan 5, 2026

📝 Walkthrough


Added optional node role scheduling via a node_role parameter (merged into node_selector) and introduced parsing/validation of node_taint strings to add Kubernetes Pod tolerations; invalid formats or unsupported effects raise WorkflowError. GPU tolerations and existing pod construction logic remain unchanged.

Changes

Cohort: Node scheduling and toleration handling
File(s): snakemake_executor_plugin_kubernetes/__init__.py
Summary: Added optional node_role support by injecting role=<node_role> into node_selector and logging the change. Added node_taint parsing (expected format key=value:effect), validation of the format and allowed effects, conversion to a V1Toleration with the Equal operator, appending to the pod tolerations, and logging. Invalid inputs raise WorkflowError. The existing GPU toleration and pod spec construction flow is retained.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: you can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Title check: ✅ Passed. The title 'feat: Extra Node Selector and Tolerations' directly aligns with the main changes: adding node_role for node selectors and node_taint for tolerations support.
Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Fix all issues with AI Agents 🤖
In @snakemake_executor_plugin_kubernetes/__init__.py:
- Around lines 220-251: Wrap the parsing of resources_dict["node_taint"] (the
toleration = str(...), key, rest = toleration.split("=", 1), value, effect =
rest.split(":", 1)) in a try/except that catches ValueError and raises a
WorkflowError with an informative message about the expected "key=value:effect"
format; keep the existing checks for empty parts and effect validity, and when
constructing the kubernetes.client.V1Toleration use consistent keyword spacing
(e.g., key=key, operator="Equal", value=value, effect=effect) and retain the
self.logger.debug call to log the added toleration.
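A minimal, self-contained sketch of the requested parsing fix. It returns the toleration fields as a plain dict instead of constructing kubernetes.client.V1Toleration, and it defines a local WorkflowError stand-in so it runs without Snakemake installed; both are assumptions of this sketch, and parse_node_taint is a hypothetical name.

```python
SUPPORTED_EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


class WorkflowError(Exception):
    """Local stand-in for Snakemake's WorkflowError (assumption for this sketch)."""


def parse_node_taint(spec):
    """Parse 'key=value:effect' into the fields of an Equal-operator toleration."""
    toleration = str(spec)
    try:
        # Split on the first '=' and then on the first ':' of the remainder.
        key, rest = toleration.split("=", 1)
        value, effect = rest.split(":", 1)
    except ValueError:
        # Missing separator: report the expected format instead of a bare ValueError.
        raise WorkflowError(
            f"Invalid node_taint format: '{toleration}'. "
            "Expected format: key=value:effect"
        ) from None
    if not key or not value or not effect:
        raise WorkflowError(
            f"Invalid node_taint '{toleration}': key, value, and effect must be non-empty."
        )
    if effect not in SUPPORTED_EFFECTS:
        raise WorkflowError(
            f"Unsupported node_taint effect '{effect}'. "
            f"Supported effects: {', '.join(SUPPORTED_EFFECTS)}"
        )
    # In the plugin, these fields would be passed to kubernetes.client.V1Toleration(...).
    return {"key": key, "operator": "Equal", "value": value, "effect": effect}
```

For example, parse_node_taint("workload=snakemake:NoSchedule") yields key "workload", operator "Equal", value "snakemake", and effect "NoSchedule", while "workload:NoSchedule" raises the format error.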
🧹 Nitpick comments (2)
snakemake_executor_plugin_kubernetes/__init__.py (2)

206-209: Consider documenting or making the node selector label key configurable.

The implementation hardcodes "role" as the node selector key. While this may work for specific cluster setups, Kubernetes doesn't enforce a standard "role" label. Consider:

  • Documenting that users must label their nodes with role=<value> for this feature to work
  • Alternatively, making the label key configurable (e.g., node_role_label setting) to support different cluster labeling schemes
Example of configurable approach

If configurability is desired, you could allow users to specify the full label in the resource:

# Example: node_selector_label="custom-pool=snakemake-exec"
if "node_selector_label" in resources_dict:
    label_spec = str(resources_dict["node_selector_label"])
    if "=" in label_spec:
        key, value = label_spec.split("=", 1)
        node_selector[key] = value
    else:
        node_selector[label_spec] = "true"

However, the current implementation is acceptable if the role label is standard in your environment.


229-238: Optional: Consider the static analysis hint about exception messages.

The static analysis tool suggests avoiding long messages outside the exception class (TRY003). However, for this use case, the contextual error messages are appropriate and clear. You may safely ignore this hint, or if you prefer to address it, you could define custom exception subclasses.

This is a low-priority style suggestion and doesn't affect functionality. Based on coding guidelines for this project.
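If the TRY003 hint were addressed, one approach is to move the message text into dedicated exception subclasses so the raise sites stay short. The class names and the check_effect helper below are hypothetical, and WorkflowError is a local stand-in for Snakemake's exception.

```python
SUPPORTED_EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


class WorkflowError(Exception):
    """Local stand-in for Snakemake's WorkflowError (assumption for this sketch)."""


class InvalidNodeTaintFormatError(WorkflowError):
    """Raised when node_taint is not of the form key=value:effect."""

    def __init__(self, spec):
        super().__init__(
            f"Invalid node_taint format: '{spec}'. Expected format: key=value:effect"
        )
        self.spec = spec


class UnsupportedTaintEffectError(WorkflowError):
    """Raised when the taint effect is not one of the supported effects."""

    def __init__(self, effect):
        super().__init__(
            f"Unsupported node_taint effect '{effect}'. "
            f"Supported effects: {', '.join(SUPPORTED_EFFECTS)}"
        )
        self.effect = effect


def check_effect(effect):
    # The raise site carries no long message, which is what TRY003 asks for.
    if effect not in SUPPORTED_EFFECTS:
        raise UnsupportedTaintEffectError(effect)
    return effect
```

Because both classes subclass WorkflowError, any existing handler that catches WorkflowError still catches them.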

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d5f720e and cd465d5.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_kubernetes/__init__.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • snakemake_executor_plugin_kubernetes/__init__.py
🪛 Ruff (0.14.10)
snakemake_executor_plugin_kubernetes/__init__.py

230-233: Avoid specifying long messages outside the exception class

(TRY003)


235-238: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
snakemake_executor_plugin_kubernetes/__init__.py (1)

220-251: Integration with existing GPU tolerations is correct.

The node_taint toleration handling integrates well with the existing GPU toleration logic (lines 252-298). Both properly initialize pod_spec.tolerations if needed and can coexist, allowing pods to tolerate both custom taints and GPU-specific taints.
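The coexistence described here comes down to initializing the tolerations list once and appending to it from both code paths. The sketch uses a small dataclass as a hypothetical stand-in for kubernetes.client.V1PodSpec (whose tolerations attribute defaults to None), plain dicts for tolerations, and an nvidia.com/gpu toleration chosen only as an example; the plugin's actual GPU toleration may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodSpec:
    """Hypothetical stand-in for kubernetes.client.V1PodSpec."""

    tolerations: Optional[list] = None


def add_toleration(pod_spec, toleration):
    """Append a toleration, creating the list on first use."""
    if pod_spec.tolerations is None:
        pod_spec.tolerations = []
    pod_spec.tolerations.append(toleration)


pod_spec = PodSpec()
# Custom toleration from node_taint=workload=snakemake:NoSchedule
add_toleration(pod_spec, {"key": "workload", "operator": "Equal", "value": "snakemake", "effect": "NoSchedule"})
# Example GPU toleration (illustrative values, not necessarily the plugin's)
add_toleration(pod_spec, {"key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule"})
print(len(pod_spec.tolerations))  # 2
```

Whichever path runs first creates the list, and the second path appends to it rather than overwriting it.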

MessereN changed the title from "Feature: Extra Node Selector and Tolerations" to "feat: Extra Node Selector and Tolerations" on Jan 5, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
snakemake_executor_plugin_kubernetes/__init__.py (1)

220-258: LGTM! Error handling and validation are solid.

The implementation correctly handles the node_taint resource with proper error handling for parsing failures and comprehensive validation. The try/except block at lines 227-234 addresses the parsing concern from the previous review.

Optionally, you can improve the exception chaining by using raise ... from None to indicate that the ValueError is intentionally being replaced:

🔎 Optional improvement for exception chaining
         try:
             key, rest = toleration.split("=", 1)
             value, effect = rest.split(":", 1)
         except ValueError:
             raise WorkflowError(
                 f"Invalid node_taint format: '{toleration}'. "
                 "Expected format: key=value:effect"
-            )
+            ) from None

This makes it clearer that the ValueError is expected and being intentionally replaced with a more user-friendly WorkflowError.
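A small, self-contained demonstration of what raise ... from None changes: Python still records the original ValueError on __context__, but sets __suppress_context__ so the traceback no longer shows "During handling of the above exception, another exception occurred". WorkflowError is a local stand-in, and split_taint and capture are hypothetical helpers.

```python
class WorkflowError(Exception):
    """Local stand-in for Snakemake's WorkflowError (assumption for this sketch)."""


def split_taint(toleration, suppress_context):
    try:
        key, rest = toleration.split("=", 1)
        value, effect = rest.split(":", 1)
    except ValueError:
        if suppress_context:
            # Explicitly replace the ValueError (what B904 asks for).
            raise WorkflowError(f"Invalid node_taint format: '{toleration}'") from None
        # Implicit chaining: the traceback shows both exceptions.
        raise WorkflowError(f"Invalid node_taint format: '{toleration}'")
    return key, value, effect


def capture(suppress_context):
    """Return the WorkflowError raised for a malformed taint string."""
    try:
        split_taint("workload:NoSchedule", suppress_context)
    except WorkflowError as err:
        return err


chained = capture(suppress_context=False)
replaced = capture(suppress_context=True)
print(type(chained.__context__).__name__, chained.__suppress_context__)  # ValueError False
print(type(replaced.__context__).__name__, replaced.__suppress_context__)  # ValueError True
```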

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd465d5 and 0f2a7b1.

📒 Files selected for processing (1)
  • snakemake_executor_plugin_kubernetes/__init__.py
🪛 Ruff (0.14.10)
snakemake_executor_plugin_kubernetes/__init__.py

231-234: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)


231-234: Avoid specifying long messages outside the exception class

(TRY003)


237-240: Avoid specifying long messages outside the exception class

(TRY003)


242-245: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (1)
snakemake_executor_plugin_kubernetes/__init__.py (1)

206-209: No changes needed — the hardcoded "role" label key is intentional.

The code comment explicitly documents that this feature is for scheduling on GKE burst-pool nodes using the role label key (e.g., role=snakemake-exec). The hardcoding is not a flexibility issue but rather the correct implementation for this specific use case. The role label is the standard convention for GKE burst pools, and the design is properly documented in the code.

Likely an incorrect or invalid review comment.
