Fail playbook when Ironic node reaches deploy failed state #32

Merged: agonzalezrh merged 1 commit into main from ironic-deploy-failed-handling on Mar 7, 2026

@eurijon (Contributor) commented Mar 5, 2026

Stop silently accepting 'deploy failed' as a successful deployment outcome.
Add failed_when so the task fails immediately when the node reaches that state, while still exiting the retry loop early.
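A minimal sketch of what the change amounts to, for readers without the file open. It is not the task from the PR: it assumes the wait task polls the Ironic REST API (`GET /v1/nodes/{node}`) with `ansible.builtin.uri` and registers the response as `node_state`, the name the review below refers to. `ironic_api_url`, `ironic_node_uuid`, and the `retries`/`delay` values are hypothetical, and authentication is omitted. It also assumes the existing `until` condition already treats `deploy failed` as a terminal state, which is what "still exiting the retry loop early" implies.

```yaml
# Sketch only: variable names and retry values are hypothetical.
- name: Wait for node to reach active state
  ansible.builtin.uri:
    # Hypothetical variables; authentication is omitted in this sketch.
    url: "{{ ironic_api_url }}/v1/nodes/{{ ironic_node_uuid }}"
    method: GET
    return_content: true
  register: node_state
  # Stop polling on either terminal state, so a failed deploy does not
  # keep retrying until the retry budget runs out.
  until: node_state.json.provision_state in ['active', 'deploy failed']
  # Turn a 'deploy failed' exit into a task failure instead of a silent
  # success. Assumes every poll returns a JSON body.
  failed_when: node_state.json.provision_state == 'deploy failed'
  retries: 60
  delay: 30
```

In Ansible, a true `failed_when` marks an attempt as failed but does not by itself end an `until` loop; only the `until` condition (or running out of `retries`) does. That is why `deploy failed` stays in `until`: the loop exits on the first poll that sees it, and `failed_when` turns that exit into a task failure.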

Summary by CodeRabbit

  • Bug Fixes
    • Improved failure detection during hardware node provisioning: the wait step now aborts immediately if a node enters a "deploy failed" state, preventing further retries.
    • Preserves the existing until/retries/delay retry loop while providing faster failure feedback and clearer error behavior during provisioning.

github-actions [bot] added the infrastructure label ("Infrastructure setup (VMs, networks)") on Mar 5, 2026
coderabbitai [bot] commented Mar 5, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
  • Configuration used: Organization UI
  • Review profile: CHILL
  • Plan: Pro
  • Run ID: 7dff4195-3544-4279-9dfd-a6a26a2fd482

📥 Commits
Reviewing files that changed from the base of the PR and between 5349652 and bc97dc2.

📒 Files selected for processing (1)
  • playbooks/tasks/configure_hardware_ironic_node.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • playbooks/tasks/configure_hardware_ironic_node.yaml

Walkthrough

A failure condition was added to the "Wait for node to reach active state" task in the Ironic node configuration playbook: the task now fails immediately if the node provisioning state becomes "deploy failed", while existing retry/until behavior remains unchanged.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Ironic Node Provisioning Task (playbooks/tasks/configure_hardware_ironic_node.yaml) | Added a failed_when condition to the wait task to fail immediately when the node's provisioning state equals "deploy failed". Existing until/retries/delay settings remain unchanged. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: adding a failed_when condition to fail the playbook when an Ironic node reaches deploy failed state. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai [bot] left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@playbooks/tasks/configure_hardware_ironic_node.yaml`:
- Line 192: The current per-host loop uses the failed_when condition checking
node_state.json.provision_state == "deploy failed", which causes the loop to
bail out and prevent subsequent cleanup tasks like wait_for_deployment.yaml from
running; modify the playbook so the loop body remains the same (keep the
failed_when check) but wrap the entire per-host loop in an Ansible block with an
always: section (or use block/rescue/always) so that cleanup tasks (e.g.,
wait_for_deployment.yaml and the Ironic detach/virtual media cleanup tasks) are
executed regardless of a deploy failure; ensure references to the failed_when
condition, node_state.json.provision_state, and wait_for_deployment.yaml are
preserved so the semantics and logging remain unchanged.
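A minimal sketch of the block/always structure suggested above, under stated assumptions: the per-host loop is an `ansible.builtin.include_tasks` of `configure_hardware_ironic_node.yaml` over `agent_hosts`, the `tasks/` paths are relative to the calling playbook, and the three follow-up task files are included one after another. The loop variable name `agent_host` is hypothetical, and if `wait_for_deployment.yaml` already includes the detach and cleanup files, only it belongs under `always`.

```yaml
# Sketch only: include layout, paths and loop variable are assumptions.
- name: Deploy agent hosts and always clean up afterwards
  block:
    - name: Configure each agent host as an Ironic node
      ansible.builtin.include_tasks: tasks/configure_hardware_ironic_node.yaml
      loop: "{{ agent_hosts }}"
      loop_control:
        loop_var: agent_host  # hypothetical loop variable name
  always:
    # Runs whether or not a node reached 'deploy failed'. Without a
    # rescue section the host stays failed after these tasks finish.
    - name: Wait for deployment
      ansible.builtin.include_tasks: tasks/wait_for_deployment.yaml
    - name: Detach Ironic virtual media
      ansible.builtin.include_tasks: tasks/configure_hardware_ironic_detach_vmedia.yaml
    - name: Clean up Ironic nodes
      ansible.builtin.include_tasks: tasks/configure_hardware_ironic_cleanup.yaml
```

Choosing `always` over `rescue` matters here: after the `always` tasks run, the host is still marked failed, so the playbook still fails on `deploy failed` as this PR intends. A `rescue` section clears the host's failed status unless it ends with an explicit `ansible.builtin.fail`.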

ℹ️ Review info
⚙️ Run configuration
  • Configuration used: Organization UI
  • Review profile: CHILL
  • Plan: Pro
  • Run ID: dcae7010-c833-4213-a8d8-39c9bcf96b21

📥 Commits
Reviewing files that changed from the base of the PR and between 997e4ce and 2c65a27.

📒 Files selected for processing (1)
  • playbooks/tasks/configure_hardware_ironic_node.yaml

eurijon marked this pull request as draft on March 5, 2026 16:34
eurijon marked this pull request as ready for review on March 6, 2026 11:13
@eurijon (Contributor, Author) commented Mar 6, 2026

@agonzalezrh As it stands, this change fails the playbook when the node goes to deploy failed, which means it exits the agent_hosts loop and never runs playbooks/tasks/wait_for_deployment.yaml, so playbooks/tasks/configure_hardware_ironic_detach_vmedia.yaml and playbooks/tasks/configure_hardware_ironic_cleanup.yaml are not executed either. Is that OK? Should we do a proper cleanup if there is an issue with a node?

eurijon force-pushed the ironic-deploy-failed-handling branch from 2c65a27 to 5349652 on March 6, 2026 11:23
eurijon force-pushed the ironic-deploy-failed-handling branch from 5349652 to bc97dc2 on March 6, 2026 12:17
agonzalezrh added this pull request to the merge queue on Mar 7, 2026
Merged via the queue into main with commit 50e1e42 on Mar 7, 2026
15 of 17 checks passed
agonzalezrh deleted the ironic-deploy-failed-handling branch on March 7, 2026 10:33
Labels: infrastructure ("Infrastructure setup (VMs, networks)")
Projects: None yet
Participants: 2