feat: add additional states in the termination flow to handle DPUs falling off the host's boot options list #522
spydaNVIDIA wants to merge 1 commit into NVIDIA:main
TruffleHog Secret Scan: ✅ No secrets or credentials found. The code was scanned for 700+ types of secrets and credentials. Last updated: 2026-03-11 18:50:53 UTC | Commit: cbe0b68
Vulnerability Scan: 🚨 Found 74 vulnerabilities. Full details are in the Security tab. Last updated: 2026-03-11 18:50:54 UTC | Commit: cbe0b68
    tracing::warn!(
        "Boot order incorrect for {} after SetBootOrder in HostInit, retrying SetBootOrder (retry {} of 3)",
        mh_snapshot.host_snapshot.id,
        retry_count + 1,
    );
Don't we put args as log parameters now? This is more for my own curiosity than a request for a change.
Suggested change:
    - tracing::warn!(
    -     "Boot order incorrect for {} after SetBootOrder in HostInit, retrying SetBootOrder (retry {} of 3)",
    -     mh_snapshot.host_snapshot.id,
    -     retry_count + 1,
    - );
    + tracing::warn!(
    +     machine_id=%mh_snapshot.host_snapshot.id,
    +     retry_count=(retry_count+1),
    +     "Boot order incorrect after SetBootOrder in HostInit, retrying SetBootOrder",
    + );
    set_boot_order_info: Some(SetBootOrderInfo {
        set_boot_order_jid: None,
        set_boot_order_state: SetBootOrderState::SetBootOrder,
        retry_count: retry_count + 1,
Why is this a transition to the same state instead of a wait? Isn't there a possibility that the state will be processed again too quickly?
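To illustrate the distinction this question draws, here is a minimal sketch of a handler that waits until a minimum gap has passed before re-entering the same state. Everything in it is an assumption made for illustration: `Outcome`, `next_outcome`, and the `min_gap` parameter are hypothetical names, and the PR's actual state controller may express waiting differently.

```rust
use std::time::{Duration, Instant};

/// Hypothetical handler outcome; the real controller's types may differ.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    /// Re-enter SetBootOrder with the incremented retry count.
    Transition { retry_count: u32 },
    /// Stay in the current state and check again once `remaining` has passed.
    Wait { remaining: Duration },
}

/// Hypothetical decision function: retry only after `min_gap` has elapsed
/// since the last attempt, otherwise report how much longer to wait.
fn next_outcome(
    retry_count: u32,
    last_attempt: Instant,
    now: Instant,
    min_gap: Duration,
) -> Outcome {
    // Saturates to zero if `now` is earlier than `last_attempt`.
    let elapsed = now.saturating_duration_since(last_attempt);
    if elapsed >= min_gap {
        Outcome::Transition { retry_count: retry_count + 1 }
    } else {
        // Safe subtraction: this branch only runs when elapsed < min_gap.
        Outcome::Wait { remaining: min_gap - elapsed }
    }
}
```

With a 30-second gap, a call made 10 seconds after the last attempt yields `Wait` with 20 seconds remaining, while a call at or after 30 seconds yields `Transition` with the retry count incremented.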
        }
    }
    SetBootOrderOutcome::BootOrderIncorrect => {
        tracing::warn!(
Why not call boot_order_retry_or_fail here, like below?
I guess this is just making a state transition, but it seems odd to enter a state and immediately switch to another one without doing anything (especially in the "none" case).
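For readers following this thread, a bounded retry-or-fail helper could look roughly like the sketch below. The body of the PR's `boot_order_retry_or_fail` is not shown in this conversation, so this is not that function: `NextStep`, `retry_or_fail`, and `MAX_BOOT_ORDER_RETRIES` are hypothetical names, and only the limit of 3 is taken from the quoted log message ("retry {} of 3").

```rust
/// Retry limit; the value 3 comes from the log message quoted in the review.
const MAX_BOOT_ORDER_RETRIES: u32 = 3;

/// Hypothetical, simplified stand-in for the next state to enter.
#[derive(Debug, Clone, PartialEq, Eq)]
enum NextStep {
    /// Re-enter SetBootOrder with the incremented retry count.
    RetrySetBootOrder { retry_count: u32 },
    /// Retries are exhausted; report a failure instead of looping forever.
    Failed { reason: String },
}

/// Hypothetical helper: retry while under the limit, otherwise fail.
fn retry_or_fail(retry_count: u32) -> NextStep {
    if retry_count < MAX_BOOT_ORDER_RETRIES {
        NextStep::RetrySetBootOrder { retry_count: retry_count + 1 }
    } else {
        NextStep::Failed {
            reason: format!(
                "boot order still incorrect after {} retries",
                MAX_BOOT_ORDER_RETRIES
            ),
        }
    }
}
```

Routing every "boot order incorrect" branch through one helper of this shape keeps the retry limit and the failure path in a single place, which is the consistency the comments above are asking about.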
Description
feat: add additional states in the termination flow to handle DPUs falling off the host's boot options list
HostBootingProcess.pdf
Type of Change
Related Issues (Optional)
Breaking Changes
Testing
Additional Notes