feat: add additional states in the termination flow to handle DPUs falling off the host's boot options list #522

Open
spydaNVIDIA wants to merge 1 commit into NVIDIA:main from spydaNVIDIA:pyda_boot

Conversation

spydaNVIDIA (Contributor) commented Mar 11, 2026

Description

feat: add additional states in the termination flow to handle DPUs falling off the host's boot options list

HostBootingProcess.pdf

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@spydaNVIDIA spydaNVIDIA requested a review from a team as a code owner March 11, 2026 18:49
@github-actions

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🕐 Last updated: 2026-03-11 18:50:53 UTC | Commit: cbe0b68

@github-actions

🛡️ Vulnerability Scan

🚨 Found 74 vulnerability(ies)
📊 vs main: 74 (no change)

Severity Breakdown:

  • 🔴 Critical/High: 74
  • 🟡 Medium: 0
  • 🔵 Low/Info: 0

🕐 Last updated: 2026-03-11 18:50:54 UTC | Commit: cbe0b68

Comment on lines +4512 to +4516
tracing::warn!(
    "Boot order incorrect for {} after SetBootOrder in HostInit, retrying SetBootOrder (retry {} of 3)",
    mh_snapshot.host_snapshot.id,
    retry_count + 1,
);

Don't we pass args as structured log fields now? This is more for my own curiosity than a request for a change.

Suggested change
- tracing::warn!(
-     "Boot order incorrect for {} after SetBootOrder in HostInit, retrying SetBootOrder (retry {} of 3)",
-     mh_snapshot.host_snapshot.id,
-     retry_count + 1,
- );
+ tracing::warn!(
+     machine_id = %mh_snapshot.host_snapshot.id,
+     retry_count = (retry_count + 1),
+     "Boot order incorrect after SetBootOrder in HostInit, retrying SetBootOrder",
+ );

set_boot_order_info: Some(SetBootOrderInfo {
    set_boot_order_jid: None,
    set_boot_order_state: SetBootOrderState::SetBootOrder,
    retry_count: retry_count + 1,
wminckler (Contributor) commented Mar 11, 2026

Why is this a transition to the same state instead of a wait? Isn't there a possibility that the state will be processed again too quickly?
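
To make the question concrete: re-entering SetBootOrder immediately leaves the spacing between attempts to however often the state handler runs. A minimal sketch of the alternative, where the handler asks for a delay before the retry, could look like the following. The `Outcome` enum, `next_step`, `MAX_RETRIES`, and the backoff durations are hypothetical and not taken from the PR; only the limit of 3 retries comes from the log message in the diff.

```rust
use std::time::Duration;

/// Hypothetical retry ceiling; the PR's log message says "retry {} of 3".
const MAX_RETRIES: u32 = 3;

/// Hypothetical result of handling one pass of the state handler.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Re-enter SetBootOrder, but only after `delay` has elapsed.
    RetryAfter { retry_count: u32, delay: Duration },
    /// Retries exhausted; hand off to a failure state.
    Failed { attempts: u32 },
}

/// Decide what to do when the boot order is still incorrect.
/// `retry_count` is the number of retries already performed.
fn next_step(retry_count: u32) -> Outcome {
    if retry_count >= MAX_RETRIES {
        return Outcome::Failed { attempts: retry_count };
    }
    // Assumed linear backoff: 30s, 60s, 90s for retries 1, 2, 3.
    let delay = Duration::from_secs(30 * u64::from(retry_count + 1));
    Outcome::RetryAfter {
        retry_count: retry_count + 1,
        delay,
    }
}
```

Whether an explicit delay is needed depends on how often the real controller re-evaluates a machine's state, which this excerpt does not show.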

}
}
SetBootOrderOutcome::BootOrderIncorrect => {
tracing::warn!(

Why not call boot_order_retry_or_fail here, as is done below?

I guess this is just making a state transition, but it seems pointless to enter a state and immediately switch to another one without doing anything (especially in the "none" case).
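
For illustration, the kind of shared helper this thread refers to might be sketched as below, so that every branch that finds the boot order incorrect makes the same retry-or-fail decision. The real `boot_order_retry_or_fail` is not shown in this excerpt, so the signature, the `Next` enum, the `MAX_BOOT_ORDER_RETRIES` constant, and the `Option<String>` type for `set_boot_order_jid` are assumptions; only the struct field names and the `SetBootOrderState::SetBootOrder` variant appear in the diff.

```rust
/// Hypothetical limit, matching the "retry {} of 3" log message in the diff.
const MAX_BOOT_ORDER_RETRIES: u32 = 3;

/// Simplified stand-in for the PR's state enum (only this variant is shown there).
#[derive(Debug, Clone, PartialEq)]
enum SetBootOrderState {
    SetBootOrder,
}

/// Simplified stand-in; the job-id type is assumed to be a String here.
#[derive(Debug, Clone, PartialEq)]
struct SetBootOrderInfo {
    set_boot_order_jid: Option<String>,
    set_boot_order_state: SetBootOrderState,
    retry_count: u32,
}

/// Hypothetical return type: either retry with fresh info or fail with a reason.
#[derive(Debug, PartialEq)]
enum Next {
    Retry(SetBootOrderInfo),
    Fail(String),
}

/// Assumed shape of the helper: one place that decides between retry and failure.
fn boot_order_retry_or_fail(host_id: &str, retry_count: u32) -> Next {
    if retry_count < MAX_BOOT_ORDER_RETRIES {
        Next::Retry(SetBootOrderInfo {
            set_boot_order_jid: None,
            set_boot_order_state: SetBootOrderState::SetBootOrder,
            retry_count: retry_count + 1,
        })
    } else {
        Next::Fail(format!(
            "boot order still incorrect for {host_id} after {retry_count} retries"
        ))
    }
}
```

Routing both call sites through one function like this would keep the retry limit and the failure message in a single place, which seems to be the point of the reviewer's question.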

2 participants