
omdb: add facility for abandoning a saga #7791

Merged: @gjcolombo merged 4 commits into main from gjcolombo/abandon-ship on Apr 17, 2025

@gjcolombo (Contributor) commented Mar 13, 2025

Add an Abandoned saga state. Nexus saga recovery does not pick up sagas in this state. (A saga that is already running will continue to run if it is marked Abandoned, and its continued execution may overwrite the Abandoned state entirely.) Add an omdb subcommand to move a saga to this state, and refactor a bit to avoid duplicating code with the inject-error subcommand.
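The recovery rule described above can be pictured as a filter over saga states. This is a minimal, hypothetical sketch: `SagaState`, `SagaRecord`, `is_recovery_candidate`, `recovery_candidates`, and every variant other than `Running` and `Abandoned` are illustrative assumptions, not Omicron's actual types. In Omicron the selection is presumably made by the datastore query that lists candidates for saga recovery.

```rust
/// Hypothetical model of a saga's persisted state. Only `Running` and
/// `Abandoned` are named in the PR; the other variants are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SagaState {
    Running,
    Unwinding,
    Done,
    Abandoned,
}

/// Hypothetical row from a saga table: just an ID and its state.
struct SagaRecord {
    id: u64,
    state: SagaState,
}

/// Returns true if recovery should pick this saga up. Abandoned sagas are
/// excluded, just like sagas that have already finished.
fn is_recovery_candidate(state: SagaState) -> bool {
    match state {
        SagaState::Running | SagaState::Unwinding => true,
        SagaState::Done | SagaState::Abandoned => false,
    }
}

/// Collects the IDs of sagas that recovery would resume.
fn recovery_candidates(sagas: &[SagaRecord]) -> Vec<u64> {
    sagas
        .iter()
        .filter(|s| is_recovery_candidate(s.state))
        .map(|s| s.id)
        .collect()
}

fn main() {
    let sagas = [
        SagaRecord { id: 1, state: SagaState::Running },
        SagaRecord { id: 2, state: SagaState::Abandoned },
        SagaRecord { id: 3, state: SagaState::Done },
        SagaRecord { id: 4, state: SagaState::Unwinding },
    ];
    // Saga 2 is skipped because it was abandoned; saga 3 because it finished.
    println!("{:?}", recovery_candidates(&sagas)); // [1, 4]
}
```

A filter like this only governs which sagas recovery resumes. As the description notes, a saga that its Nexus is still executing keeps running, and its next state update may overwrite Abandoned.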

Tested (so far) by:

  • amending the datastore test that lists candidates for saga recovery
  • starting a demo saga in a dev cluster and verifying (via Nexus logs) that the saga is normally recovered when its Nexus is restarted (via svcadm restart) but is no longer recovered, and so can't complete, once it is abandoned; if I manually move the saga back to Running and restart its SEC again, the saga is picked up normally.

Fixes #7730.

@gjcolombo force-pushed the gjcolombo/abandon-ship branch from be46c6b to 397d5d7 on April 8, 2025 23:51
@gjcolombo force-pushed the gjcolombo/abandon-ship branch from 397d5d7 to f1868f0 on April 9, 2025 22:13
@gjcolombo marked this pull request as ready for review on April 9, 2025 22:16
@gjcolombo requested review from davepacheco and jmpesp on April 9, 2025 22:16
@gjcolombo (Contributor Author) commented:

I've rebased onto the latest main and retested as noted in the PR description. I feel like I might be under-testing this a bit, so I'm especially open to suggestions for other things to try here.

@davepacheco (Collaborator) left a comment


Nice! Two minor suggestions here, but it looks good.

it will not resume executing the saga.

- Other Nexuses will not adopt and resume the saga, even if its current assigned
Nexus is removed from the system.
@davepacheco (Collaborator) commented:

Suggested change:
  before: Nexus is removed from the system.
  after:  Nexus is expunged.

If the saga's current Nexus is actively driving it, the saga will continue to
execute even if it is abandoned. You should only proceed if:

- you've stopped the saga's assigned Nexus and are prepared to undo any changes
@davepacheco (Collaborator) commented:

Suggested change:
  before: - you've stopped the saga's assigned Nexus and are prepared to undo any changes
  after:  - you've stopped the saga's assigned Nexus AND are prepared to undo any changes

/// If this status indicates that the relevant SEC might be active, returns
/// `Err`. If the relevant SEC is thought to be inactive, or the saga used
/// to produce this status had no SEC, returns `Ok`.
fn as_result(self) -> anyhow::Result<()> {
@gjcolombo (Contributor Author) commented:

Nits: this should probably be `into_result`, since it consumes `self`. Also, the doc comment could be clearer about what's returned: `Ok` if there's sufficient evidence to be confident that this saga doesn't belong to an active SEC, and `Err` otherwise.
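The rename and doc-comment suggestion can be illustrated with a small sketch. By Rust naming convention, a method that takes `self` by value is called `into_*`, and the doc comment below states the `Ok`/`Err` contract explicitly. `SecStatus` and its variants are hypothetical, and the sketch returns `Result<(), String>` rather than `anyhow::Result<()>` so it has no external dependencies.

```rust
/// Hypothetical summary of what omdb knows about the SEC (Nexus instance)
/// that a saga is assigned to. Not Omicron's actual type.
#[derive(Debug)]
enum SecStatus {
    /// The saga has no assigned SEC.
    NoSec,
    /// There is sufficient evidence that the assigned SEC is inactive.
    Inactive,
    /// The assigned SEC might still be running the saga.
    MaybeActive { nexus_id: String },
}

impl SecStatus {
    /// Returns `Ok(())` if there is sufficient evidence that the saga does
    /// not belong to an active SEC (it has no SEC, or its SEC is known to be
    /// inactive), and `Err` otherwise. Takes `self` by value, hence `into_`.
    fn into_result(self) -> Result<(), String> {
        match self {
            SecStatus::NoSec | SecStatus::Inactive => Ok(()),
            SecStatus::MaybeActive { nexus_id } => Err(format!(
                "Nexus {nexus_id} may still be executing this saga"
            )),
        }
    }
}

fn main() {
    let statuses = [
        SecStatus::NoSec,
        SecStatus::Inactive,
        SecStatus::MaybeActive { nexus_id: "nexus-1".to_string() },
    ];
    // Each status is consumed by `into_result`, so it cannot be reused.
    for status in statuses {
        match status.into_result() {
            Ok(()) => println!("safe to proceed"),
            Err(e) => println!("refusing: {e}"),
        }
    }
}
```

Consuming `self` also lets the `Err` branch move `nexus_id` into the error message without cloning it.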

Define an "abandoned" saga state. No SEC will begin executing an abandoned
saga. Technicians mark sagas as abandoned using omdb; this requires that
the saga's current executor is not running (otherwise it could receive a
state update from Steno that clobbers the Abandoned state).

This commit defines the new state in the database schema and fixes up
the DB crates accordingly, but adds no affordances for applying the new
saga state or considering it when deciding what sagas to recover.
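Adding a state to the schema means the Rust enum and its database representation must agree. A minimal sketch, assuming the state is stored as a lowercase string label; the labels, the variant set, and the `as_db_str` helper are assumptions for illustration, not Omicron's actual schema mapping.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical saga state enum; the variant set and string labels are
/// assumptions for illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SagaState {
    Running,
    Unwinding,
    Done,
    Abandoned,
}

impl SagaState {
    /// The label stored in the (assumed) database column. This match is
    /// exhaustive, so adding a variant fails to compile until it is handled.
    fn as_db_str(self) -> &'static str {
        match self {
            SagaState::Running => "running",
            SagaState::Unwinding => "unwinding",
            SagaState::Done => "done",
            SagaState::Abandoned => "abandoned",
        }
    }
}

impl fmt::Display for SagaState {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_db_str())
    }
}

impl FromStr for SagaState {
    type Err = String;

    /// Parses a stored label back into a state, rejecting unknown labels.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "running" => Ok(SagaState::Running),
            "unwinding" => Ok(SagaState::Unwinding),
            "done" => Ok(SagaState::Done),
            "abandoned" => Ok(SagaState::Abandoned),
            other => Err(format!("unknown saga state: {other:?}")),
        }
    }
}

fn main() {
    let state: SagaState = "abandoned".parse().expect("known label");
    assert_eq!(state, SagaState::Abandoned);
    println!("parsed {state:?}, stored as {state}");
}
```

Because `as_db_str` matches exhaustively on the enum, adding a variant such as `Abandoned` turns every unhandled match into a compile error. That is one plausible reason a schema change like this ripples through the DB crates even before any recovery logic uses the new state.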
@gjcolombo force-pushed the gjcolombo/abandon-ship branch from f1868f0 to 59026d4 on April 17, 2025 16:56
@gjcolombo enabled auto-merge (squash) on April 17, 2025 17:08
@gjcolombo merged commit 4584e9e into main on April 17, 2025 (16 checks passed)
@gjcolombo deleted the gjcolombo/abandon-ship branch on April 17, 2025 18:21
Development

Successfully merging this pull request may close these issues.

want a tool for saga abandonment
2 participants