omdb: add facility for abandoning a saga #7791

gjcolombo · 2025-03-13T18:31:39Z

Add an Abandoned saga state. This state disqualifies a saga from being picked up by Nexus saga recovery. (A running saga will continue running if it is Abandoned, and continued saga execution may end up clobbering the Abandoned state entirely.) Add an omdb subcommand to move a saga to this state (and refactor a bit to avoid duplicating code with the inject-error subcommand).
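To illustrate the recovery rule described above, here is a minimal sketch of filtering out Abandoned sagas when listing recovery candidates. The `Saga` struct, the `recovery_candidates` function, and the exact set of states are simplified assumptions for illustration, not the datastore code changed in this PR.

```rust
/// Simplified stand-in for the saga states discussed in this PR.
/// (Assumption: the real enum lives in the DB model crates and may differ.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SagaState {
    Running,
    Unwinding,
    Done,
    Abandoned,
}

/// Hypothetical, pared-down saga record; the real record has more fields.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Saga {
    id: u32,
    state: SagaState,
}

/// Returns the sagas that recovery should consider: only those still in
/// flight. Finished and Abandoned sagas are skipped.
fn recovery_candidates(sagas: &[Saga]) -> Vec<Saga> {
    sagas
        .iter()
        .filter(|s| matches!(s.state, SagaState::Running | SagaState::Unwinding))
        .cloned()
        .collect()
}
```

In this sketch an Abandoned saga is treated like a finished one for recovery purposes: it never appears in the candidate list, so a restarted Nexus would not pick it up.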

Tested (so far) by:

- amending the datastore test that lists candidates for saga recovery
- starting a demo saga in a dev cluster and verifying (via Nexus logs) that the saga is normally recovered when its Nexus is restarted (via `svcadm restart`), but is no longer recovered once abandoned (and can't be completed anymore); if I manually move the saga back to Running and restart its SEC again, the saga is picked up normally.

Fixes #7730.

gjcolombo · 2025-04-09T22:17:45Z

I've rebased onto the latest main and retested as noted in the PR description. I feel like I might be under-testing this a bit, so am especially open to suggestions for other things to try here.

davepacheco

Nice! Two minor suggestions here but it looks good.

davepacheco · 2025-04-16T21:41:14Z

dev-tools/omdb/src/bin/omdb/db/saga.rs

+  it will not resume executing the saga.
+
+- Other Nexuses will not adopt and resume the saga, even if its current assigned
+  Nexus is removed from the system.


Suggested change

-  Nexus is removed from the system.
+  Nexus is expunged.

davepacheco · 2025-04-16T21:41:31Z

dev-tools/omdb/src/bin/omdb/db/saga.rs

+If the saga's current Nexus is actively driving it, the saga will continue to
+execute even if it is abandoned. You should only proceed if:
+
+- you've stopped the saga's assigned Nexus and are prepared to undo any changes


Suggested change

-- you've stopped the saga's assigned Nexus and are prepared to undo any changes
+- you've stopped the saga's assigned Nexus AND are prepared to undo any changes

gjcolombo · 2025-04-16T21:46:05Z

dev-tools/omdb/src/bin/omdb/db/saga.rs

+    /// If this status indicates that the relevant SEC might be active, returns
+    /// `Err`. If the relevant SEC is thought to be inactive, or the saga used
+    /// to produce this status had no SEC, returns `Ok`.
+    fn as_result(self) -> anyhow::Result<()> {


nits: this should probably be `into_result` since it consumes `self`. Also, the doc comment could be a little clearer about what's returned here: it's `Ok` if there's sufficient evidence to be confident that this saga doesn't belong to an active SEC, and `Err` otherwise.
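A rough sketch of what that rename and doc comment might look like. The `SecStatus` enum and its variants are hypothetical stand-ins for the status type in the diff, and the error type is a plain `String` here (rather than `anyhow::Result`) only to keep the example dependency-free.

```rust
/// Hypothetical stand-in for the status type under review.
#[derive(Debug)]
enum SecStatus {
    /// The saga had no SEC assigned.
    NoSec,
    /// The saga's SEC is believed to be inactive.
    Inactive,
    /// The saga's SEC might still be running.
    MaybeActive { reason: String },
}

impl SecStatus {
    /// Returns `Ok` if there is sufficient evidence that the saga does not
    /// belong to an active SEC, and `Err` otherwise.
    ///
    /// Named `into_result` because it consumes `self`.
    fn into_result(self) -> Result<(), String> {
        match self {
            SecStatus::NoSec | SecStatus::Inactive => Ok(()),
            SecStatus::MaybeActive { reason } => {
                Err(format!("SEC may still be active: {reason}"))
            }
        }
    }
}
```

The `into_` prefix follows the usual Rust naming convention for conversions that take ownership of the receiver, while `as_` is conventionally reserved for cheap borrowed views.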

Define an "abandoned" saga state. An abandoned saga will not begin to be executed by any SEC. Technicians mark sagas as abandoned using omdb; this requires the saga's current executor not to be running (otherwise it could receive a state update from Steno that will clobber the Abandoned state). This commit defines the new state in the database schema and fixes up the DB crates accordingly, but adds no affordances for applying the new saga state or considering it when deciding what sagas to recover.
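The clobbering hazard mentioned above can be modeled with a small sketch. It assumes, purely for illustration, that an executor's state update is an unconditional write; `SagaRecord`, `abandon`, and `executor_update` are hypothetical names and not the datastore API.

```rust
/// Simplified saga states for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SagaState {
    Running,
    Done,
    Abandoned,
}

/// Hypothetical in-memory stand-in for a saga's database row.
struct SagaRecord {
    state: SagaState,
}

impl SagaRecord {
    /// Models the technician marking the saga as abandoned.
    fn abandon(&mut self) {
        self.state = SagaState::Abandoned;
    }

    /// Models a state update written by a still-running executor.
    /// Assumption: the write does not check the previous state, so it
    /// overwrites whatever was there, including Abandoned.
    fn executor_update(&mut self, new_state: SagaState) {
        self.state = new_state;
    }
}
```

Under that assumption, calling `abandon` and then `executor_update(SagaState::Done)` leaves the record in `Done`, which is why the executor needs to be stopped before a saga is marked as abandoned.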

gjcolombo force-pushed the gjcolombo/abandon-ship branch from be46c6b to 397d5d7 Compare April 8, 2025 23:51

gjcolombo mentioned this pull request Apr 9, 2025

test failed in CI: test_instance_ephemeral_ip_from_correct_pool #7072

Open

gjcolombo force-pushed the gjcolombo/abandon-ship branch from 397d5d7 to f1868f0 Compare April 9, 2025 22:13

gjcolombo marked this pull request as ready for review April 9, 2025 22:16

gjcolombo requested review from davepacheco and jmpesp April 9, 2025 22:16

davepacheco approved these changes Apr 16, 2025

gjcolombo commented Apr 16, 2025

gjcolombo added 3 commits April 17, 2025 16:27

nexus: exclude Abandoned sagas from recovery candidates

4de030c

omdb: add facility for marking saga as Abandoned

59026d4

gjcolombo force-pushed the gjcolombo/abandon-ship branch from f1868f0 to 59026d4 Compare April 17, 2025 16:56

PR feedback

aa812ab

gjcolombo enabled auto-merge (squash) April 17, 2025 17:08

gjcolombo merged commit 4584e9e into main Apr 17, 2025
16 checks passed

gjcolombo deleted the gjcolombo/abandon-ship branch April 17, 2025 18:21
