feat(db): make rollout state runtime-local for distributed Nebraska #1396

Open
Moustafa-Moustafa wants to merge 1 commit into flatcar:main from Moustafa-Moustafa:feat/group-state-split

Conversation

@Moustafa-Moustafa

The `rollout_in_progress` indicator for a group is the only field on the admin-managed `groups` table that is written by the update-serving engine, not the admin. This is a step towards the distributed topology described in RFC #1375: admin state needs to flow one-way from the admin node to each runtime node, while runtime state must stay local so each runtime node records what it observes from its own clients.

This change moves that single field onto its own runtime-local table. Standalone-mode behavior is unchanged.
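As a rough illustration of the split, the sketch below keeps the admin-managed group record free of rollout state and tracks `rollout_in_progress` in a separate, node-local store. It is an in-memory stand-in, not the actual schema or code from this PR: the names `Group`, `RuntimeRolloutStore`, `SetRolloutInProgress`, and `RolloutInProgress` are all hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Group holds admin-managed fields only (hypothetical, trimmed shape).
// After the split it no longer carries rollout_in_progress.
type Group struct {
	ID             string
	Name           string
	PolicySafeMode bool
}

// RuntimeRolloutStore is a hypothetical in-memory stand-in for the
// runtime-local table: one rollout_in_progress flag per group ID,
// written only by the update-serving engine on this node.
type RuntimeRolloutStore struct {
	mu         sync.RWMutex
	inProgress map[string]bool
}

// NewRuntimeRolloutStore returns an empty store.
func NewRuntimeRolloutStore() *RuntimeRolloutStore {
	return &RuntimeRolloutStore{inProgress: make(map[string]bool)}
}

// SetRolloutInProgress records the flag for a group on this node only.
func (s *RuntimeRolloutStore) SetRolloutInProgress(groupID string, v bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.inProgress[groupID] = v
}

// RolloutInProgress reports the flag; a group with no row reads as false.
func (s *RuntimeRolloutStore) RolloutInProgress(groupID string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.inProgress[groupID]
}

func main() {
	g := Group{ID: "g1", Name: "stable"}
	store := NewRuntimeRolloutStore()

	// The engine writes runtime state without touching the Group record.
	store.SetRolloutInProgress(g.ID, true)
	fmt.Println(store.RolloutInProgress(g.ID), store.RolloutInProgress("unknown"))
}
```

Because the admin record never changes when the flag is written, it could be replicated one-way to runtime nodes without conflicting with local writes.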

Note on `policy_safe_mode`: in distributed mode there is no path for runtime nodes to send observations back to the admin node, which means the `safe_mode` auto-pause cannot work as it does today. I'm proposing we do not support `safe_mode` while distributed mode is enabled, at least within the scope of this effort. A proper solution (e.g. some form of runtime-to-admin state sharing) can be designed separately.

Refs: #1375

Separate the groups table

[ describe the change in 1 - 3 paragraphs ]

How to use

No new behavior is introduced. Nebraska running in single-instance mode (currently the only mode) should not be affected.

Testing done

  • `cd backend && make all` — clean (codegen, generators, golangci-lint, build)

  • `cd backend && make check-backend-with-container` — passes (full suite incl.
    `backend/test/api` integration and `backend/test/auth/oidc`)

  • Smoke test on a local environment

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)

  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, etc.

Moustafa-Moustafa requested a review from a team as a code owner on June 11, 2026, 16:59
Signed-off-by: Moustafa Moustafa <momousta@microsoft.com>