scripts: confirm before restarting all Core pods at once #14305

Merged
matanl-starkware merged 1 commit into main from matanl/prod-confirm-core-all-at-once on Jun 3, 2026
Conversation

@matanl-starkware

Collaborator

Restarting every Core pod simultaneously brings all consensus validators down at
once and can halt the chain. In the parallel (ALL_AT_ONCE) restart flow, prompt
the user for an explicit y/n confirmation when the service is Core, and abort the
restart if they decline. Non-Core services are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor

cursor Bot commented Jun 3, 2026

PR Summary

Low Risk
Operator-facing guardrail in prod restart tooling only; reduces accidental consensus outage risk without changing restart logic after confirmation.

Overview
Adds a safety gate in the parallel (ALL_AT_ONCE) restart path in restarter_lib: when the service is Core, the operator must confirm via wait_until_y_or_n before every Core pod is restarted at once; declining aborts with exit code 1 and no pod deletes. Gateway and other non-Core services keep the existing behavior with no extra prompt.
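The gate described above could look roughly like the following. This is a minimal sketch, not the code in `restarter_lib`: only the name `wait_until_y_or_n`, the Core-only condition, and the exit code 1 come from this PR. The signature of `wait_until_y_or_n`, the `restart_all_at_once` function, the `CORE_SERVICE` constant, and the `delete_pod` callback are assumptions made for illustration.

```python
import sys
from typing import Callable, Iterable

# Assumption: the service is identified by a plain string. The real code may use an enum.
CORE_SERVICE = "Core"


def wait_until_y_or_n(question: str, read: Callable[[str], str] = input) -> bool:
    """Hypothetical stand-in for the repo helper: ask until the answer is y or n."""
    while True:
        answer = read(f"{question} (y/n): ").strip().lower()
        if answer in ("y", "n"):
            return answer == "y"


def restart_all_at_once(
    service: str,
    pods: Iterable[str],
    delete_pod: Callable[[str], None],
    confirm: Callable[[str], bool] = wait_until_y_or_n,
) -> None:
    """Restart every pod in parallel mode, gating Core behind an explicit confirmation."""
    if service == CORE_SERVICE:
        approved = confirm(
            "This restarts ALL Core pods at once and can halt the chain. Continue?"
        )
        if not approved:
            # Declining aborts before any pod is deleted.
            print("Aborted: Core restart was not confirmed.", file=sys.stderr)
            sys.exit(1)

    # Non-Core services reach this point without any prompt.
    for pod in pods:
        delete_pod(pod)
```

Passing the prompt in as a parameter is a choice made for this sketch so the gate can be exercised without a terminal. The actual change may call `wait_until_y_or_n` directly.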

Tests stub the prompt for existing Core parallel cases, and add coverage for decline → abort and non-Core → no prompt.
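The three cases named above (confirmed Core restart, decline leading to abort, and no prompt for non-Core services) might be covered along these lines. This is a sketch under assumptions: `restart_all_at_once` here is a compact, hypothetical stand-in for the code under test, and the prompt is injected as a `confirm` argument so it can be replaced with a `unittest.mock.Mock`. The PR's real tests may stub `wait_until_y_or_n` differently.

```python
import sys
import unittest
from unittest import mock


def restart_all_at_once(service, pods, delete_pod, confirm):
    """Hypothetical stand-in for the gated restart path (not the real restarter_lib)."""
    if service == "Core" and not confirm("Restart all Core pods at once?"):
        sys.exit(1)
    for pod in pods:
        delete_pod(pod)


class AllAtOnceConfirmationTest(unittest.TestCase):
    def test_core_confirmed_restarts_every_pod(self):
        deleted = []
        confirm = mock.Mock(return_value=True)  # stubbed prompt answers "y"
        restart_all_at_once("Core", ["core-0", "core-1"], deleted.append, confirm)
        confirm.assert_called_once()
        self.assertEqual(deleted, ["core-0", "core-1"])

    def test_core_declined_aborts_without_deleting(self):
        deleted = []
        confirm = mock.Mock(return_value=False)  # stubbed prompt answers "n"
        with self.assertRaises(SystemExit) as ctx:
            restart_all_at_once("Core", ["core-0", "core-1"], deleted.append, confirm)
        self.assertEqual(ctx.exception.code, 1)
        self.assertEqual(deleted, [])

    def test_non_core_is_not_prompted(self):
        deleted = []
        confirm = mock.Mock(return_value=True)
        restart_all_at_once("Gateway", ["gw-0"], deleted.append, confirm)
        confirm.assert_not_called()
        self.assertEqual(deleted, ["gw-0"])
```

Saved to a file, these tests can be run with `python -m unittest <file>`.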

Reviewed by Cursor Bugbot for commit 526bd93.

@reviewable-StarkWare

This change is Reviewable

@ron-starkware ron-starkware left a comment

Contributor

@ron-starkware reviewed 2 files and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on matanl-starkware).

matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from 94e05b2 to dd039bc (June 3, 2026 08:04)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from 1898d5c to a042263 (June 3, 2026 08:29)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from dd039bc to 70ef243 (June 3, 2026 08:29)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from a042263 to b82777d (June 3, 2026 08:36)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from 70ef243 to 3e6576d (June 3, 2026 08:36)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from b82777d to 3833c63 (June 3, 2026 11:56)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch 2 times, most recently from 2e2c13c to 70d97b5 (June 3, 2026 12:19)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from 3833c63 to 6d1e138 (June 3, 2026 12:19)

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 70d97b5.

Comment thread scripts/prod/restarter_lib.py
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from 70d97b5 to a10db9d (June 3, 2026 12:22)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from 6d1e138 to 2248b90 (June 3, 2026 12:22)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from a10db9d to 85e7e78 (June 3, 2026 12:33)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from 2248b90 to 22e9782 (June 3, 2026 12:34)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from 85e7e78 to ab9529d (June 3, 2026 13:50)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch 2 times, most recently from 7fbf29b to 6879a40 (June 3, 2026 14:00)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch 2 times, most recently from 5bca326 to 7f0a1c0 (June 3, 2026 14:16)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from 6879a40 to 6fcbcb6 (June 3, 2026 14:16)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from 7f0a1c0 to eb60175 (June 3, 2026 17:46)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch 2 times, most recently from 53882ed to a53aaac (June 3, 2026 18:01)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch 2 times, most recently from 88d78e4 to e800f90 (June 3, 2026 21:15)
matanl-starkware force-pushed the matanl/prod-add-committer-service branch from a53aaac to c7b1c92 (June 3, 2026 21:15)
matanl-starkware changed the base branch from matanl/prod-add-committer-service to graphite-base/14305 (June 3, 2026 21:35)
matanl-starkware force-pushed the matanl/prod-confirm-core-all-at-once branch from e800f90 to 526bd93 (June 3, 2026 21:35)
matanl-starkware changed the base branch from graphite-base/14305 to main (June 3, 2026 21:35)
@matanl-starkware

Collaborator Author

@ron-starkware — next 🙏 #14304 merged, so this rebased onto main and its Reviewable check reset. Re-approve this revision when you can? (Rebase only — no code changes.)

matanl-starkware added this pull request to the merge queue on Jun 3, 2026
Merged via the queue into main with commit ebf6d8b Jun 3, 2026
16 of 39 checks passed

3 participants