
scripts: add run_in_parallel helper for prod ops #14300

Merged
matanl-starkware merged 1 commit into main from matanl/prod-parallel-helper
Jun 3, 2026


@matanl-starkware

Collaborator

Add a thread-pool-based run_in_parallel helper to common_lib that buffers each
worker's output into a labeled block, prints a 5s progress heartbeat naming
still-running items, and aggregates per-item failures into a single non-zero
exit. Resolve print_colored's output stream at call time so redirection works.

Foundation for parallelizing per-node kubectl operations; nothing calls it yet.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor

cursor Bot commented Jun 3, 2026


PR Summary

Low Risk
New library code and tests only; no callers yet, so production kubectl flows are unchanged until follow-up wiring.

Overview
Adds run_in_parallel to common_lib for capped thread-pool execution of I/O-bound prod work (e.g. future per-node kubectl), with per-worker log buffering via thread-local _output_sink, labeled output blocks on completion, periodic heartbeats for still-running items, and aggregated failures with sys.exit(1) after all items finish.
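The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: `run_in_parallel_sketch`, `log`, the `buffer` attribute, and the parameter defaults are hypothetical names chosen for the example, and only `_output_sink`, `_print_lock`, and `heartbeat_interval_seconds` are taken from the text here.

```python
import io
import sys
import threading
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

_output_sink = threading.local()  # holds each worker thread's buffer
_print_lock = threading.Lock()  # serializes writes to the real streams


def log(message: str) -> None:
    """Hypothetical stand-in for print_colored: buffer in a worker, else print."""
    buffer = getattr(_output_sink, "buffer", None)
    if buffer is not None:
        buffer.write(message + "\n")
        return
    with _print_lock:
        print(message, file=sys.stdout)


def run_in_parallel_sketch(
    items: Sequence[T],
    worker: Callable[[T], R],
    max_workers: int = 4,
    heartbeat_interval_seconds: float = 5.0,
) -> List[R]:
    results: Dict[int, R] = {}
    errors: Dict[int, BaseException] = {}

    def run_one(item: T):
        # Capture this thread's output and hand it back with the outcome.
        _output_sink.buffer = io.StringIO()
        try:
            return worker(item), None, _output_sink.buffer.getvalue()
        except KeyboardInterrupt:
            raise
        except BaseException as exc:  # includes SystemExit from sys.exit()
            return None, exc, _output_sink.buffer.getvalue()
        finally:
            _output_sink.buffer = None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run_one, item): i for i, item in enumerate(items)}
        pending = set(futures)
        while pending:
            done, pending = wait(
                pending, timeout=heartbeat_interval_seconds, return_when=FIRST_COMPLETED
            )
            if not done:
                # Timeout with nothing finished: name the items still running.
                indices = sorted(futures[f] for f in pending)
                log("still running: " + ", ".join(str(items[i]) for i in indices))
                continue
            for future in done:
                index = futures[future]
                value, error, output = future.result()
                with _print_lock:
                    # One labeled block per item, printed when it completes.
                    print(f"=== {items[index]} ===\n{output}", end="", file=sys.stdout)
                if error is None:
                    results[index] = value
                else:
                    errors[index] = error

    if errors:
        for index, error in sorted(errors.items()):
            print(f"FAILED {items[index]}: {error!r}", file=sys.stderr)
        sys.exit(1)  # single non-zero exit after every item has finished
    return [results[i] for i in range(len(items))]
```

Results come back in input order regardless of completion order, and a failing item does not stop the others from running.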

print_colored now resolves file at call time (fixes redirection) and appends to the worker buffer when parallel capture is active; a _print_lock keeps real stdout/stderr writes from interleaving.
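A small sketch of why call-time resolution matters for redirection. It assumes the earlier behavior was a default argument bound when the function was defined, which this page does not show; both function names below are hypothetical and are not the `print_colored` signature.

```python
import contextlib
import io
import sys
from typing import Optional, TextIO


def print_bound_at_definition(message: str, file: TextIO = sys.stdout) -> None:
    # The default is evaluated once, when the function is defined, so it keeps
    # pointing at whatever sys.stdout was at that moment.
    print(message, file=file)


def print_resolved_at_call(message: str, file: Optional[TextIO] = None) -> None:
    # Look up sys.stdout on every call so later redirection is respected.
    print(message, file=file if file is not None else sys.stdout)


buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print_bound_at_definition("early binding")  # bypasses the redirect
    print_resolved_at_call("late binding")  # captured by the redirect

captured = buffer.getvalue()
```

Only the second call ends up in `captured`; the first still writes to the stream that was current at definition time.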

New test_run_in_parallel.py covers result ordering, empty input, buffered output grouping, heartbeats, and failure reporting. No prod scripts call the helper yet—library-only foundation.

Reviewed by Cursor Bugbot for commit df94c18. Bugbot is set up for automated code reviews on this repo.

@reviewable-StarkWare


This change is Reviewable

Comment thread scripts/prod/common_lib.py

@ron-starkware ron-starkware left a comment

Contributor


@ron-starkware reviewed 2 files and all commit messages, and made 2 comments.
Reviewable status: all files reviewed, 3 unresolved discussions (waiting on matanl-starkware).


scripts/prod/common_lib.py line 77 at r1 (raw file):

    is printed every `heartbeat_interval_seconds`.

    Errors: workers must raise rather than call `sys.exit()` (a `SystemExit` raised in a worker

I see that except BaseException does catch the SystemExit from sys.exit(), which the workers call when they hit an error (the SystemExit code is then printed).


scripts/prod/common_lib.py line 91 at r1 (raw file):

    errors: dict[int, BaseException] = {}

    def run_one(index: int, item: T) -> R:

I see that index isn't used in this function (nor later in the stack).

@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from 893cb64 to a92e79a Compare June 3, 2026 08:04
@matanl-starkware

Collaborator Author

Addressed in the latest push:

  • Removed the unused index parameter from run_one.
  • run_in_parallel now re-raises KeyboardInterrupt instead of recording it as an item failure, so Ctrl-C aborts the whole run (per Cursor Bugbot).
  • Reworded the docstring to match behavior: a worker that raises or calls sys.exit() is recorded and aggregated into a single non-zero exit; only KeyboardInterrupt propagates.
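The error handling described in these bullets can be illustrated with a short sketch. It runs the callables sequentially to stay small, and `collect_failures` and the three sample workers are hypothetical names for the example, not code from `common_lib`.

```python
import sys
from typing import Callable, Dict, Sequence


def collect_failures(
    workers: Sequence[Callable[[], object]],
) -> Dict[int, BaseException]:
    """Run each callable in turn; record failures, but let Ctrl-C propagate."""
    errors: Dict[int, BaseException] = {}
    for index, worker in enumerate(workers):
        try:
            worker()
        except KeyboardInterrupt:
            # Listed before BaseException, otherwise it would be swallowed.
            raise
        except BaseException as exc:
            # SystemExit derives from BaseException, not Exception, so a worker
            # that calls sys.exit() lands here alongside ordinary exceptions.
            errors[index] = exc
    return errors


def succeeds() -> None:
    pass


def raises() -> None:
    raise ValueError("boom")


def exits() -> None:
    sys.exit(3)


errors = collect_failures([succeeds, raises, exits])
kinds = {index: type(exc).__name__ for index, exc in errors.items()}
```

Here `kinds` maps index 1 to `ValueError` and index 2 to `SystemExit`, while a worker raising `KeyboardInterrupt` would abort the loop instead of being recorded.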

Comment thread scripts/prod/common_lib.py

@ron-starkware ron-starkware left a comment

Contributor


@ron-starkware reviewed 1 file and all commit messages, and resolved 2 discussions.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on matanl-starkware).

@matanl-starkware matanl-starkware changed the base branch from main to graphite-base/14300 June 3, 2026 08:28
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from a92e79a to 04009a0 Compare June 3, 2026 08:29
@matanl-starkware matanl-starkware changed the base branch from graphite-base/14300 to matanl/ci-exclude-prod-scripts-system-test June 3, 2026 08:29

@cursor cursor Bot left a comment



Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).


Reviewed by Cursor Bugbot for commit 04009a0.

Comment thread scripts/prod/common_lib.py
@matanl-starkware matanl-starkware force-pushed the matanl/ci-exclude-prod-scripts-system-test branch from bd81d90 to 29828fb Compare June 3, 2026 08:36
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from 04009a0 to 029ce2e Compare June 3, 2026 08:36
@matanl-starkware matanl-starkware changed the base branch from matanl/ci-exclude-prod-scripts-system-test to graphite-base/14300 June 3, 2026 11:55
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from 029ce2e to 463f2bc Compare June 3, 2026 11:56
@matanl-starkware matanl-starkware changed the base branch from graphite-base/14300 to main June 3, 2026 11:56

@matanl-starkware matanl-starkware left a comment

Collaborator Author


@matanl-starkware resolved 2 discussions.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on matanl-starkware).

@matanl-starkware matanl-starkware changed the base branch from main to graphite-base/14300 June 3, 2026 12:19
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from 463f2bc to 654aa33 Compare June 3, 2026 12:19
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from 654aa33 to a185a20 Compare June 3, 2026 12:19
@matanl-starkware matanl-starkware changed the base branch from graphite-base/14300 to matanl/ci-fix-hybrid-system-test June 3, 2026 12:19
@graphite-app graphite-app Bot changed the base branch from matanl/ci-fix-hybrid-system-test to main June 3, 2026 12:20
@graphite-app

graphite-app Bot commented Jun 3, 2026


Merge activity

  • Jun 3, 12:20 PM UTC: Graphite rebased this pull request, because this pull request is set to merge when ready.

@matanl-starkware matanl-starkware changed the base branch from main to graphite-base/14300 June 3, 2026 12:22
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from a185a20 to fc6f37a Compare June 3, 2026 12:22
@matanl-starkware matanl-starkware changed the base branch from graphite-base/14300 to matanl/ci-fix-hybrid-system-test June 3, 2026 12:22
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from fc6f37a to fdefe46 Compare June 3, 2026 12:33
@matanl-starkware matanl-starkware force-pushed the matanl/ci-fix-hybrid-system-test branch from 0f582a6 to 8d50432 Compare June 3, 2026 12:34
@matanl-starkware matanl-starkware changed the base branch from matanl/ci-fix-hybrid-system-test to main June 3, 2026 12:59
@matanl-starkware matanl-starkware force-pushed the matanl/prod-parallel-helper branch from fdefe46 to df94c18 Compare June 3, 2026 13:50
@matanl-starkware matanl-starkware added this pull request to the merge queue Jun 3, 2026
Merged via the queue into main with commit a10b51d Jun 3, 2026
16 checks passed

3 participants