Implement Remote Persistent Workers#2323

Merged
MarcusSorealheis merged 16 commits into main from persistent-workers-design on May 21, 2026

@MarcusSorealheis
Collaborator

@MarcusSorealheis MarcusSorealheis commented May 12, 2026

Description

Adds remote persistent worker support for NativeLink worker execution, including Bazel worker protocol handling, worker pooling, lifecycle management, and focused scheduler/worker tests.

Also adds persistent worker deployment examples and website docs, plus a Bazel wrapper for checking the website docs build.

If you need something that we don't have here, please let us know.

Not Implemented In v1

  • Multiplex workers: no concurrent WorkRequests against one process.
  • Per-request worker sandboxing: no Bazel --worker_sandboxing semantics yet.
  • Dynamic strategy: no local-plus-remote racing.
  • Cross-host or cross-worker migration of in-memory tool state.
  • Global live-worker memory cap across all WorkerKeys.
  • Helm memory-sizing guidance is called out in the design, but not implemented here.
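
To illustrate the pooling model this list implies (one in-flight WorkRequest per process, with idle processes kept warm per WorkerKey), here is a minimal sketch. It is not the code in this PR: `SingleplexPool`, `Worker`, the string key, and `max_idle_per_key` are hypothetical names chosen for illustration, and no real process is spawned.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class Worker:
    """Stand-in for one long-lived tool process (hypothetical)."""
    worker_id: int
    key: str
    requests_served: int = 0


class SingleplexPool:
    """Idle workers per key; a checked-out worker serves one request at a time."""

    def __init__(self, max_idle_per_key: int = 2):
        self._idle = defaultdict(list)   # key -> list of idle workers
        self._ids = count(1)
        self._max_idle = max_idle_per_key
        self.spawned = 0

    def acquire(self, key: str) -> Worker:
        idle = self._idle[key]
        if idle:
            return idle.pop()            # reuse a warm process
        self.spawned += 1
        # Assumption: real code would start the tool process here.
        return Worker(next(self._ids), key)

    def release(self, worker: Worker) -> None:
        worker.requests_served += 1
        idle = self._idle[worker.key]
        if len(idle) < self._max_idle:
            idle.append(worker)          # keep warm for the next action
        # Otherwise the worker is dropped; real code would terminate it.
```

Because a busy worker is never handed out twice, two overlapping actions with the same key get two processes rather than sharing one, which is the singleplex behaviour described above. Note that the cap here is per key only, matching the list item about there being no global live-worker cap.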

Fixes #2050

Type of change

Please delete options that aren't relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

bazel test //nativelink-worker:unit_test //nativelink-worker:integration //nativelink-scheduler:integration --test_output=errors

And, newly added because this PR puts in place systems that make TS much faster:

bazel test //web/platform:check_test --test_output=errors

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)

@MarcusSorealheis
Collaborator Author

/build-image

@github-actions

Image built and pushed!

ghcr.io/TraceMachina/nativelink:4981c06

@MarcusSorealheis
Collaborator Author

Oddly, the only failure we saw was on Windows, but it should have failed on all platforms. Something to think about, @palfrey; it may be a consequence of the work to integrate the new rulesets.

@MarcusSorealheis
Collaborator Author

/build-image

@MarcusSorealheis
Collaborator Author

FYI: The current CI failure is related to our upstream Nix cache, not the code in the PR. It passed prior to the merge commit and has been stable in my testing.

@github-actions

Image built and pushed!

ghcr.io/TraceMachina/nativelink:e384c1e

@MarcusSorealheis
Collaborator Author

I also set up and verified a local persistent-worker cluster on the MacBook.

  What’s running:

  - NativeLink local scheduler/CAS/workers on grpc://127.0.0.1:50051
  - Worker API on 127.0.0.1:50061
  - Config: /tmp/nl-pw-smoke/nativelink-pw.json5 (the actual config is linked at the bottom of this comment)
  - Smoke workspace: /tmp/nl-pw-smoke

  Verification passed:

  - Built a tiny Bazel workspace through local NativeLink remote execution.
  - Actions hit the persistent-worker path: logs show Spawned new
    persistent worker and Persistent worker command complete.
  - Repeated clean/build cycles advanced worker request counts from 1 to 2
    to 3, proving the persistent workers stayed alive and were reused
    across Bazel invocations.

  Latest output:

  one count=3 cwd=/private/tmp/nativelink-multi-test/worker2/work
  two count=3 cwd=/private/tmp/nativelink-multi-test/worker3/work
  three count=3 cwd=/private/tmp/nativelink-multi-test/worker1/work
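
For readers unfamiliar with why the count climbs, here is a toy worker of the kind that could produce lines like these. It is a sketch, not the smoke-test tool used above: it assumes Bazel's JSON worker protocol with one JSON object per line, and `serve` and `demo` are hypothetical names. The counter only increases because the same process handles every request.

```python
import io
import json
import os


def serve(requests, responses):
    """Answer newline-delimited JSON WorkRequests until the input stream ends."""
    handled = 0
    for line in requests:
        if not line.strip():
            continue  # tolerate blank lines between requests
        request = json.loads(line)
        handled += 1  # survives across requests because the process stays alive
        name = (request.get("arguments") or ["?"])[0]
        response = {
            "exitCode": 0,
            "output": f"{name} count={handled} cwd={os.getcwd()}",
            "requestId": request.get("requestId", 0),
        }
        responses.write(json.dumps(response) + "\n")
        responses.flush()  # the caller blocks until it sees the response
    return handled


def demo():
    """Feed three requests through one worker, as three builds would."""
    stdin = io.StringIO('{"arguments": ["one"]}\n' * 3)
    stdout = io.StringIO()
    serve(stdin, stdout)
    return stdout.getvalue()
```

A real worker would be wired to `sys.stdin` and `sys.stdout` and would run the actual tool for each request. If the process were restarted between builds, the count would reset to 1 each time.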

Everything seems to be working just fine. After this is merged, I will upgrade the version to v1.2.1. Here is the config I used to smoke test.

@amankrx
Collaborator

amankrx commented May 19, 2026

/build-image nativelink-worker-init

@amankrx
Collaborator

amankrx commented May 19, 2026

/build-image

@github-actions

Image built and pushed!

ghcr.io/TraceMachina/nativelink-worker-init:8a12376

@github-actions

Image built and pushed!

ghcr.io/TraceMachina/nativelink:8a12376

Collaborator

@amankrx amankrx left a comment

Tested with the Helm chart, and Persistent Workers are working as expected.

@MarcusSorealheis MarcusSorealheis merged commit 78d1232 into main May 21, 2026
44 checks passed
@MarcusSorealheis MarcusSorealheis deleted the persistent-workers-design branch May 21, 2026 06:44
Comment thread nativelink-worker/src/persistent_worker/mod.rs
3 participants