batch params together in weight sync and async update the weights#5249

Open
winglian wants to merge 1 commit into huggingface:main from winglian:faster-sync-weights

Conversation

@winglian
Contributor

@winglian winglian commented Mar 9, 2026

What does this PR do?

Rather than firing off hundreds of HTTP calls to vLLM to update the weights, we can batch these updates into a single call. We can also make the RPC collective call asynchronous so the update doesn't block the HTTP handler.

For Qwen2 0.5B, this cuts the weight sync from 0.6 s to 0.15 s.
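
The idea behind the speedup can be sketched as follows. This is an illustrative sketch, not the actual TRL code: `post_update`, `post_batch_update`, and `broadcast` are hypothetical stand-ins for the HTTP endpoint calls and the collective broadcast.

```python
# Hypothetical sketch: one HTTP round-trip per parameter vs. a single
# batched metadata POST followed by sequential tensor broadcasts.

def sync_weights_naive(params, post_update, broadcast):
    # Before: one HTTP call per parameter -> hundreds of round-trips.
    for name, tensor in params:
        post_update({"name": name, "dtype": str(tensor.dtype), "shape": tuple(tensor.shape)})
        broadcast(tensor)

def sync_weights_batched(params, post_batch_update, broadcast):
    # After: one HTTP call carrying all parameter metadata, then the
    # tensors are broadcast in the same order as the metadata list.
    metadata = [
        {"name": name, "dtype": str(tensor.dtype), "shape": tuple(tensor.shape)}
        for name, tensor in params
    ]
    post_batch_update(metadata)  # single round-trip
    for _, tensor in params:
        broadcast(tensor)  # order must match the metadata list
```

The HTTP latency is paid once per sync instead of once per parameter, which dominates for models with hundreds of small tensors.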

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


Note

Medium Risk
Touches distributed weight-sync plumbing (HTTP + NCCL/XCCL ordering, new batching/chunking, and async worker dispatch), so subtle ordering or shape/dtype mismatches could cause sync hangs or incorrect weights despite the change being localized.

Overview
Speeds up vLLM server-mode weight synchronization by introducing a batched update path: the client can now POST parameter metadata once (optionally chunked by total tensor elements), then broadcast the tensors sequentially, and the server loads the received weights in a single load_weights() call.
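
The chunking described here (bounding each batch by total element count) might look roughly like this. The helper name and signature are assumptions for illustration, not the PR's actual implementation:

```python
# Hedged sketch: group (name, tensor) pairs so each chunk holds at most
# chunk_size total elements, mirroring the weight_sync_chunk_size idea.

def chunk_params_by_numel(named_params, chunk_size):
    """Yield lists of (name, tensor) pairs whose combined numel stays
    within chunk_size; a single oversized tensor still gets its own chunk."""
    chunk, numel = [], 0
    for name, tensor in named_params:
        n = tensor.numel()
        # close the current chunk if adding this tensor would overflow it
        if chunk and numel + n > chunk_size:
            yield chunk
            chunk, numel = [], 0
        chunk.append((name, tensor))
        numel += n
    if chunk:
        yield chunk
```

Each chunk then maps to one metadata POST plus its broadcasts, bounding peak memory and request size.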

Updates VLLMGeneration.sync_weights() to use this batched sync when not using ZeRO-3/FSDP gather paths, and threads a new config/arg (weight_sync_chunk_size / vllm_weight_sync_chunk_size) from GRPOConfig through GRPOTrainer into VLLMGeneration.

On the trl vllm-serve side, adds a /batch_update_named_params/ endpoint and switches the existing init_communicator/update_named_param fan-out to workers to send over pipes concurrently via asyncio to avoid blocking the HTTP handler.
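
The concurrent pipe fan-out can be sketched as below. This assumes blocking pipe `send` calls on worker connections; the function and connection names are illustrative, not the actual trl vllm-serve code:

```python
import asyncio

# Hypothetical sketch: dispatch a message to all worker pipes
# concurrently so the HTTP handler doesn't block on each worker in turn.
async def fan_out(connections, message):
    loop = asyncio.get_running_loop()
    # Run each blocking pipe send in the default thread-pool executor;
    # the sends proceed concurrently instead of serially.
    await asyncio.gather(
        *(loop.run_in_executor(None, conn.send, message) for conn in connections)
    )
```

With N workers, handler latency approaches the slowest single send rather than the sum of all sends.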

Written by Cursor Bugbot for commit b777252. This will update automatically on new commits.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 3 potential issues.


commit the batch_update_named_params method
fix peft handling and also reorder elif so we don't have unhandled conditions
@winglian winglian force-pushed the faster-sync-weights branch from fe437e1 to b777252 Compare March 9, 2026 13:55
@winglian winglian mentioned this pull request Mar 9, 2026
5 tasks
@qgallouedec
Member

Thanks for the PR. I'd prefer having a good heuristic (hard-coded value) and avoiding yet another new parameter. What would be a good value?

3 participants