

@remi-or remi-or commented Dec 12, 2025

Summary

This PR adds a few optimizations for continuous batching:

  • removes an unneeded torch.cuda.synchronize
  • sorts the inputs to maximize prefix-cache hits
  • moves sampling to the GPU
  • removes an extraneous axis from the output_ids
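
The input-sorting idea in the list above can be sketched in plain Python. This is an illustrative sketch only, not the PR's actual scheduler: the `schedule_order` helper and the request dicts are hypothetical names, and the real implementation operates on the library's internal request objects.

```python
# Hypothetical sketch: order pending requests by their prompt token ids so that
# sequences sharing a prefix are scheduled back to back, which maximizes
# prefix-cache hits (the KV cache computed for one request can be reused by
# the neighbor that shares its prefix).

def schedule_order(batch):
    """Return requests sorted lexicographically by their prompt token ids."""
    return sorted(batch, key=lambda req: req["prompt_ids"])

batch = [
    {"id": "a", "prompt_ids": [1, 2, 9]},
    {"id": "b", "prompt_ids": [1, 2, 3]},
    {"id": "c", "prompt_ids": [7, 7]},
    {"id": "d", "prompt_ids": [1, 2, 3, 4]},
]

# "b" and "d" share the prefix [1, 2, 3]; sorting places them next to each other.
ordered = [req["id"] for req in schedule_order(batch)]
# → ["b", "d", "a", "c"]
```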

Performance

| Attention | Version | Generated tokens | Duration (s) | Throughput (tok/s) |
| --- | --- | --- | --- | --- |
| Flash attention 3 | This PR | 112599 | 16.73 | 6729.27 |
| Flash attention 3 | Main branch | 111823 | 25.67 | 4355.68 |
| Flash attention 2 | This PR | 112822 | 24.61 | 4584.74 |
| Flash attention 2 | Main branch | 112126 | 33.12 | 3385.46 |
| SDPA | This PR | 113254 | 82.49 | 1373.00 |
| SDPA | Main branch | 113725 | 170.84 | 665.67 |
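
The "sampling on the GPU" item above avoids copying logits back to the host on every decode step. A minimal CPU-side sketch of the categorical sampling step itself (softmax over logits, then one draw) is below; this is purely illustrative and not the PR's torch implementation, which would use on-device tensor ops instead.

```python
import math
import random

def sample_token(logits, rng):
    """Draw one token id from the categorical distribution over logits."""
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Categorical draw. On GPU this step would be an on-device sampling op,
    # so the logits never need to be transferred to the host.
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

rng = random.Random(0)
token = sample_token([0.1, 2.0, -1.0], rng)
# With seed 0, the draw lands on the high-probability token, id 1.
```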

Tests

No new failures in the CB tests. The only two failing tests are the same as in #42699, caused by compile not working with gemma2, which seems out of scope and acceptable.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@aymeric-roucher

Hi, just wanted to say this looks like a nice PR, thanks!

@remi-or remi-or requested a review from ArthurZucker December 15, 2025 09:40
@github-actions

View the CircleCI Test Summary for this PR:

https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42839&sha=e45e17
