Release v1.3.0 · vllm-project/vllm-spyre

This release adds support for chunked prefill on non-quantized models. Enable it by setting both of the following environment variables:

VLLM_SPYRE_USE_CHUNKED_PREFILL=1 VLLM_SPYRE_USE_CB=1
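
When the server is started from a Python wrapper rather than a shell, the same two variables can be placed in the child process environment. The snippet below is a minimal sketch of that idea: the helper name `chunked_prefill_env` is hypothetical and not part of vllm-spyre, and only the two variable names and their values come from this release.

```python
import os


def chunked_prefill_env(base=None):
    """Return a copy of an environment mapping with chunked prefill enabled.

    Hypothetical helper for illustration; only the two variable names and
    the value "1" are taken from the release notes.
    """
    # Copy so the caller's mapping (or os.environ) is not modified in place.
    env = dict(os.environ if base is None else base)
    env["VLLM_SPYRE_USE_CHUNKED_PREFILL"] = "1"
    env["VLLM_SPYRE_USE_CB"] = "1"
    return env


# Build an environment from an empty base to show exactly what gets added.
env = chunked_prefill_env({})
for name in sorted(env):
    print(f"{name}={env[name]}")
```

The returned mapping can be passed as the `env` argument of `subprocess.run` when launching the serving process, which keeps the settings scoped to that process instead of the whole shell session.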

What's Changed

feat: chunked prefill spyre model runner by @wallashss in #552
[tests] cleanup: remove temporary hack by @yannicks1 in #555
ChunkedPrefillSpyreScheduler: No Interleaving by @sducouedic in #554
fix: left padding of prompts less than chunk size by @wallashss in #557
feat: left padding from model runner to scheduler by @wallashss in #559
[CP] rewrite scheduler constraints for chunked prefill (🐛 fix) by @yannicks1 in #560
[CB] remove decode/prefill prioritization heuristic by @yannicks1 in #561
Bugfix: padding block cannot be reused with chunked prefill by @sducouedic in #563
[CP] scheduler constraints typo by @yannicks1 in #565
test: add maybe_xfail for quantized micro static batch logprobs checks by @tjohnson31415 in #566
[CP] fix empty model runner output by @yannicks1 in #570
[CB] remove env var VLLM_SPYRE_ENABLE_PREFILL_OPTIMIZATION by @yannicks1 in #562
[CP] optimal chunked prefill scheduler constraints by @yannicks1 in #564
[CP] Simplify code by @maxdebayser in #572
[CB] tighten constraint max model length decode sequences by @yannicks1 in #573
Set default chunk size to 4k for granite 3 8b TP4 by @tjohnson31415 in #571
Interleave chunked prefills with single decoding steps by @sducouedic in #558
feat/fix: add finish_requests to handle removal from ongoing_prefills by @tjohnson31415 in #577
fix: check only decoding requests in _satisfies_last_chunk_constraints by @tjohnson31415 in #576
tests: include chunked prefill on existing tests by @wallashss in #574
Add step tests for chunked prefill by @maxdebayser in #575
docs: chunked prefill updated documentation by @wallashss in #578
[Docs] Prep and publish GH Pages doc by @rafvasq in #579
[Docs] Update GH artifact versions by @rafvasq in #581
[Docs] Add workflow files to docs action triggers by @rafvasq in #582
[Docs] Avoid multiple artifacts by @rafvasq in #583
fix test_compare_graphs_chunked_prefill by @tjohnson31415 in #580
[Docs] Use mkdocs gh-deploy by @rafvasq in #584
[Docs] Update links to documentation by @rafvasq in #587
[PC] Refactor CB model runner to use vLLMs block pool by @maxdebayser in #585
[Docs] Add note about move by @rafvasq in #589
test: a few test configuration updates to have chunked prefill tests pass on Spyre by @tjohnson31415 in #588
Update default granite 8b chunk size to 1024 by @tjohnson31415 in #592
fix time logging and other small things by @yannicks1 in #590

Full Changelog: v1.2.3...v1.3.0
