## What's Changed
- sync main with dev by @rebel-eunji in #489
- fix: create pr message by @rebel-seinpark in #490
- fix: update uv installation command in README by @rebel-eunji in #495
- other: merge dev into main to update README by @rebel-eunji in #496
- Merge pull request #496 from RBLN-SW/dev by @rebel-eunji in #497
- fix: performance tracker by @rebel-eunji in #503
- fix(NUMA): CPU split after warmup by @rebel-yskim in #476
- fix(attention): derive num_partition from max_model_len instead of block_length by @rebel-jaehunryu in #513
- fix: change MoE combine by @rebel-ykchoi in #438
- other: bump up v0.18.0 by @rebel-jiwoopark in #514
- fix: num tokens for reorder batch by @rebel-jiwoopark in #520
- fix: use container-aware thread count to avoid host op performance degradation in pods by @rebel-jonghewk in #512
- other(test): improve worker test coverage by @rebel-jinhwan in #517
- other(tests): tests for bucketing manager by @huijjj in #521
- refactor(scheduler): delay caching instead of undoing by @rebel-jaehwang in #525
- fix(encoder): use RBLNClassifierPooler for classification models by @rebel-jonghewk in #526
- fix: remove unused util func by @rebel-kblee in #530
- fix(config): ensure max_num_batched_tokens >= max_source_positions for enc-dec by @rebel-jonghewk in #527
- fix(core): deduplicate KV cache inputs for torch.export compatibility by @rebel-chanheo in #524
- fix(core): handle meta tensors in KV cache storage key computation by @rebel-chanheo in #533
- fix: skip NUMA cpu affinity on bare metal to prevent 32x latency regression by @rebel-jonghewk in #529
- fix(gemma3): use non-negative sentinel for IMG_PAD_TOKEN_ID by @rebel-jonghewk in #528
- fix(model): fix moe model attribute by @rebel-kblee in #531
- fix: use num_tokens_no_spec in optimum model runner by @rebel-seinpark in #536
- fix(encoder): fix T5EncoderModel scoring mismatch after v0.18.1 bump by @rebel-jonghewk in #532
- feature(pdd): enable P/D disaggregation with NIXL host KV transfer by @rebel-ykchoi in #477
- other: logging argmax for debugging by @rebel-eunji in #540
- feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading by @rebel-jinhwan in #523
- other: Revert "other: logging argmax for debugging" by @rebel-eunji in #542
- feature: add sampler benchmark code by @rebel-eunji in #539
- fix(PDD): fixed tensor type for cache copy by @rebel-jindol21 in #550
- fix(whisper): use vLLM block_tables as decoder batch position by @rebel-seinpark in #549
- fix(model): make expert_map work with torch.export in MoE model by @rebel-kblee in #537
- other: auto-update optimum-rbln to 0.10.3a2 by @rebel-develop in #551
- fix(test): fix torch_compile unit test failures by @rebel-jinhwan in #552
- fix(examples): import get_open_port from vllm.utils.network_utils by @rebel-sjlee in #560
- fix(test): load vllm plugin before test collection by @rebel-jaehwang in #561
- fix(worker): accept profile_prefix arg in profile() by @rebel-sjlee in #559
- fix(model): Qwen3-VL with text-only input by @rebel-kblee in #557
- fix(model): fix memory estimate logic by @rebel-kblee in #535
- fix: set MKL_NUM_THREADS in OptimumWorker by @rebel-seinpark in #562
- feature(ec_disagg): encoder cache disaggregation for VLMs (NIXL + ZMQ) by @rebel-yskim in #547
- other: auto-update optimum-rbln to 0.10.3a3 by @rebel-seinpark in #567
- fix(worker): use vllm TorchProfilerWrapper in RBLN worker by @rebel-sjlee in #565
- fix(test): assert on n_model_bytes in determine_available_memory tests by @rebel-jinhwan in #568
- core(kernel): RSD support by @rebel-jindol21 in #564
- core(kernel): patch VLLM_RBLN_COMPILE_MODEL for _allocate_kv_cache_tensors by @rebel-jindol21 in #573
- other: Revert "fix: set MKL_NUM_THREADS in OptimumWorker" by @rebel-seinpark in #574
- other: auto-update optimum-rbln to 0.10.3a4 by @rebel-develop in #576
- core(kernel): change the order of arguments of triton_op along with kernel by @rebel-jindol21 in #577
- other: auto-update optimum-rbln to 0.10.3 by @rebel-develop in #578
- release: v0.13.0 by @rebel-eunji in #579
## New Contributors
- @rebel-chanheo made their first contribution in #524
- @rebel-sjlee made their first contribution in #560
Full Changelog: v0.10.2...v0.10.3