## What's Changed
- sync main with dev by @rebel-eunji in #489
- fix: create pr message by @rebel-seinpark in #490
- fix: update uv installation command in README by @rebel-eunji in #495
- other: merge dev into main to update README by @rebel-eunji in #496
- Merge pull request #496 from RBLN-SW/dev by @rebel-eunji in #497
- fix: performance tracker by @rebel-eunji in #503
- fix(NUMA): CPU split after warmup by @rebel-yskim in #476
- fix(attention): derive num_partition from max_model_len instead of block_length by @rebel-jaehunryu in #513
- fix: change MoE combine by @rebel-ykchoi in #438
- other: bump up v0.18.0 by @rebel-jiwoopark in #514
- fix: num tokens for reorder batch by @rebel-jiwoopark in #520
- fix: use container-aware thread count to avoid host op performance degradation in pods by @rebel-jonghewk in #512
- other(test): improve worker test coverage by @rebel-jinhwan in #517
- other(tests): tests for bucketing manager by @huijjj in #521
- refactor(scheduler): delay caching instead of undoing by @rebel-jaehwang in #525
- fix(encoder): use RBLNClassifierPooler for classification models by @rebel-jonghewk in #526
- fix: remove unused util func by @rebel-kblee in #530
- fix(config): ensure max_num_batched_tokens >= max_source_positions for enc-dec by @rebel-jonghewk in #527
- fix(core): deduplicate KV cache inputs for torch.export compatibility by @rebel-chanheo in #524
- fix(core): handle meta tensors in KV cache storage key computation by @rebel-chanheo in #533
- fix: skip NUMA cpu affinity on bare metal to prevent 32x latency regression by @rebel-jonghewk in #529
- fix(gemma3): use non-negative sentinel for IMG_PAD_TOKEN_ID by @rebel-jonghewk in #528
- fix(model): fix moe model attribute by @rebel-kblee in #531
- fix: use num_tokens_no_spec in optimum model runner by @rebel-seinpark in #536
- fix(encoder): fix T5EncoderModel scoring mismatch after v0.18.1 bump by @rebel-jonghewk in #532
- feature(pdd): enable P/D disaggregation with NIXL host KV transfer by @rebel-ykchoi in #477
- other: logging argmax for debugging by @rebel-eunji in #540
- feature: add RBLNLMCacheConnectorV1 for LMCache KV cache offloading by @rebel-jinhwan in #523
- other: Revert "other: logging argmax for debugging" by @rebel-eunji in #542
- feature: add sampler benchmark code by @rebel-eunji in #539
- fix(PDD): fixed tensor type for cache copy by @rebel-jindol21 in #550
- fix(whisper): use vLLM block_tables as decoder batch position by @rebel-seinpark in #549
- fix(model): make expert_map work with torch.export in MoE model by @rebel-kblee in #537
- other: auto-update optimum-rbln to 0.10.3a2 by @rebel-develop in #551
- fix(test): fix torch_compile unit test failures by @rebel-jinhwan in #552
- fix(examples): import get_open_port from vllm.utils.network_utils by @rebel-sjlee in #560
- fix(test): load vllm plugin before test collection by @rebel-jaehwang in #561
- fix(worker): accept profile_prefix arg in profile() by @rebel-sjlee in #559
- fix(model): Qwen3-VL with text-only input by @rebel-kblee in #557
- fix(model): fix memory estimate logic by @rebel-kblee in #535
- fix: set MKL_NUM_THREADS in OptimumWorker by @rebel-seinpark in #562
- feature(ec_disagg): encoder cache disaggregation for VLMs (NIXL + ZMQ) by @rebel-yskim in #547
- other: auto-update optimum-rbln to 0.10.3a3 by @rebel-seinpark in #567
- fix(worker): use vllm TorchProfilerWrapper in RBLN worker by @rebel-sjlee in #565
- fix(test): assert on n_model_bytes in determine_available_memory tests by @rebel-jinhwan in #568
- core(kernel): RSD support by @rebel-jindol21 in #564
- core(kernel): patch VLLM_RBLN_COMPILE_MODEL for _allocate_kv_cache_tensors by @rebel-jindol21 in #573
- other: Revert "fix: set MKL_NUM_THREADS in OptimumWorker" by @rebel-seinpark in #574
- other: auto-update optimum-rbln to 0.10.3a4 by @rebel-develop in #576
- core(kernel): change the order of arguments of triton_op along with kernel by @rebel-jindol21 in #577
- other: auto-update optimum-rbln to 0.10.3 by @rebel-develop in #578
- release: v0.13.0 by @rebel-eunji in #579
## New Contributors
- @rebel-chanheo made their first contribution in #524
- @rebel-sjlee made their first contribution in #560
Full Changelog: v0.10.2...v0.10.3