Releases: RBLN-SW/vllm-rbln
v0.10.2a1
What's Changed
- fix: hotfix for dependencies by @rebel-seinpark in #455
Full Changelog: v0.10.2a0...v0.10.2a1
v0.10.2a0
What's Changed
- other: sync dev with main by @rebel-eunji in #421
- fix(kernel): added sinks as parameters for attention and causal attention by @rebel-jindol21 in #422
- release: v0.10.1post1 by @rebel-eunji in #424
- other: Merge pull request #424 from RBLN-SW/dev by @rebel-eunji in #425
- feature(spec-dec): initial work on ngram and suffix by @huijjj in #408
- fix: add sink argument for swa prefill by @rebel-jaehunryu in #427
- fix: add sink argument for swa prefill by @rebel-eunji in #428
- other: sync with dev by @rebel-eunji in #429
- other: sync main with dev by @rebel-eunji in #432
- other: sync dev with main by @rebel-eunji in #433
- fix: update format of auto-created PR title by @rebel-seinpark in #426
- model: support minimax by @rebel-kblee in #435
- fix: guide what to do on oom by @rebel-jaehwang in #437
- fix: update batch attention logic to handle padding and size conditions by @rebel-jaehunryu in #436
- fix(moe): order of token mask by @rebel-kblee in #441
- fix(context): Quickfix for global context by @rebel-yskim in #444
- fix: allow disabling RBLN_FORCE_CCL_ASYNC for debugging by @rebel-eunji in #446
- fix: input ref for pr ci by @rebel-seinpark in #447
- other: auto-update optimum-rbln to 0.10.2a0 by @rebel-develop in #452
- feature(context): always use global_ctx by @rebel-yskim in #413
- fix(specdec): remove block size limitation by @junstar92 in #440
- other: replace vLLM GPU with CPU dependency by @rebel-eunji in #439
- fix: set num_threads in TP=1 & DP=1, and change env variable by @rebel-yskim in #453
New Contributors
- @rebel-kblee made their first contribution in #435
Full Changelog: v0.10.1...v0.10.2a0
v0.10.1.post2
What's Changed
- fix: add sink argument for swa prefill by @rebel-jaehunryu in #427
Full Changelog: v0.10.1post1...v0.10.1post2
v0.10.1.post1
What's Changed
- other: sync dev with main by @rebel-eunji in #421
- fix(kernel): added sinks as parameters for attention and causal attention by @rebel-jindol21 in #422
Full Changelog: v0.10.1...v0.10.1.post1
v0.10.1
What's Changed
- chore: pin the version of optimum-rbln by @rebel-eunji in #328
- fix: resolve timeout issue in OpenAI tests by @rebel-eunji in #329
- bump: initial vllm v0.13 bump up by @huijjj in #336
- core: bump vllm to 0.13.0 by @rebel-eunji in #298
- add: pytorch native rejection sampler by @huijjj in #341
- chore: update pre-commit on dev-0.13 by @rebel-eunji in #345
- chore: add .git-blame-ignore-revs by @rebel-eunji in #355
- fix: update imports to reflect package path/name changes by @junstar92 in #356
- fix(test): common unit test by @rebel-jiwoopark in #357
- update: enable kv cache meta tensor only when it is not a user custom kernel by @rebel-jiwoopark in #358
- fix: eagle/eagle3 for v0.13.0 by @junstar92 in #346
- chore: bump dev branch to dev-0.13 by @rebel-eunji in #361
- fix(model): skip computing encoder budget by @rebel-eunji in #368
- feat(kernel): support normal and causal normal kernel in mlir by @rebel-jindol21 in #360
- test(ci): add retry wrapper by @rebel-jaebin in #359
- fix(kernel): added missing kernel decl by @rebel-jindol21 in #374
- fix: adapt to 0.13 FusedMoEMethod signature by @rebel-jaehwang in #369
- fix: multimodal processing vllm 0.13.0 and sampler by @rebel-eunji in #378
- fix(platform): not support disabling chunked prefill by @rebel-jiwoopark in #366
- changed the order of arguments by @rebel-jindol21 in #372
- refactor: Remove useless workflow by @rebel-jaebin in #385
- feature(kernel): modify user kernel mode option by @rebel-jiwoopark in #383
- feature(attn): remove sinks attn param for sliding window attn. by @rebel-jiwoopark in #390
- other(ci): add PR title checker by @rebel-seinpark in #387
- fix(ci): fix pr title checker by @rebel-jaehwang in #392
- fix: handle 0-sized logits by @rebel-jaehwang in #381
- fix: change the pooling type of encoder model to CLS by @rebel-eunji in #379
- fix(model): fix Qwen3 embedding support in vLLM 0.13.0 by @rebel-eunji in #380
- fix: reshape logits to avoid stride-triggered recompilation by @rebel-eunji in #364
- fix: correct sampler output under CPU eager fallback of torch.compile by @rebel-eunji in #382
- feature: merge dev-0.12-rebase into dev by @rebel-eunji in #395
- feature(ccl): Enable autoport in rbln-ccl by @rebel-yskim in #397
- fix(core): Sampler with RBLN_CTX_STANDALONE by @rebel-jonghewk in #401
- core(platform): check prerequisite for parallelism by @rebel-jiwoopark in #393
- fix(attn): attn backend related to sinks by @rebel-jiwoopark in #403
- other: Revert "fix(core): Sampler with RBLN_CTX_STANDALONE" by @rebel-eunji in #405
- fix: set RBLN_CTX_STANDALONE to False by @rebel-eunji in #406
- refactor: fix available memory estimation by @rebel-ykchoi in #404
- fix: Re-enable standalone ctx by @rebel-jonghewk in #407
- other: Auto-update optimum-rbln to 0.10.1a2 by @rebel-develop in #412
- fix: reshape logits only in prefill phase by @rebel-eunji in #409
- fix: set num_blocks outside of model_runner by @rebel-eunji in #410
- fix: add sink argument for contrib.custom_ops by @rebel-jaehunryu in #416
- other: Auto-update optimum-rbln to 0.10.1 by @rebel-develop in #419
- release: v0.10.1 by @rebel-seinpark in #420
Full Changelog: v0.10.0...v0.10.1
v0.10.1a0
What's Changed
- chore: pin the version of optimum-rbln by @rebel-eunji in #328
- fix: resolve timeout issue in OpenAI tests by @rebel-eunji in #329
- bump: initial vllm v0.13 bump up by @huijjj in #336
- core: bump vllm to 0.13.0 by @rebel-eunji in #298
- add: pytorch native rejection sampler by @huijjj in #341
- chore: update pre-commit on dev-0.13 by @rebel-eunji in #345
- chore: add .git-blame-ignore-revs by @rebel-eunji in #355
- fix: update imports to reflect package path/name changes by @junstar92 in #356
- fix(test): common unit test by @rebel-jiwoopark in #357
- update: enable kv cache meta tensor only when it is not a user custom kernel by @rebel-jiwoopark in #358
- fix: eagle/eagle3 for v0.13.0 by @junstar92 in #346
- chore: bump dev branch to dev-0.13 by @rebel-eunji in #361
- fix(model): skip computing encoder budget by @rebel-eunji in #368
Full Changelog: v0.10.0...v0.10.1a0
v0.10.0
What's Changed
- fix: add guard_filter to rbln_sampler by @rebel-eunji in #241
- fix environment variables for rbln ccl by @rebel-ykchoi in #242
- other(ci): change the runner of PR CI by @rebel-eunji in #245
- Revert "fix: add guard_filter to rbln_sampler" by @rebel-eunji in #251
- update: Change log level in CI by @rebel-jonghewk in #250
- other(ci): log the length of generated tokens when the request is preempted by @rebel-eunji in #248
- fix(model): fix whisper model by @rebel-eunji in #247
- other(CI): clean up workflow by @rebel-seinpark in #233
- update(core): customize blockpool to increase prefix cache hit rate by @rebel-eunji in #205
- fix(ci): pkg auto update by @rebel-seinpark in #259
- fix(ci): pkg update by @rebel-seinpark in #260
- other: add enable_expert_parallel in basic example by @rebel-jiwoopark in #255
- Auto-update optimum-rbln to 0.9.5a0 by @rebel-shshin in #261
- update(worker): Set NUMA aware CPU affinity and OMP_NUM_THREADS by @rebel-yskim in #232
- other(ci): fix bug in sampler and update pytest for sampler by @rebel-eunji in #239
- Add dev0.12 event type by @rebel-jaebin in #267
- Auto-update optimum-rbln to 0.9.5a1 by @rebel-shshin in #268
- update: enable profiler for optimum-rbln based vllm by @rebel-jonghewk in #256
- fix: warm up for swa hybrid model by @rebel-jaehwang in #231
- other(ci): remove explicit secrets by @rebel-seinpark in #276
- Auto-update optimum-rbln to 0.9.5a2 by @rebel-shshin in #275
- Update runner name by @rebel-jaebin in #281
- other: repo migration (sw) by @rebel-seinpark in #271
- Auto-update optimum-rbln to 0.9.5a4 by @rebel-develop in #282
- other(test): add unit test for attention backend. by @rebel-jiwoopark in #273
- Revert "other(test): add unit test for attention backend." by @rebel-jiwoopark in #285
- Auto-update optimum-rbln to 0.9.5a5 by @rebel-develop in #286
- update(triton): replace triton with torch_triton by @rebel-jindol21 in #274
- fix moe data parallel for v1 engine by @rebel-ykchoi in #252
- Auto-update optimum-rbln to 0.9.5a6 by @rebel-develop in #288
- Revert "update(triton): replace triton with torch_triton" by @rebel-jindol21 in #289
- feat: enable topk_topp_sampler by @rebel-eunji in #284
- feat: prefill performance by request_id and exclude warmup requests from performance tracking by @rebel-eunji in #279
- feat(triton): support torch_triton by @rebel-jindol21 in #292
- fix(model): pad image tokens using an out-of-vocab index by @rebel-eunji in #291
- Auto-update optimum-rbln to 0.9.5a7 by @rebel-develop in #296
- fix: use rbln_sampler when both top_k and top_p are None by @rebel-eunji in #295
- update(triton): fixed along with triton kernels in rebel_compiler by @rebel-jindol21 in #299
- fix: apply argmax to greedy by @rebel-eunji in #300
- fix(kernel): fix kvcache by @rebel-jindol21 in #301
- Auto-update optimum-rbln to 0.9.5a8 by @rebel-develop in #302
- fix(triton): alignment enforced by @rebel-jindol21 in #303
- fix(attn): remove invalid attention param by @rebel-jiwoopark in #305
- sync main-dev by @rebel-eunji in #314
- Auto-update optimum-rbln to 0.10.0.post1 by @rebel-develop in #315
- Release v0.10.0 by @rebel-eunji in #318
New Contributors
- @rebel-yskim made their first contribution in #232
- @rebel-develop made their first contribution in #282
Full Changelog: v0.9.4...v0.10.0
v0.9.4
What's Changed
- fix(core): async engine bug by @rebel-jiwoopark in #140
- feat(core): add structured output support and add benchmark scripts by @pei0033 in #121
- fix(bug): Fix the bug of recompile_limit when compiling sampler by @rebel-eunji in #153
- Batched rope by @rebel-jaehwang in #150
- feat(model): Support for Structured Output by @rebel-eunji in #151
- fix(core): customize the KV Cache Manager for prefix caching by @rebel-eunji in #146
- fix(core): Fix the logic of dummy block selection for padding in case of prefix caching by @rebel-eunji in #154
- fix(core): exclude cache-hit blocks when checking free block availability by @rebel-eunji in #158
- trivial: remove hardcoded strict compilation by @huijjj in #161
- fix(core): skip touch function of cached blocks by @rebel-eunji in #164
- ci: add GitHub Actions workflow for ARC CI testing by @rebel-jaebin in #120
- refactor: clean up input padding logic by @rebel-jaehwang in #168
- dev revert 251124 by @rebel-sunwook in #170
- feat: mixed precision quantization by @rebel-jaehwang in #159
- fix(core): fix debug logging in case of abortion by @rebel-eunji in #173
- ci: Add internal ci trigger workflow by @rebel-jaebin in #172
- ci: add basic model test. by @rebel-jiwoopark in #165
- chore(deps): bump optimum-rbln package from 0.9.2.a7 to 0.9.3(stable) by @rebel-eunji in #149
- update MoE PoC features & bfloat16 model load by @rebel-wonsubkim in #145
- add(CI): model coverage on single device by @huijjj in #178
- feat: sliding window attention by @rebel-jaehwang in #167
- feat(model): prevent excessive recompilation of sampler by @rebel-eunji in #171
- fix(core): fix handling logprobs when use rbln_sampler by @rebel-eunji in #184
- [feat] LoRA support on V1 (with limitations) by @junstar92 in #169
- feat(model): fix type casting and enable bfloat16 optimum-rbln compiled models by @rebel-eunji in #183
- fix(core): v1 scheduler by @huijjj in #181
- update(ci): add openai server test in PR CI by @rebel-eunji in #182
- fix(core): fix sampler buckets to include max_num_seqs by @rebel-eunji in #186
- chore: bump version of optimum-rbln to 0.9.4a0 by @rebel-eunji in #189
- other: cache rbln_sampler by @rebel-eunji in #187
- fix(model): fix gemma3 to use left-padding by @rebel-eunji in #190
- other: use standard cache dir by @rebel-jaehwang in #188
- other(v0): log model loading & compilation time by @rebel-jaebin in #193
- other(ci): add common unit tests. by @rebel-jiwoopark in #195
- Moe PoC v1 migration by @rebel-wonsubkim in #180
- fix(other): disable autograd in logits by wrapping with torch.inference_mode by @rebel-eunji in #194
- Restore revert 251124 by @rebel-jaebin in #174
- fix: cleanup installation by @rebel-sunwook in #198
- fix: python 3.9 compat by @rebel-jaehwang in #200
- feat(other): unify the contexts in sampler by @rebel-eunji in #197
- feat(model): add tokens mask for MoE custom kernel by @rebel-ykchoi in #192
- feat(core): add pooling model initial support for V1 engine by @pei0033 in #152
- other(ci): add pytest-cov by @rebel-jiwoopark in #203
- update vllm moe kernel by @rebel-jaehunryu in #202
- test: add comprehensive LoRA tests by @junstar92 in #199
- fix(model): fix custom_moe_glu custom operation signature by @rebel-jaehunryu in #204
- feat(model): add paligemma, paligemma2 by @rebel-eunji in #162
- other: log format by @rebel-jaehwang in #207
- feat(core): allow dynamic block_size for prefix caching by @rebel-eunji in #185
- Revive requirements txt by @rebel-sunwook in #210
- feat(model): support hybrid attention text-only models in optimum (gpt-oss, gemma2) by @rebel-eunji in #208
- fix(script): DP placeholder condition by @rebel-myeongbo in #201
- feat(ci): add paligemma to optimum ci by @rebel-seinpark in #211
- other(ci): add core pytest by @rebel-jiwoopark in #212
- fix: missing sinks attention param by @rebel-jiwoopark in #213
- chore: bump version of optimum-rbln up to 0.9.4rc0 by @rebel-seinpark in #218
- fix: get dtype from optimum model by @rebel-seinpark in #217
- fix(env): change default value of VLLM_RBLN_MOE_USE_OPT_KERNEL to True by @rebel-jaehunryu in #220
- fix: modify physical device ids for v1 dp rsd by @rebel-wonsubkim in #214
- feat(ci): auto update for optimum-rbln by @rebel-seinpark in #222
- fix(core): Apply keep_tensor_guards_unsafe for torch.compile by @rebel-jonghewk in #229
- fix: block caching logic to enable prefix caching by @huijjj in #228
- fix(sampler): Disable sampler cache by @rebel-jonghewk in #234
- fix: Resolve dev main merge conflict by @rebel-jonghewk in #236
- Release v0.9.4 by @rebel-shshin in #235
- fix(packaging): revive setup.py by @rebel-sunwook in #237
- fix(CI): revive setup.py by @rebel-shshin in #238
New Contributors
- @rebel-jaebin made their first contribution in #120
- @rebel-ykchoi made their first contribution in #192
- @rebel-jaehunryu made their first contribution in #202
- @rebel-myeongbo made their first contribution in #201
Full Changelog: v0.9.3...v0.9.4
v0.9.3.post2
Full Changelog: v0.9.3.post1...v0.9.3.post2
v0.9.3.post1
Full Changelog: v0.9.3...v0.9.3.post1