Releases: RBLN-SW/vllm-rbln
v0.10.2a1
What's Changed
- fix: hotfix for dependencies by @rebel-seinpark in #455
Full Changelog: v0.10.2a0...v0.10.2a1
v0.10.2a0
What's Changed
- other: sync dev with main by @rebel-eunji in #421
- fix(kernel): added sinks as parameters for attention and causal attention by @rebel-jindol21 in #422
- release: v0.10.1post1 by @rebel-eunji in #424
- other: Merge pull request #424 from RBLN-SW/dev by @rebel-eunji in #425
- feature(spec-dec): initial work on ngram and suffix by @huijjj in #408
- fix: add sink argument for swa prefill by @rebel-jaehunryu in #427
- fix: add sink argument for swa prefill by @rebel-eunji in #428
- other: sync with dev by @rebel-eunji in #429
- other: sync main with dev by @rebel-eunji in #432
- other: sync dev with main by @rebel-eunji in #433
- fix: update format of auto-created PR title by @rebel-seinpark in #426
- model: support minimax by @rebel-kblee in #435
- fix: guide what to do on oom by @rebel-jaehwang in #437
- fix: update batch attention logic to handle padding and size conditions by @rebel-jaehunryu in #436
- fix(moe): order of token mask by @rebel-kblee in #441
- fix(context): Quickfix for global context by @rebel-yskim in #444
- fix: allow disabling RBLN_FORCE_CCL_ASYNC for debugging by @rebel-eunji in #446
- fix: input ref for pr ci by @rebel-seinpark in #447
- other: auto-update optimum-rbln to 0.10.2a0 by @rebel-develop in #452
- feature(context): always use global_ctx by @rebel-yskim in #413
- fix(specdec): remove block size limitation by @junstar92 in #440
- other: replace vLLM GPU with CPU dependency by @rebel-eunji in #439
- fix: set num_threads in TP=1 & DP=1, and change env variable by @rebel-yskim in #453
New Contributors
- @rebel-kblee made their first contribution in #435
Full Changelog: v0.10.1...v0.10.2a0
v0.10.1.post2
What's Changed
- fix: add sink argument for swa prefill by @rebel-jaehunryu in #427
Full Changelog: v0.10.1post1...v0.10.1post2
v0.10.1.post1
What's Changed
- other: sync dev with main by @rebel-eunji in #421
- fix(kernel): added sinks as parameters for attention and causal attention by @rebel-jindol21 in #422
Full Changelog: v0.10.1...v0.10.1.post1
v0.10.1
What's Changed
- chore: pin the version of optimum-rbln by @rebel-eunji in #328
- fix: resolve timeout issue in OpenAI tests by @rebel-eunji in #329
- bump: initial vllm v0.13 bump up by @huijjj in #336
- core: bump vllm to 0.13.0 by @rebel-eunji in #298
- add: pytorch native rejection sampler by @huijjj in #341
- chore: update pre-commit on dev-0.13 by @rebel-eunji in #345
- chore: add .git-blame-ignore-revs by @rebel-eunji in #355
- fix: update imports to reflect package path/name changes by @junstar92 in #356
- fix(test): common unit test by @rebel-jiwoopark in #357
- update: enable kv cache meta tensor only when it is not a user custom kernel by @rebel-jiwoopark in #358
- fix: eagle/eagle3 for v0.13.0 by @junstar92 in #346
- chore: bump dev branch to dev-0.13 by @rebel-eunji in #361
- fix(model): skip computing encoder budget by @rebel-eunji in #368
- feat(kernel): support normal and causal normal kernel in mlir by @rebel-jindol21 in #360
- test(ci): add retry wrapper by @rebel-jaebin in #359
- fix(kernel): added missing kernel decl by @rebel-jindol21 in #374
- fix: adapt to 0.13 FusedMoEMethod signature by @rebel-jaehwang in #369
- fix: multimodal processing vllm 0.13.0 and sampler by @rebel-eunji in #378
- fix(platform): not support disabling chunked prefill by @rebel-jiwoopark in #366
- changed the order of arguments by @rebel-jindol21 in #372
- refactor: Remove useless workflow by @rebel-jaebin in #385
- feature(kernel): modify user kernel mode option by @rebel-jiwoopark in #383
- feature(attn): remove sinks attn param for sliding window attn. by @rebel-jiwoopark in #390
- other(ci): add PR title checker by @rebel-seinpark in #387
- fix(ci): fix pr title checker by @rebel-jaehwang in #392
- fix: handle 0-sized logits by @rebel-jaehwang in #381
- fix: change the pooling type of encoder model to CLS by @rebel-eunji in #379
- fix(model): fix Qwen3 embedding support in vLLM 0.13.0 by @rebel-eunji in #380
- fix: reshape logits to avoid stride-triggered recompilation by @rebel-eunji in #364
- fix: correct sampler output under CPU eager fallback of torch.compile by @rebel-eunji in #382
- feature: merge dev-0.12-rebase into dev by @rebel-eunji in #395
- feature(ccl): Enable autoport in rbln-ccl by @rebel-yskim in #397
- fix(core): Sampler with RBLN_CTX_STANDALONE by @rebel-jonghewk in #401
- core(platform): check prerequisite for parallelism by @rebel-jiwoopark in #393
- fix(attn): attn backend related to sinks by @rebel-jiwoopark in #403
- other: Revert "fix(core): Sampler with RBLN_CTX_STANDALONE" by @rebel-eunji in #405
- fix: set RBLN_CTX_STANDALONE to False by @rebel-eunji in #406
- refactor: fix available memory estimation by @rebel-ykchoi in #404
- fix: Re-enable standalone ctx by @rebel-jonghewk in #407
- other: Auto-update optimum-rbln to 0.10.1a2 by @rebel-develop in #412
- fix: reshape logits only in prefill phase by @rebel-eunji in #409
- fix: set num_blocks outside of model_runner by @rebel-eunji in #410
- fix: add sink argument for contrib.custom_ops by @rebel-jaehunryu in #416
- other: Auto-update optimum-rbln to 0.10.1 by @rebel-develop in #419
- release: v0.10.1 by @rebel-seinpark in #420
Full Changelog: v0.10.0...v0.10.1
v0.10.1a0
What's Changed
- chore: pin the version of optimum-rbln by @rebel-eunji in #328
- fix: resolve timeout issue in OpenAI tests by @rebel-eunji in #329
- bump: initial vllm v0.13 bump up by @huijjj in #336
- core: bump vllm to 0.13.0 by @rebel-eunji in #298
- add: pytorch native rejection sampler by @huijjj in #341
- chore: update pre-commit on dev-0.13 by @rebel-eunji in #345
- chore: add .git-blame-ignore-revs by @rebel-eunji in #355
- fix: update imports to reflect package path/name changes by @junstar92 in #356
- fix(test): common unit test by @rebel-jiwoopark in #357
- update: enable kv cache meta tensor only when it is not a user custom kernel by @rebel-jiwoopark in #358
- fix: eagle/eagle3 for v0.13.0 by @junstar92 in #346
- chore: bump dev branch to dev-0.13 by @rebel-eunji in #361
- fix(model): skip computing encoder budget by @rebel-eunji in #368
Full Changelog: v0.10.0...v0.10.1a0
v0.10.0
What's Changed
- fix: add guard_filter to rbln_sampler by @rebel-eunji in #241
- fix environment variables for rbln ccl by @rebel-ykchoi in #242
- other(ci): change the runner of PR CI by @rebel-eunji in #245
- Revert "fix: add guard_filter to rbln_sampler" by @rebel-eunji in #251
- update: Change log level in CI by @rebel-jonghewk in #250
- other(ci): log the length of generated tokens when the request is preempted by @rebel-eunji in #248
- fix(model): fix whisper model by @rebel-eunji in #247
- other(CI): clean up workflow by @rebel-seinpark in #233
- update(core): customize blockpool to increase prefix cache hit rate by @rebel-eunji in #205
- fix(ci): pkg auto update by @rebel-seinpark in #259
- fix(ci): pkg update by @rebel-seinpark in #260
- other: add enable_expert_parallel in basic example by @rebel-jiwoopark in #255
- Auto-update optimum-rbln to 0.9.5a0 by @rebel-shshin in #261
- update(worker): Set NUMA aware CPU affinity and OMP_NUM_THREADS by @rebel-yskim in #232
- other(ci): fix bug in sampler and update pytest for sampler by @rebel-eunji in #239
- Add dev0.12 event type by @rebel-jaebin in #267
- Auto-update optimum-rbln to 0.9.5a1 by @rebel-shshin in #268
- update: enable profiler for optimum-rbln based vllm by @rebel-jonghewk in #256
- fix: warm up for swa hybrid model by @rebel-jaehwang in #231
- other(ci): remove explicit secrets by @rebel-seinpark in #276
- Auto-update optimum-rbln to 0.9.5a2 by @rebel-shshin in #275
- Update runner name by @rebel-jaebin in #281
- other: repo migration (sw) by @rebel-seinpark in #271
- Auto-update optimum-rbln to 0.9.5a4 by @rebel-develop in #282
- other(test): add unit test for attention backend. by @rebel-jiwoopark in #273
- Revert "other(test): add unit test for attention backend." by @rebel-jiwoopark in #285
- Auto-update optimum-rbln to 0.9.5a5 by @rebel-develop in #286
- update(triton): replace triton with torch_triton by @rebel-jindol21 in #274
- fix moe data parallel for v1 engine by @rebel-ykchoi in #252
- Auto-update optimum-rbln to 0.9.5a6 by @rebel-develop in #288
- Revert "update(triton): replace triton with torch_triton" by @rebel-jindol21 in #289
- feat: enable topk_topp_sampler by @rebel-eunji in #284
- feat: prefill performance by request_id and exclude warmup requests from performance tracking by @rebel-eunji in #279
- feat(triton): support torch_triton by @rebel-jindol21 in #292
- fix(model): pad image tokens using an out-of-vocab index by @rebel-eunji in #291
- Auto-update optimum-rbln to 0.9.5a7 by @rebel-develop in #296
- fix: use rbln_sampler when both top_k and top_p are None by @rebel-eunji in #295
- update(triton): fixed along with triton kernels in rebel_compiler by @rebel-jindol21 in #299
- fix: apply argmax to greedy by @rebel-eunji in #300
- fix(kernel): fix kvcache by @rebel-jindol21 in #301
- Auto-update optimum-rbln to 0.9.5a8 by @rebel-develop in #302
- fix(triton): alignment enforced by @rebel-jindol21 in #303
- fix(attn): remove invalid attention param by @rebel-jiwoopark in #305
- sync main-dev by @rebel-eunji in #314
- Auto-update optimum-rbln to 0.10.0.post1 by @rebel-develop in #315
- Release v0.10.0 by @rebel-eunji in #318
New Contributors
- @rebel-yskim made their first contribution in #232
- @rebel-develop made their first contribution in #282
Full Changelog: v0.9.4...v0.10.0
v0.9.4
What's Changed
- fix(core): async engine bug by @rebel-jiwoopark in #140
- feat(core): add structured output support and add benchmark scripts by @pei0033 in #121
- fix(bug): Fix the bug of recompile_limit when compiling sampler by @rebel-eunji in #153
- Batched rope by @rebel-jaehwang in #150
- feat(model): Support for Structured Output by @rebel-eunji in #151
- fix(core): customize the KV Cache Manager for prefix caching by @rebel-eunji in #146
- fix(core): Fix the logic of dummy block selection for padding in case of prefix caching by @rebel-eunji in #154
- fix(core): exclude cache-hit blocks when checking free block availability by @rebel-eunji in #158
- trivial: remove hardcoded strict compilation by @huijjj in #161
- fix(core): skip touch function of cached blocks by @rebel-eunji in #164
- ci: add GitHub Actions workflow for ARC CI testing by @rebel-jaebin in #120
- refactor: clean up input padding logic by @rebel-jaehwang in #168
- dev revert 251124 by @rebel-sunwook in #170
- feat: mixed precision quantization by @rebel-jaehwang in #159
- fix(core): fix debug logging in case of abortion by @rebel-eunji in #173
- ci: Add internal ci trigger workflow by @rebel-jaebin in #172
- ci: add basic model test. by @rebel-jiwoopark in #165
- chore(deps): bump optimum-rbln package from 0.9.2.a7 to 0.9.3(stable) by @rebel-eunji in #149
- update MoE PoC features & bfloat16 model load by @rebel-wonsubkim in #145
- add(CI): model coverage on single device by @huijjj in #178
- feat: sliding window attention by @rebel-jaehwang in #167
- feat(model): prevent excessive recompilation of sampler by @rebel-eunji in #171
- fix(core): fix handling logprobs when use rbln_sampler by @rebel-eunji in #184
- [feat] LoRA support on V1 (with limitations) by @junstar92 in #169
- feat(model): fix type casting and enable bfloat16 optimum-rbln compiled models by @rebel-eunji in #183
- fix(core): v1 scheduler by @huijjj in #181
- update(ci): add openai server test in PR CI by @rebel-eunji in #182
- fix(core): fix sampler buckets to include max_num_seqs by @rebel-eunji in #186
- chore: bump version of optimum-rbln to 0.9.4a0 by @rebel-eunji in #189
- other: cache rbln_sampler by @rebel-eunji in #187
- fix(model): fix gemma3 to use left-padding by @rebel-eunji in #190
- other: use standard cache dir by @rebel-jaehwang in #188
- other(v0): log model loading & compilation time by @rebel-jaebin in #193
- other(ci): add common unit tests. by @rebel-jiwoopark in #195
- Moe PoC v1 migration by @rebel-wonsubkim in #180
- fix(other): disable autograd in logits by wrapping with torch.inference_mode by @rebel-eunji in #194
- Restore revert 251124 by @rebel-jaebin in #174
- fix: cleanup installation by @rebel-sunwook in #198
- fix: python 3.9 compat by @rebel-jaehwang in #200
- feat(other): unify the contexts in sampler by @rebel-eunji in #197
- feat(model): add tokens mask for MoE custom kernel by @rebel-ykchoi in #192
- feat(core): add pooling model initial support for V1 engine by @pei0033 in #152
- other(ci): add pytest-cov by @rebel-jiwoopark in #203
- update vllm moe kernel by @rebel-jaehunryu in #202
- test: add comprehensive LoRA tests by @junstar92 in #199
- fix(model): fix custom_moe_glu custom operation signature by @rebel-jaehunryu in #204
- feat(model): add paligemma, paligemma2 by @rebel-eunji in #162
- other: log format by @rebel-jaehwang in #207
- feat(core): allow dynamic block_size for prefix caching by @rebel-eunji in #185
- Revive requirements txt by @rebel-sunwook in #210
- feat(model): support hybrid attention text-only models in optimum (gpt-oss, gemma2) by @rebel-eunji in #208
- fix(script): DP placeholder condition by @rebel-myeongbo in #201
- feat(ci): add paligemma to optimum ci by @rebel-seinpark in #211
- other(ci): add core pytest by @rebel-jiwoopark in #212
- fix: missing sinks attention param by @rebel-jiwoopark in #213
- chore: bump version of optimum-rbln up to 0.9.4rc0 by @rebel-seinpark in #218
- fix: get dtype from optimum model by @rebel-seinpark in #217
- fix(env): change default value of VLLM_RBLN_MOE_USE_OPT_KERNEL to True by @rebel-jaehunryu in #220
- fix: modify physical device ids for v1 dp rsd by @rebel-wonsubkim in #214
- feat(ci): auto update for optimum-rbln by @rebel-seinpark in #222
- fix(core): Apply keep_tensor_guards_unsafe for torch.compile by @rebel-jonghewk in #229
- fix: block caching logic to enable prefix caching by @huijjj in #228
- fix(sampler): Disable sampler cache by @rebel-jonghewk in #234
- fix: Resolve dev main merge conflict by @rebel-jonghewk in #236
- Release v0.9.4 by @rebel-shshin in #235
- fix(packaging): revive setup.py by @rebel-sunwook in #237
- fix(CI): revive setup.py by @rebel-shshin in #238
New Contributors
- @rebel-jaebin made their first contribution in #120
- @rebel-ykchoi made their first contribution in #192
- @rebel-jaehunryu made their first contribution in #202
- @rebel-myeongbo made their first contribution in #201
Full Changelog: v0.9.3...v0.9.4
v0.9.3.post2
Full Changelog: v0.9.3.post1...v0.9.3.post2
v0.9.3.post1
Full Changelog: v0.9.3...v0.9.3.post1