Releases: rebellions-sw/vllm-rbln
v0.9.2.post1
What's Changed
- fix(ci): check collaborator logic by @rebel-seinpark in #124
- feat: Support for Sort-free sampling by @rebel-eunji in #107
- fix(CI): fix output sync code in pr dispatch workflow by @rebel-jswoo in #125
- update: Bump version to v0.10.2 (optimum) by @rebel-eunji in #122
- fix(model): Fix pooling models in v0.10.2 by @rebel-eunji in #129
- other: Update the python version to 3.12 by @rebel-eunji in #131
- add(core): Support for Prefix Caching by @rebel-eunji in #85
- fix(model): pre-release bug fixes by @rebel-eunji in #133
- fix(model): fix whisper, qwen3-embedding/reranker, qwen2.5-vl model by @rebel-eunji in #134
- fix: clarify the semantics of attention custom ops by @rebel-jaehwang in #117
- fix(model): fix multi-modal processing and classification model by @rebel-eunji in #135
- update: Modify code to be compatible with v0.10.2 by @rebel-jiwoopark in #119
- fix(core): resolve prefix caching issue during preemption by @rebel-eunji in #136
- fix(other): Disable prefix caching for stability by @rebel-eunji in #137
- fix(other): turn off the prefix caching completely by @rebel-eunji in #138
- fix(other): fix pytest bugs by @rebel-eunji in #139
- fix(version): Release v0.9.2.post1 by @rebel-shshin in #141
- fix(version): update release version to v0.9.2.post1 by @rebel-shshin in #142
New Contributors
- @rebel-shshin made their first contribution in #141
Full Changelog: v0.9.2...v0.9.2.post1
v0.9.2
What's Changed
- feat(core): Enable causal attention kernel by @rebel-jiwoopark in #95
- fix: TP in V1 engine by @rebel-jiwoopark in #101
- other: Update the version of transformers to 4.53.1 by @rebel-eunji in #102
- fix(CI): add outer collaborator checker by @rebel-jswoo in #100
- other: add team members by @rebel-seinpark in #103
- feat(core): support for Multi-LoRA by @rebel-eunji in #48
- refactor: update variable names and logging for better code clarity by @pei0033 in #105
- other: Upgrade the version of optimum-rbln by @rebel-eunji in #109
- other: Script for logprobs validation by @rebel-eunji in #108
- fix(CI): Add cleanup job in dispatch pr ci by @rebel-jswoo in #104
- fix: checking is_collaborator logic in CI by @rebel-seinpark in #110
- Run patches in RblnPlatform.pre_register_and_update by @rebel-jaehwang in #43
- feat: compile lm_head using rbln backend (#96) by @rebel-jiwoopark in #99
- fix(core): timeout error of large models. by @rebel-jiwoopark in #111
- bug-fix: Skip using compile_context when RBLN_COMPILE_MODEL=0 (V0) by @rebel-eunji in #116
- add: initial works for enabling warmup in v1 engine by @huijjj in #84
- other: change PR trigger to dev by @rebel-jonghewk in #123
Full Changelog: v0.9.1...v0.9.2
v0.9.2a1
What's Changed
- refactor: update variable names and logging for better code clarity by @pei0033 in #105
- other: Upgrade the version of optimum-rbln by @rebel-eunji in #109
- other: Script for logprobs validation by @rebel-eunji in #108
- fix(CI): Add cleanup job in dispatch pr ci by @rebel-jswoo in #104
- fix: checking is_collaborator logic in CI by @rebel-seinpark in #110
- Run patches in RblnPlatform.pre_register_and_update by @rebel-jaehwang in #43
- feat: compile lm_head using rbln backend (#96) by @rebel-jiwoopark in #99
- fix(core): timeout error of large models. by @rebel-jiwoopark in #111
- bug-fix: Skip using compile_context when RBLN_COMPILE_MODEL=0 (V0) by @rebel-eunji in #116
- add: initial works for enabling warmup in v1 engine by @huijjj in #84
- other: change PR trigger to dev by @rebel-jonghewk in #123
Full Changelog: v0.9.2a0...v0.9.2a1
v0.9.2a0
What's Changed
- feat(core): Enable causal attention kernel by @rebel-jiwoopark in #95
- fix: TP in V1 engine by @rebel-jiwoopark in #101
- other: Update the version of transformers to 4.53.1 by @rebel-eunji in #102
- fix(CI): add outer collaborator checker by @rebel-jswoo in #100
- other: add team members by @rebel-seinpark in #103
- feat(core): support for Multi-LoRA by @rebel-eunji in #48
Full Changelog: v0.9.1...v0.9.2a0
v0.9.1
What's Changed
- update(dep): Update requirement version for optimum-rbln by @rebel-jonghewk in #62
- update(attn): support head size of 80 for attention. by @rebel-jiwoopark in #69
- Other(ci): add dispatch workflow - trigger pr ci by @rebel-jswoo in #61
- fix(model): Fix the block table in case of Sliding Window Attention by @rebel-eunji in #70
- feat(option): Introduce environment variables for vLLM models and binary caching. by @rebel-jiwoopark in #74
- fix(core): block allocation for torch compile path by @huijjj in #77
- fix: raise MemoryError when available_dram becomes negative by @junstar92 in #80
- other(ci): bugfix in trigger dispatch workflow by @rebel-jswoo in #72
- fix: get_maximum_num_blocks usage in USE_VLLM_MODEL=1 by @huijjj in #82
- fix: make RoPE more compatible with RBLN by @rebel-jaehwang in #83
- fix(core): apply changes to enable vLLM MoE models by @rebel-wonsubkim in #81
- feat: support eager mode by @rebel-jiwoopark in #66
- feat(model): support Qwen2-VL model by @rebel-eunji in #91
- Update Python version by @rebel-hjkim in #16
- other: Update optimum-rbln version in requirements.txt by @rebel-jonghewk in #98
New Contributors
- @rebel-jswoo made their first contribution in #61
- @junstar92 made their first contribution in #80
- @rebel-wonsubkim made their first contribution in #81
- @rebel-hjkim made their first contribution in #16
Full Changelog: v0.8.3...v0.9.1
v0.8.3
What's Changed
- fix(core): fix kv cache block table by @rebel-eunji in #42
- Support for flash attention by @rebel-jindol21 in #7
- update(docs): update CONTRIBUTING and PR Template by @rebel-jiwoopark in #40
- fix(core): sync num_gpu_blocks w/ estimated blocks by @rebel-jonghewk in #49
- fix(core): bug fix for block table logic by @rebel-jonghewk in #53
- fix: use config classes when computing max blocks by @huijjj in #56
- fix(ci): fix latest optimum version in PR CI by @rebel-seinpark in #54
- feat: support unit test using pytest by @rebel-seinpark in #52
- feat(model): update and refactor the sliding window attention by @rebel-eunji in #44
- fix(worker): clamp num available blocks by num required blocks by @rebel-jaehwang in #59
- fix(sampler): Fix sampler graph by @rebel-jongho in #58
- core: migrating to V1 Engine by @rebel-jiwoopark in #51
New Contributors
- @rebel-jindol21 made their first contribution in #7
- @huijjj made their first contribution in #56
- @rebel-jaehwang made their first contribution in #59
Full Changelog: v0.8.2...v0.8.3
v0.8.2
What's Changed
- [core] Update the sampler for model runner. by @rebel-jiwoopark in #2
- fix: cross encoder output by @rebel-sunwook in #3
- Support for Whisper model in vLLM by @rebel-eunji in #6
- fix: sync with optimum-rbln fix by @rebel-seinpark in #15
- feature: update attention layer for V0 engine by @rebel-jiwoopark in #14
- Add Qwen3 models by @rebel-eunji in #11
- Migrate to V1 Engine w/ Optimum-based by @rebel-eunji in #10
- Turn off Prefix caching option by @rebel-eunji in #22
- fix: use original vllm entrypoint for vllm cli by @rebel-jiwoopark in #20
- Fix shape of multi modal data in Gemma3 by @rebel-eunji in #23
- fix: get num_gpu_blocks logic in V1 by @rebel-seinpark in #29
- Fixes: gemma3's block_table scheduling bug due to padding by @rebel-thkim in #28
- feature: extend envs for RBLN environments. by @rebel-jiwoopark in #25
- Add logger and Refactor the V1 codes by @rebel-eunji in #33
- Update README with new logo by @rebel-eunji in #32
- Support for LlavaForConditionalGeneration models by @rebel-eunji in #31
- other: Add warmup runs when running with RBLNSampler() by @rebel-jonghewk in #37
- other: update optimum requirements for v0.8.2 by @rebel-jonghewk in #35
New Contributors
- @rebel-jiwoopark made their first contribution in #2
- @rebel-sunwook made their first contribution in #3
- @rebel-eunji made their first contribution in #6
- @rebel-seinpark made their first contribution in #15
- @rebel-thkim made their first contribution in #28
- @rebel-jonghewk made their first contribution in #37
Full Changelog: v0.8.1...v0.8.2
v0.8.1.post1
What's Changed
- [core] Update the sampler for model runner. by @rebel-jiwoopark in #2
- fix: cross encoder output by @rebel-sunwook in #3
New Contributors
- @rebel-jiwoopark made their first contribution in #2
- @rebel-sunwook made their first contribution in #3
Full Changelog: v0.8.1...v0.8.1.post1
v0.8.1
Full Changelog: https://github.com/rebellions-sw/vllm-rbln/commits/v0.8.1