v0.10.0
What's Changed
- fix: add guard_filter to rbln_sampler by @rebel-eunji in #241
- fix environment variables for rbln ccl by @rebel-ykchoi in #242
- other(ci): change the runner of PR CI by @rebel-eunji in #245
- Revert "fix: add guard_filter to rbln_sampler" by @rebel-eunji in #251
- update: Change log level in CI by @rebel-jonghewk in #250
- other(ci): log the length of generated tokens when the request is preempted by @rebel-eunji in #248
- fix(model): fix whisper model by @rebel-eunji in #247
- other(CI): clean up workflow by @rebel-seinpark in #233
- update(core): customize blockpool to increase prefix cache hit rate by @rebel-eunji in #205
- fix(ci): pkg auto update by @rebel-seinpark in #259
- fix(ci): pkg update by @rebel-seinpark in #260
- other: add enable_expert_parallel in basic example by @rebel-jiwoopark in #255
- Auto-update optimum-rbln to 0.9.5a0 by @rebel-shshin in #261
- update(worker): Set NUMA aware CPU affinity and OMP_NUM_THREADS by @rebel-yskim in #232
- other(ci): fix bug in sampler and update pytest for sampler by @rebel-eunji in #239
- Add dev0.12 event type by @rebel-jaebin in #267
- Auto-update optimum-rbln to 0.9.5a1 by @rebel-shshin in #268
- update: enable profiler for optimum-rbln based vllm by @rebel-jonghewk in #256
- fix: warm up for swa hybrid model by @rebel-jaehwang in #231
- other(ci): remove explicit secrets by @rebel-seinpark in #276
- Auto-update optimum-rbln to 0.9.5a2 by @rebel-shshin in #275
- Update runner name by @rebel-jaebin in #281
- other: repo migration (sw) by @rebel-seinpark in #271
- Auto-update optimum-rbln to 0.9.5a4 by @rebel-develop in #282
- other(test): add unit test for attention backend. by @rebel-jiwoopark in #273
- Revert "other(test): add unit test for attention backend." by @rebel-jiwoopark in #285
- Auto-update optimum-rbln to 0.9.5a5 by @rebel-develop in #286
- update(triton): replace triton with torch_triton by @rebel-jindol21 in #274
- fix moe data parallel for v1 engine by @rebel-ykchoi in #252
- Auto-update optimum-rbln to 0.9.5a6 by @rebel-develop in #288
- Revert "update(triton): replace triton with torch_triton" by @rebel-jindol21 in #289
- feat: enable topk_topp_sampler by @rebel-eunji in #284
- feat: prefill performance by request_id and exclude warmup requests from performance tracking by @rebel-eunji in #279
- feat(triton): support torch_triton by @rebel-jindol21 in #292
- fix(model): pad image tokens using an out-of-vocab index by @rebel-eunji in #291
- Auto-update optimum-rbln to 0.9.5a7 by @rebel-develop in #296
- fix: use rbln_sampler when both top_k and top_p are None by @rebel-eunji in #295
- update(triton): fixed along with triton kernels in rebel_compiler by @rebel-jindol21 in #299
- fix: apply argmax to greedy by @rebel-eunji in #300
- fix(kernel): fix kvcache by @rebel-jindol21 in #301
- Auto-update optimum-rbln to 0.9.5a8 by @rebel-develop in #302
- fix(triton): alignment enforced by @rebel-jindol21 in #303
- fix(attn): remove invalid attention param by @rebel-jiwoopark in #305
- sync main-dev by @rebel-eunji in #314
- Auto-update optimum-rbln to 0.10.0.post1 by @rebel-develop in #315
- Release v0.10.0 by @rebel-eunji in #318
New Contributors
- @rebel-yskim made their first contribution in #232
- @rebel-develop made their first contribution in #282
Full Changelog: v0.9.4...v0.10.0