feature: select kernel compute dtype via RBLN_COMP_DTYPE #654
Open
rebel-wonsubkim wants to merge 3 commits into
All triton custom-kernel warmups read RBLN_COMP_DTYPE (bfloat/dlfloat, default bfloat), map to bf16/dlf16, and pass compute_dtype to func.warmup. Pairs with rebel_compiler's fp32->compute-dtype lowering.
Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…les (#648)"
This reverts commit 76da31a. The seq_lens->int32 normalization is moved into the triton kernel wrappers (seq_idx.to(int32) in flash_*attention prefill/decode), so the backend-side cast is no longer needed.
Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9d7791c to 40bc7d8
This PR can be merged after the rebel compiler PR (https://github.com/rebellions-sw/rebel_compiler/pull/11294) is merged.
Each kernel file had two near-identical @triton_op wrappers (prefill/decode) differing only in the compiled triton kernel. Factor the shared body into a per-file _<family>_naive helper that takes the kernel as an argument; the wrappers now delegate. warmup stays file-local so each kernel file remains self-contained for rebel's standalone recompile.
Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
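The refactor described in this commit can be illustrated with a minimal pure-Python sketch. Everything named here (`_demo_naive`, `_prefill_kernel`, `_decode_kernel`, `demo_prefill`, `demo_decode`) is hypothetical; the real wrappers are `@triton_op` functions around compiled triton kernels, which this sketch replaces with plain stand-in callables.

```python
from typing import Callable, List

# Stand-ins for the two compiled kernels (hypothetical; not the PR's kernels).
def _prefill_kernel(x: List[float]) -> List[float]:
    return [v * 2.0 for v in x]

def _decode_kernel(x: List[float]) -> List[float]:
    return [v + 1.0 for v in x]

def _demo_naive(kernel: Callable[[List[float]], List[float]],
                x: List[float]) -> List[float]:
    """Shared body, written once per file; the kernel is passed in."""
    if not x:
        raise ValueError("empty input")
    # Any common preparation would live here instead of in both wrappers.
    return kernel(list(x))

# The two public wrappers now differ only in the kernel they hand over.
def demo_prefill(x: List[float]) -> List[float]:
    return _demo_naive(_prefill_kernel, x)

def demo_decode(x: List[float]) -> List[float]:
    return _demo_naive(_decode_kernel, x)
```

Keeping the helper inside each kernel file, rather than in a shared module, matches the stated goal that every file stays self-contained.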
All triton custom-kernel warmups read RBLN_COMP_DTYPE (bfloat/dlfloat, default bfloat), map to bf16/dlf16, and pass compute_dtype to func.warmup. Pairs with rebel_compiler's fp32->compute-dtype lowering.
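A minimal sketch of the lookup described above, assuming a hypothetical helper named `resolve_compute_dtype`. Only the variable name, its two accepted values, the default, and the bf16/dlf16 targets come from the description; rejecting unknown values with an error and matching names case-sensitively are assumptions, and the PR's actual code may differ.

```python
import os
from typing import Mapping, Optional

# Mapping stated in the PR description: bfloat -> bf16, dlfloat -> dlf16.
_COMP_DTYPE_MAP = {"bfloat": "bf16", "dlfloat": "dlf16"}

def resolve_compute_dtype(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read RBLN_COMP_DTYPE and return the dtype tag.

    `env` defaults to os.environ; passing a dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    name = env.get("RBLN_COMP_DTYPE", "bfloat")  # default per the description
    if name not in _COMP_DTYPE_MAP:
        # Assumption: unknown values are rejected rather than silently ignored.
        raise ValueError(
            f"RBLN_COMP_DTYPE must be one of {sorted(_COMP_DTYPE_MAP)}, got {name!r}"
        )
    return _COMP_DTYPE_MAP[name]
```

The returned tag would then be handed to the warmup call as its `compute_dtype` argument; the remaining arguments of `func.warmup` are not shown in this conversation and are left out here.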
🚀 Summary of Changes
RBLN_USE_CUSTOM_KERNEL=1 RBLN_COMP_DTYPE=dlfloat vllm serve Qwen/Qwen3-1.7B --port 8000 --max-model-len 16384 --block-size 1024 --enable-chunked-prefill --max-num-batched-tokens 512 --max-num-seqs 1
RBLN_USE_CUSTOM_KERNEL=1 vllm serve Qwen/Qwen3-1.7B --port 8000 --max-model-len 16384 --block-size 1024 --enable-chunked-prefill --max-num-batched-tokens 512 --max-num-seqs 1
📌 Related Issues / Tickets
✅ Type of Change
- release
- feature
- model
- core
- fix
- perf
- refactor
- docs
- other: please describe
🧪 How to Test
...
📸 Screenshots / Logs (if applicable)
📋 Checklist
💬 Notes