feature: select kernel compute dtype via RBLN_COMP_DTYPE by rebel-wonsubkim · Pull Request #654 · RBLN-SW/vllm-rbln

rebel-wonsubkim · 2026-06-08T05:41:47Z

All triton custom-kernel warmups read RBLN_COMP_DTYPE (bfloat/dlfloat, default bfloat), map to bf16/dlf16, and pass compute_dtype to func.warmup. Pairs with rebel_compiler's fp32->compute-dtype lowering.
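
The dtype selection described above could look roughly like the sketch below. The environment variable name, its two accepted values, the default, and the bf16/dlf16 targets come from the PR description; the helper name `resolve_compute_dtype`, the string representation of the dtypes, and the error raised for an unknown value are assumptions made for illustration and are not taken from the actual diff.

```python
import os
from typing import Mapping, Optional

# Mapping stated in the PR description: bfloat -> bf16, dlfloat -> dlf16.
# Plain strings are used here as placeholders; the real code presumably
# passes whatever dtype object func.warmup expects (not shown in the PR text).
_COMP_DTYPE_MAP = {"bfloat": "bf16", "dlfloat": "dlf16"}


def resolve_compute_dtype(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: read RBLN_COMP_DTYPE and return the kernel dtype name."""
    env = os.environ if env is None else env
    name = env.get("RBLN_COMP_DTYPE", "bfloat")  # default is bfloat per the PR
    if name not in _COMP_DTYPE_MAP:
        # Assumption: reject unknown values instead of silently falling back.
        raise ValueError(
            f"RBLN_COMP_DTYPE must be one of {sorted(_COMP_DTYPE_MAP)}, got {name!r}"
        )
    return _COMP_DTYPE_MAP[name]
```

Each kernel warmup would then forward the returned value as `compute_dtype` to `func.warmup`, as the description states; the rest of that call's signature is not shown here.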

🚀 Summary of Changes

What does this PR do? What feature, fix, or improvement does it bring?

Resolve the compute data type used by the vllm-rbln triton kernels
Remove an unnecessary type cast in the attention backend
Verified with the following commands:
- RBLN_USE_CUSTOM_KERNEL=1 RBLN_COMP_DTYPE=dlfloat vllm serve Qwen/Qwen3-1.7B --port 8000 --max-model-len 16384 --block-size 1024 --enable-chunked-prefill --max-num-batched-tokens 512 --max-num-seqs 1
- RBLN_USE_CUSTOM_KERNEL=1 vllm serve Qwen/Qwen3-1.7B --port 8000 --max-model-len 16384 --block-size 1024 --enable-chunked-prefill --max-num-batched-tokens 512 --max-num-seqs 1

📌 Related Issues / Tickets

Resolves #
Related to https://github.com/rebellions-sw/rebel_compiler/pull/11294
Reverts "fix(attn): normalize seq_lens to int32 so custom kernel compiles" (#648) and moves that change into the triton kernels

✅ Type of Change

🚀 Release (release)
✨ Feature (feature)
🧠 Model support (model)
🧬 Core engine changes (core)
🛠 Bug fix (fix)
⚙️ Performance improvement (perf)
🔁 Refactor or code cleanup (refactor)
📄 Documentation (docs)
❓ Other (other): please describe

🧪 How to Test

Run ...
Verify output: ...
Edge case tested: ...

📸 Screenshots / Logs (if applicable)

📋 Checklist

PR title follows Conventional Commits format
This PR is linked to an existing issue
The test method is described, and the expected result is clearly stated
Relevant documentation has been updated (if applicable)

💬 Notes

All triton custom-kernel warmups read RBLN_COMP_DTYPE (bfloat/dlfloat, default bfloat), map to bf16/dlf16, and pass compute_dtype to func.warmup. Pairs with rebel_compiler's fp32->compute-dtype lowering.
Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…les (#648)"
This reverts commit 76da31a. The seq_lens->int32 normalization is moved into the triton kernel wrappers (seq_idx.to(int32) in flash_*attention prefill/decode), so the backend-side cast is no longer needed.
Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

codecov · 2026-06-08T05:59:32Z

Codecov Report

❌ Patch coverage is 40.00000% with 39 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| vllm_rbln/triton_kernels/attention.py | 42.85% | 8 Missing ⚠️ |
| vllm_rbln/triton_kernels/causal_attention.py | 42.85% | 8 Missing ⚠️ |
| vllm_rbln/triton_kernels/flash_attention.py | 42.85% | 8 Missing ⚠️ |
| vllm_rbln/triton_kernels/flash_causal_attention.py | 33.33% | 8 Missing ⚠️ |
| ...lm_rbln/triton_kernels/sliding_window_attention.py | 36.36% | 7 Missing ⚠️ |

rebel-wonsubkim · 2026-06-09T03:26:46Z

This PR can be merged after the rebel compiler PR (https://github.com/rebellions-sw/rebel_compiler/pull/11294) is merged.

Each kernel file had two near-identical @triton_op wrappers (prefill/decode) differing only in the compiled triton kernel. Factor the shared body into a per-file _<family>_naive helper that takes the kernel as an argument; the wrappers now delegate. warmup stays file-local so each kernel file remains self-contained for rebel's standalone recompile.
Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
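
The refactor in this commit message can be illustrated with a minimal, framework-free sketch. Everything below is hypothetical: the two stand-in "kernels" are ordinary Python functions, the real wrappers are `@triton_op`-decorated and operate on tensors, and the names `_attention_naive`, `attention_naive_prefill`, and `attention_naive_decode` only mirror the `_<family>_naive` naming pattern mentioned above.

```python
from typing import Callable, List, Sequence

# A "kernel" here is any callable from a list of floats to a list of floats.
Kernel = Callable[[List[float]], List[float]]


def _prefill_kernel(x: List[float]) -> List[float]:
    # Stand-in for the compiled prefill triton kernel.
    return [v * 2.0 for v in x]


def _decode_kernel(x: List[float]) -> List[float]:
    # Stand-in for the compiled decode triton kernel.
    return [v + 1.0 for v in x]


def _attention_naive(kernel: Kernel, x: Sequence[float]) -> List[float]:
    """Shared body: input preparation and the kernel call live in one place."""
    if len(x) == 0:
        raise ValueError("input must be non-empty")
    prepared = [float(v) for v in x]  # shared preprocessing step
    return kernel(prepared)


def attention_naive_prefill(x: Sequence[float]) -> List[float]:
    # The wrapper only selects the kernel and delegates.
    return _attention_naive(_prefill_kernel, x)


def attention_naive_decode(x: Sequence[float]) -> List[float]:
    return _attention_naive(_decode_kernel, x)
```

Because the helper receives the kernel as an argument, the prefill and decode wrappers stay one line each, and any change to the shared body is made once per file.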

rebel-wonsubkim requested review from rebel-jaehwang, rebel-jindol21 and rebel-jinhwan June 8, 2026 05:41

rebel-wonsubkim changed the title ~~feat(triton): select kernel compute dtype via RBLN_COMP_DTYPE~~ feature: select kernel compute dtype via RBLN_COMP_DTYPE Jun 8, 2026

rebel-wonsubkim force-pushed the feat/triton-compute-dtype branch from 9d7791c to 40bc7d8 Compare June 8, 2026 05:46

rebel-wonsubkim requested a review from rebel-eunji June 8, 2026 05:57

rebel-jinhwan assigned rebel-wonsubkim Jun 8, 2026

rebel-jinhwan added the torch.compile torch.compile based implementation label Jun 8, 2026

rebel-wonsubkim wants to merge 3 commits into dev from feat/triton-compute-dtype
