feature: select kernel compute dtype via RBLN_COMP_DTYPE #654

Open: rebel-wonsubkim wants to merge 3 commits into dev from feat/triton-compute-dtype

rebel-wonsubkim (Contributor) commented Jun 8, 2026

All triton custom-kernel warmups read RBLN_COMP_DTYPE (bfloat/dlfloat, default bfloat), map to bf16/dlf16, and pass compute_dtype to func.warmup. Pairs with rebel_compiler's fp32->compute-dtype lowering.
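The selection step described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the helper name `resolve_compute_dtype`, the mapping table, and the error handling for unknown values are all assumptions; only the variable name `RBLN_COMP_DTYPE`, its two values, the default, and the `bf16`/`dlf16` targets come from the description.

```python
import os

# Hypothetical mapping from the user-facing value to the short dtype name
# mentioned in the description (bfloat -> bf16, dlfloat -> dlf16).
_COMP_DTYPE_MAP = {"bfloat": "bf16", "dlfloat": "dlf16"}


def resolve_compute_dtype(env=None):
    """Return the short compute dtype name selected by RBLN_COMP_DTYPE.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    value = env.get("RBLN_COMP_DTYPE", "bfloat")  # default is bfloat
    if value not in _COMP_DTYPE_MAP:
        # Assumption: reject unknown values rather than silently falling back.
        allowed = ", ".join(sorted(_COMP_DTYPE_MAP))
        raise ValueError(
            f"RBLN_COMP_DTYPE={value!r} is not supported (expected one of: {allowed})"
        )
    return _COMP_DTYPE_MAP[value]


# The result would then be handed to each kernel warmup as compute_dtype;
# the real warmup call and its other arguments are not shown in this PR text.
compute_dtype = resolve_compute_dtype({"RBLN_COMP_DTYPE": "dlfloat"})
```

Failing loudly on an unrecognized value is one possible design choice; the PR text does not say how the actual implementation treats values other than `bfloat` and `dlfloat`.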

🚀 Summary of Changes

What does this PR do? What feature, fix, or improvement does it bring?

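  • Resolve the compute data type of the vllm-rbln triton kernels from RBLN_COMP_DTYPE
  • Remove an unnecessary type cast in the attention backend
  • Verified with the following commands (once with RBLN_COMP_DTYPE=dlfloat, once with the default)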
    • RBLN_USE_CUSTOM_KERNEL=1 RBLN_COMP_DTYPE=dlfloat vllm serve Qwen/Qwen3-1.7B --port 8000 --max-model-len 16384 --block-size 1024 --enable-chunked-prefill --max-num-batched-tokens 512 --max-num-seqs 1
    • RBLN_USE_CUSTOM_KERNEL=1 vllm serve Qwen/Qwen3-1.7B --port 8000 --max-model-len 16384 --block-size 1024 --enable-chunked-prefill --max-num-batched-tokens 512 --max-num-seqs 1

📌 Related Issues / Tickets


✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes


All triton custom-kernel warmups read RBLN_COMP_DTYPE (bfloat/dlfloat, default
bfloat), map to bf16/dlf16, and pass compute_dtype to func.warmup. Pairs with
rebel_compiler's fp32->compute-dtype lowering.

Signed-off-by: wonsub kim <subang0@rebellions.ai>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rebel-wonsubkim changed the title from "feat(triton): select kernel compute dtype via RBLN_COMP_DTYPE" to "feature: select kernel compute dtype via RBLN_COMP_DTYPE" on Jun 8, 2026
…les (#648)"

This reverts commit 76da31a.

The seq_lens->int32 normalization is moved into the triton kernel wrappers
(seq_idx.to(int32) in flash_*attention prefill/decode), so the backend-side
cast is no longer needed.

Signed-off-by: wonsub kim <subang0@rebellions.ai>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
rebel-wonsubkim force-pushed the feat/triton-compute-dtype branch from 9d7791c to 40bc7d8 on June 8, 2026 05:46
codecov Bot commented Jun 8, 2026


rebel-jinhwan added the torch.compile label (torch.compile based implementation) on Jun 8, 2026
rebel-wonsubkim (Contributor, Author) commented:

This PR can be merged once the rebel compiler PR (https://github.com/rebellions-sw/rebel_compiler/pull/11294) has been merged.

Each kernel file had two near-identical @triton_op wrappers (prefill/decode)
differing only in the compiled triton kernel. Factor the shared body into a
per-file _<family>_naive helper that takes the kernel as an argument; the
wrappers now delegate. warmup stays file-local so each kernel file remains
self-contained for rebel's standalone recompile.

Signed-off-by: wonsub kim <subang0@rebellions.ai>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
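
The refactor described in the commit message above follows a common pattern: the shared wrapper body moves into one helper that receives the kernel as an argument. The sketch below shows only that shape. Every name in it is hypothetical, the "kernels" are plain Python stand-ins rather than compiled triton kernels, and the real wrappers are `@triton_op` functions whose signatures do not appear in this PR text.

```python
from typing import Callable, List

# Hypothetical stand-in type for a compiled kernel.
Kernel = Callable[[List[float]], List[float]]


def _prefill_kernel(values: List[float]) -> List[float]:
    # Stand-in for the compiled prefill kernel.
    return [v * 2.0 for v in values]


def _decode_kernel(values: List[float]) -> List[float]:
    # Stand-in for the compiled decode kernel.
    return [v + 1.0 for v in values]


def _attention_naive(kernel: Kernel, query: List[float]) -> List[float]:
    """Shared body that both wrappers used to duplicate.

    Anything common to prefill and decode (validation, argument
    normalization, output handling) lives here exactly once.
    """
    if not query:
        raise ValueError("query must not be empty")
    return kernel(list(query))


def attention_prefill(query: List[float]) -> List[float]:
    # The wrapper now only chooses which kernel to run.
    return _attention_naive(_prefill_kernel, query)


def attention_decode(query: List[float]) -> List[float]:
    return _attention_naive(_decode_kernel, query)
```

Keeping the helper in the same file as its wrappers, rather than in a shared module, matches the stated goal that each kernel file stays self-contained.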