
other(test): add unit/e2e tests for attention and triton_kernels #522

Draft
rebel-jinhwan wants to merge 1 commit into dev from jinhwan/pytest-attention-improve

Conversation

@rebel-jinhwan
Contributor

…rage >98%)

  • Add unit tests for triton_kernels: registration, fake ops, wrappers (prefill + decode for all 5 kernel types)
  • Add unit tests for v1/attention/backends/flash_attention: custom op impls, backend class, metadata builder, forward dispatch routing, sinks routing
  • Add host-reference comparison tests parametrized by TP head configs (kv_heads=1/2/4, groups=4/2/1, head_dim=64/128) to validate attention correctness across different tensor parallel configurations
  • Add e2e compile tests (SDPA, masked, GQA, causal) that run on RBLN NPU via torch.compile(backend="rbln") and compare with host reference
  • Fix UnboundLocalError in forward() for causal+normal and non-causal+normal paths when VLLM_RBLN_COMPILE_MODEL=False (missing else branches)
  • Add pragma:no-cover to triton.jit kernels and warmup() (hardware-only code)
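The host-reference comparison bullet above can be sketched as a minimal grouped-query attention (GQA) reference on the host, parametrized by the same TP head configs the PR describes (kv_heads=1/2/4, groups=4/2/1, head_dim=64/128). This is an illustrative sketch in NumPy, not the PR's actual test code; the function name `gqa_reference` and the shapes are assumptions for illustration.

```python
import numpy as np

def gqa_reference(q, k, v, causal=False):
    """Host-side grouped-query attention reference (hypothetical helper).

    q: (num_heads, seq, head_dim); k, v: (kv_heads, seq, head_dim),
    where num_heads = kv_heads * groups and each KV head serves
    `groups` query heads.
    """
    num_heads, seq, head_dim = q.shape
    kv_heads = k.shape[0]
    groups = num_heads // kv_heads
    # Repeat each KV head `groups` times so K/V line up with the query heads.
    k_rep = np.repeat(k, groups, axis=0)
    v_rep = np.repeat(v, groups, axis=0)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
    if causal:
        # Mask out future positions (strict upper triangle).
        mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_rep

# Parametrization mirroring the PR's TP head configs.
for kv_heads, groups, head_dim in [(1, 4, 64), (2, 2, 64), (4, 1, 128)]:
    rng = np.random.default_rng(0)
    q = rng.standard_normal((kv_heads * groups, 8, head_dim))
    k = rng.standard_normal((kv_heads, 8, head_dim))
    v = rng.standard_normal((kv_heads, 8, head_dim))
    out = gqa_reference(q, k, v, causal=True)
    assert out.shape == (kv_heads * groups, 8, head_dim)
```

In the actual tests, an output like `out` would be compared elementwise (e.g. within a tolerance) against the device result produced via `torch.compile(backend="rbln")`.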

🚀 Summary of Changes

What does this PR do? What feature, fix, or improvement does it bring?


📌 Related Issues / Tickets

  • Resolves #
  • Related to #

✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes


@rebel-jinhwan changed the title from test(attn): add unit/e2e tests for attention and triton_kernels to other(test): add unit/e2e tests for attention and triton_kernels on Apr 10, 2026
@rebel-jinhwan force-pushed the jinhwan/pytest-attention-improve branch from 370652b to 3136223 on April 10, 2026 04:39
…rage >98%)

- Add unit tests for triton_kernels: registration, fake ops, wrappers
  (prefill + decode for all 5 kernel types)
- Add unit tests for v1/attention/backends/flash_attention: custom op impls,
  backend class, metadata builder, forward dispatch routing, sinks routing
- Add host-reference comparison tests parametrized by TP head configs
  (kv_heads=1/2/4, groups=4/2/1, head_dim=64/128) to validate attention
  correctness across different tensor parallel configurations
- Add e2e compile tests (SDPA, masked, GQA, causal) that run on RBLN NPU
  via torch.compile(backend="rbln") and compare with host reference
- Add edge case tests: multi-batch flash_causal decode, causal prefill
  mask skip, noncausal decode per-batch mask, missing batch_pad assertion,
  sliding_window batch_attn_opt int32 cast
- Fix UnboundLocalError in forward() for causal+normal and non-causal+normal
  paths when VLLM_RBLN_COMPILE_MODEL=False (missing else branches)
- Add pragma:no-cover to triton.jit kernels and warmup() (NPU-only code)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rebel-jinhwan force-pushed the jinhwan/pytest-attention-improve branch from 3136223 to bc7cf09 on April 10, 2026 05:07
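The UnboundLocalError fix mentioned in the commit message can be illustrated with a minimal, self-contained sketch. The names below (`forward_buggy`, `forward_fixed`, the string results) are hypothetical and stand in for the real forward() paths; the point is the bug class: a variable assigned only inside the `if compile_model:` branch is read on the path taken when VLLM_RBLN_COMPILE_MODEL=False.

```python
def forward_buggy(compile_model: bool):
    if compile_model:
        output = "compiled-path result"
    # Missing else branch: when compile_model is False, `output` was
    # never bound, so the return below raises UnboundLocalError.
    return output

def forward_fixed(compile_model: bool):
    if compile_model:
        output = "compiled-path result"
    else:
        output = "eager-path result"  # the added else branch
    return output

try:
    forward_buggy(False)
except UnboundLocalError:
    print("buggy path raises UnboundLocalError")
print(forward_fixed(False))
```

The fix in the PR is the same shape: add the missing `else` branches on the causal+normal and non-causal+normal paths so every control-flow path binds the value before it is used.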