
other(test): add unit/e2e tests for attention and triton_kernels #522

Draft
rebel-jinhwan wants to merge 1 commit into dev from jinhwan/pytest-attention-improve

Conversation

@rebel-jinhwan
Contributor

…rage >98%)

  • Add unit tests for triton_kernels: registration, fake ops, wrappers (prefill + decode for all 5 kernel types)
  • Add unit tests for v1/attention/backends/flash_attention: custom op impls, backend class, metadata builder, forward dispatch routing, sinks routing
  • Add host-reference comparison tests parametrized by TP head configs (kv_heads=1/2/4, groups=4/2/1, head_dim=64/128) to validate attention correctness across different tensor parallel configurations
  • Add e2e compile tests (SDPA, masked, GQA, causal) that run on RBLN NPU via torch.compile(backend="rbln") and compare with host reference
  • Fix UnboundLocalError in forward() for causal+normal and non-causal+normal paths when VLLM_RBLN_COMPILE_MODEL=False (missing else branches)
  • Add pragma:no-cover to triton.jit kernels and warmup() (hardware-only code)
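The host-reference comparison bullet above can be sketched as a minimal grouped-query attention (GQA) reference on the host, parametrized by the same TP head configs the PR describes (kv_heads=1/2/4, groups=4/2/1, head_dim=64/128). This is an illustrative sketch in NumPy, not the PR's actual test code; the function name `gqa_reference` and the shapes are assumptions for illustration.

```python
import numpy as np

def gqa_reference(q, k, v, causal=False):
    """Host-side grouped-query attention reference (hypothetical helper).

    q: (num_heads, seq, head_dim); k, v: (kv_heads, seq, head_dim),
    where num_heads = kv_heads * groups and each KV head serves
    `groups` query heads.
    """
    num_heads, seq, head_dim = q.shape
    kv_heads = k.shape[0]
    groups = num_heads // kv_heads
    # Repeat each KV head `groups` times so K/V line up with the query heads.
    k_rep = np.repeat(k, groups, axis=0)
    v_rep = np.repeat(v, groups, axis=0)
    scores = q @ k_rep.transpose(0, 2, 1) / np.sqrt(head_dim)
    if causal:
        # Mask out future positions (strict upper triangle).
        mask = np.triu(np.ones((seq, seq), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v_rep

# Parametrization mirroring the PR's TP head configs.
for kv_heads, groups, head_dim in [(1, 4, 64), (2, 2, 64), (4, 1, 128)]:
    rng = np.random.default_rng(0)
    q = rng.standard_normal((kv_heads * groups, 8, head_dim))
    k = rng.standard_normal((kv_heads, 8, head_dim))
    v = rng.standard_normal((kv_heads, 8, head_dim))
    out = gqa_reference(q, k, v, causal=True)
    assert out.shape == (kv_heads * groups, 8, head_dim)
```

In the actual tests, an output like `out` would be compared elementwise (e.g. within a tolerance) against the device result produced via `torch.compile(backend="rbln")`.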

🚀 Summary of Changes

What does this PR do? What feature, fix, or improvement does it bring?


📌 Related Issues / Tickets

  • Resolves #
  • Related to #

✅ Type of Change

  • 🚀 Release (release)
  • ✨ Feature (feature)
  • 🧠 Model support (model)
  • 🧬 Core engine changes (core)
  • 🛠 Bug fix (fix)
  • ⚙️ Performance improvement (perf)
  • 🔁 Refactor or code cleanup (refactor)
  • 📄 Documentation (docs)
  • ❓ Other (other): please describe

🧪 How to Test

  1. Run ...
  2. Verify output: ...
  3. Edge case tested: ...

📸 Screenshots / Logs (if applicable)


📋 Checklist

  • PR title follows Conventional Commits format
  • This PR is linked to an existing issue
  • The test method is described, and the expected result is clearly stated
  • Relevant documentation has been updated (if applicable)

💬 Notes


@rebel-jinhwan changed the title from test(attn): add unit/e2e tests for attention and triton_kernels to other(test): add unit/e2e tests for attention and triton_kernels on Apr 10, 2026
@rebel-jinhwan force-pushed the jinhwan/pytest-attention-improve branch from 370652b to 3136223 on April 10, 2026 04:39
…rage >98%)

- Add unit tests for triton_kernels: registration, fake ops, wrappers
  (prefill + decode for all 5 kernel types)
- Add unit tests for v1/attention/backends/flash_attention: custom op impls,
  backend class, metadata builder, forward dispatch routing, sinks routing
- Add host-reference comparison tests parametrized by TP head configs
  (kv_heads=1/2/4, groups=4/2/1, head_dim=64/128) to validate attention
  correctness across different tensor parallel configurations
- Add e2e compile tests (SDPA, masked, GQA, causal) that run on RBLN NPU
  via torch.compile(backend="rbln") and compare with host reference
- Add edge case tests: multi-batch flash_causal decode, causal prefill
  mask skip, noncausal decode per-batch mask, missing batch_pad assertion,
  sliding_window batch_attn_opt int32 cast
- Fix UnboundLocalError in forward() for causal+normal and non-causal+normal
  paths when VLLM_RBLN_COMPILE_MODEL=False (missing else branches)
- Add pragma:no-cover to triton.jit kernels and warmup() (NPU-only code)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@rebel-jinhwan force-pushed the jinhwan/pytest-attention-improve branch from 3136223 to bc7cf09 on April 10, 2026 05:07
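The UnboundLocalError fix mentioned in the commit message can be illustrated with a minimal, self-contained sketch. The names below (`forward_buggy`, `forward_fixed`, the string results) are hypothetical and stand in for the real forward() paths; the point is the bug class: a variable assigned only inside the `if compile_model:` branch is read on the path taken when VLLM_RBLN_COMPILE_MODEL=False.

```python
def forward_buggy(compile_model: bool):
    if compile_model:
        output = "compiled-path result"
    # Missing else branch: when compile_model is False, `output` was
    # never bound, so the return below raises UnboundLocalError.
    return output

def forward_fixed(compile_model: bool):
    if compile_model:
        output = "compiled-path result"
    else:
        output = "eager-path result"  # the added else branch
    return output

try:
    forward_buggy(False)
except UnboundLocalError:
    print("buggy path raises UnboundLocalError")
print(forward_fixed(False))
```

The fix in the PR is the same shape: add the missing `else` branches on the causal+normal and non-causal+normal paths so every control-flow path binds the value before it is used.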