[BUG] TRT-LLM Gen full attn. Incorrect result for head_dim=256 #1993

@vadiklyutiy

Description

If you change head_dim from 128 to 256 here and run

pytest flashinfer/tests/attention/test_trtllm_gen_attention.py::test_trtllm_batch_decode

you will see 756 failed tests.

FlashInfer version

uv pip show flashinfer-python
Name: flashinfer-python
Version: 0.4.1

Context: this head_dim comes from the Qwen3-Next model.
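For readers unfamiliar with what the failing test checks: the kernel's output is compared against a plain reference attention computation. Below is a minimal, self-contained sketch of that reference math (numpy only, not FlashInfer's actual test harness; the function name and layouts are my own assumptions) run at head_dim=256 to show the shapes involved:

```python
import numpy as np

def ref_decode_attention(q, k, v, scale=None):
    """Hypothetical reference single-token decode attention.

    q: (num_heads, head_dim) query for the new token
    k, v: (seq_len, num_heads, head_dim) cached keys/values
    """
    num_heads, head_dim = q.shape
    if scale is None:
        scale = 1.0 / np.sqrt(head_dim)
    # scores: (num_heads, seq_len)
    scores = np.einsum("hd,shd->hs", q, k) * scale
    # numerically stable softmax over the sequence axis
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores)
    probs /= probs.sum(axis=1, keepdims=True)
    # output: (num_heads, head_dim)
    return np.einsum("hs,shd->hd", probs, v)

rng = np.random.default_rng(0)
num_heads, head_dim, seq_len = 8, 256, 64  # head_dim=256, the failing case
q = rng.standard_normal((num_heads, head_dim), dtype=np.float32)
k = rng.standard_normal((seq_len, num_heads, head_dim), dtype=np.float32)
v = rng.standard_normal((seq_len, num_heads, head_dim), dtype=np.float32)
out = ref_decode_attention(q, k, v)
print(out.shape)  # (8, 256)
```

A kernel that only handles head_dim up to 128 would diverge from this reference at 256, which is consistent with every parametrization of the test failing rather than a handful.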
