
Commit 7aa7818

[None][feat] Add triton paged attention for AutoDeploy (NVIDIA#12642)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
1 parent 4c97a03 commit 7aa7818

File tree

6 files changed: +1886 -2 lines


tensorrt_llm/_torch/auto_deploy/custom_ops/attention/__init__.py

Lines changed: 2 additions & 1 deletion

```diff
@@ -22,7 +22,7 @@
 - trtllm_attention: TRT-LLM thop.attention-based optimized attention
 - triton_attention: Triton-based attention implementations
 - triton_attention_with_kv_cache: Triton attention with KV cache support
-- triton_attention_with_paged_kv_cache: Triton attention with paged KV cache
+- triton_paged_attention: Triton paged attention (two-stage flash-decode) with HND layout
 - onnx_attention: Placeholder ops for ONNX export of attention mechanisms
 """

@@ -34,5 +34,6 @@
     "triton_attention",
     "triton_attention_with_kv_cache",
     "triton_attention_with_paged_kv_cache",
+    "triton_paged_attention",
     "onnx_attention",
 ]
```
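The new op's docstring describes it as a "two-stage flash-decode" attention. To make that term concrete, here is a minimal pure-Python sketch of the idea: stage 1 computes a partial softmax-attention result per KV page (keeping each page's score max and sum-of-exponentials), and stage 2 merges the per-page partials with a numerically stable rescaling. This is an illustrative model of the algorithm, not TensorRT-LLM's Triton kernel; all names, shapes, and the page layout here are hypothetical.

```python
import math

def attend_block(q, k_block, v_block, scale):
    """Stage 1: partial attention over one KV page (hypothetical helper).

    Returns (unnormalized_out, block_max, block_sumexp), computed
    relative to this page's own score maximum for stability."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in k_block]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    out = [sum(e * v[d] for e, v in zip(exps, v_block)) for d in range(len(q))]
    return out, m, denom

def flash_decode(q, pages, scale):
    """Stage 2: merge per-page partials into the exact attention output.

    Each partial is rescaled by exp(page_max - global_max) so all
    exponentials end up relative to the same global maximum."""
    partials = [attend_block(q, k, v, scale) for k, v in pages]
    g = max(m for _, m, _ in partials)  # global score max across pages
    denom = sum(d * math.exp(m - g) for _, m, d in partials)
    out = [0.0] * len(q)
    for o, m, _ in partials:
        w = math.exp(m - g)
        for d in range(len(q)):
            out[d] += w * o[d]
    return [x / denom for x in out]

def reference(q, ks, vs, scale):
    """Single-pass softmax attention, for checking the two-stage result."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in ks]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    denom = sum(exps)
    return [sum(e * v[d] for e, v in zip(exps, vs)) / denom
            for d in range(len(q))]
```

In a real paged-attention kernel, each stage-1 unit runs as an independent GPU program over one fixed-size KV page located through a page table, which is what lets the cache be non-contiguous; the two-stage split exists so decode-time parallelism scales with sequence length rather than batch size alone.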
