Conversation

@Aya-ZIbra
Contributor

Summary:

Run the C++ example:

```
FLASHINFER_CUBIN_DIR=/data/users/$USER/fbsource/fbcode/deeplearning/flashinfer/fb/cubins/ \
  buck2 run mode/opt mode/inplace -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=b200a -c fbcode.platform010_cuda_version=12.8 //deeplearning/flashinfer/trtllm_kernel_interfaces:run_example
```

Run the Triton bench:

```
buck2 run mode/opt mode/inplace -c fbcode.enable_gpu_sections=true -c fbcode.nvcc_arch=b200a -c fbcode.platform010_cuda_version=12.8 //pytorch/tritonbench:run -- --op decoding_attention --only trtllm_decode_fmha --seq-len-q 1 --metrics gbps
```
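A note on the gbps metric: with --seq-len-q 1 the decode kernel is memory-bound, and its traffic is dominated by a single full read of the K/V cache, so achieved bandwidth is roughly cache bytes divided by kernel time. Here is a back-of-the-envelope estimate in Python; the shapes and latency are hypothetical illustrations, not tritonbench's exact accounting:

```python
# Back-of-the-envelope bandwidth estimate for single-token decode attention.
# All shapes and the latency are hypothetical; tritonbench's own traffic
# accounting may differ.
batch, seq_len_kv, num_kv_heads, head_dim = 64, 8192, 8, 128
bytes_per_elem = 2  # fp16/bf16

# With seq-len-q == 1, one full pass over K and V dominates the traffic.
kv_bytes = 2 * batch * seq_len_kv * num_kv_heads * head_dim * bytes_per_elem

latency_s = 1.0e-3  # the measured kernel time would go here
print(f"~{kv_bytes / latency_s / 1e9:.0f} GB/s")  # ~2147 GB/s for these numbers
```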

Todo: Support the non-paged case.
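For context on the todo: in the paged case the K/V cache lives in fixed-size pages addressed through a per-sequence page table, while the non-paged case reads one contiguous buffer per sequence. The PyTorch sketch below illustrates a paged gather; the names, shapes, and layout are hypothetical and not the actual trtllm kernel layout.

```python
import torch

# Hypothetical sizes, for illustration only.
num_pages, page_size, num_kv_heads, head_dim = 16, 32, 4, 128

# Paged layout: cache rows live in fixed-size pages; a page table maps
# logical page indices to physical pages.
kv_cache = torch.randn(num_pages, page_size, num_kv_heads, head_dim)
page_table = torch.randperm(num_pages)  # logical page -> physical page

def gather_kv(token_positions: torch.Tensor) -> torch.Tensor:
    """Gather cache rows for the given logical token positions."""
    logical_page = token_positions // page_size
    offset = token_positions % page_size
    return kv_cache[page_table[logical_page], offset]

# The non-paged case in the todo would instead index a flat
# [max_seq_len, num_kv_heads, head_dim] buffer directly.
kv = gather_kv(torch.arange(64))  # -> [64, num_kv_heads, head_dim]
```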

Differential Revision: D81021980

meta-cla bot added the cla signed label on Aug 29, 2025
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D81021980

Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request Aug 29, 2025

Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request Sep 4, 2025
Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request Sep 4, 2025
Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request Sep 4, 2025
meta-codesync bot commented Nov 6, 2025

@Aya-ZIbra has exported this pull request. If you are a Meta employee, you can view the originating Diff in D81021980.

Aya-ZIbra added a commit to Aya-ZIbra/tritonbench that referenced this pull request Nov 6, 2025
Reviewed By: YJYJLee

Differential Revision: D81021980