Suggestion Description
I suggest moving the common benchmarking/performance-test helpers (https://github.com/ROCm/FlyDSL/blob/main/tests/test_common.py) out of the tests folder and into the FlyDSL Python package, so they can be imported directly. For example:
from fx.testing import checkAllclose, run_perftest
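To make the proposal concrete, here is a minimal, hypothetical sketch of what such a module could contain. The function names follow the import line above, but the signatures, defaults and behavior are assumptions for illustration only, not the actual contents of tests/test_common.py. The timing is host-side wall-clock via time.perf_counter; a real GPU helper would also need to synchronize the device (or use device-side events) around each timed call.

```python
"""Hypothetical sketch of a FlyDSL `testing` submodule.

Names mirror the issue; all signatures and defaults are assumptions and may
differ from the helpers in tests/test_common.py.
"""
import statistics
import time
from typing import Any, Callable, Sequence, Tuple


def checkAllclose(a: Sequence[float], b: Sequence[float],
                  rtol: float = 1e-2, atol: float = 1e-2,
                  msg: str = "") -> bool:
    """Return True if every pair satisfies |a - b| <= atol + rtol * |b|.

    NaNs always count as mismatches. Prints a short report on failure.
    """
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    # `not (... <= ...)` also flags NaN, since any comparison with NaN is False.
    bad = [i for i, (x, y) in enumerate(zip(a, b))
           if not abs(x - y) <= atol + rtol * abs(y)]
    if bad:
        print(f"{msg} checkAllclose failed: {len(bad)}/{len(a)} elements "
              f"outside tolerance (first at index {bad[0]})")
    return not bad


def run_perftest(func: Callable[..., Any], *args: Any,
                 num_warmup: int = 2, num_iters: int = 10,
                 **kwargs: Any) -> Tuple[Any, float]:
    """Run func num_warmup times untimed, then time num_iters calls.

    Returns (result of the last call, median time per call in microseconds).
    Host-side timing only: no device synchronization is performed here.
    """
    if num_iters < 1:
        raise ValueError("num_iters must be >= 1")
    for _ in range(num_warmup):
        func(*args, **kwargs)
    times_us = []
    result = None
    for _ in range(num_iters):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        times_us.append((time.perf_counter() - start) * 1e6)
    return result, statistics.median(times_us)


if __name__ == "__main__":
    # Usage: a CPU stand-in for a kernel, checked against a reference result.
    def vector_add(x, y):
        return [p + q for p, q in zip(x, y)]

    x = [float(i) for i in range(1024)]
    out, us = run_perftest(vector_add, x, x, num_iters=20)
    assert checkAllclose(out, [2 * v for v in x], msg="vector_add")
    print(f"vector_add: {us:.1f} us (median of 20 runs)")
```

Returning the result together with the timing lets a benchmark script validate correctness and report performance from the same run, which is the pattern the benchmark suites mentioned below rely on.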
Why should this be done
- Triton provides such utilities as part of its API (see https://triton-lang.org/main/python-api/triton.testing.html and https://github.com/triton-lang/triton/blob/main/python/triton/testing.py). CuteDSL offers a similar API; see this example notebook: https://github.com/NVIDIA/cutlass/blob/main/examples/python/CuTeDSL/notebooks/elementwise_add.ipynb
- Most of our kernels come with benchmark code. For example, the Triton kernels in AITER have a full benchmark suite (https://github.com/ROCm/aiter/tree/main/op_tests/op_benchmarks/triton), and those benchmarks use the common benchmarking utilities provided by Triton. Some examples:
https://github.com/ROCm/aiter/blob/main/op_tests/op_benchmarks/triton/bench_gemm_afp4wfp4.py#L51
https://github.com/ROCm/aiter/blob/main/op_tests/op_benchmarks/triton/bench_batch_prefill.py#L158
As we develop more FlyDSL kernels, I expect a similar benchmarking suite to grow around them, and a common benchmarking API provided by FlyDSL itself would be useful for that.
Operating System
No response
GPU
No response
ROCm Component
No response