Make O2 the default optimization level for aie-opt #1153

Draft: wants to merge 1 commit into main

newling (Contributor) commented Feb 28, 2025

Relevant performance numbers scraped from CI. These are the direct codegen benchmarks that change from O3 to O2.

Time Comparison

| Benchmark | Before (us) | After (us) |
| --- | --- | --- |
| matmul_512_4096_512_bf16_f32_npu1_4col_outline | 2469.0 | 2483.0 |
| matmul_transpose_b_512_4096_512_bf16_f32_npu1_4col_outline | 1948.0 | 1950.0 |
| matmul_4096_512_512_bf16_f32_npu1_4col_outline | 1893.0 | 1916.0 |
| matmul_transpose_a_4096_512_512_bf16_f32_npu1_4col_outline | 2422.0 | 2332.0 |
| matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling | 1388.0 | 1615.0 |

Memory Comparison

| Benchmark | Before (bytes) | After (bytes) |
| --- | --- | --- |
| matmul_512_4096_512_bf16_f32_npu1_4col_outline | 8464 | 3504 |
| matmul_transpose_b_512_4096_512_bf16_f32_npu1_4col_outline | 9216 | 3536 |
| matmul_4096_512_512_bf16_f32_npu1_4col_outline | 8464 | 3504 |
| matmul_transpose_a_4096_512_512_bf16_f32_npu1_4col_outline | 8656 | 3520 |
| matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling | 14816 | 3184 |

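Memory drops substantially for every benchmark, and four of the five run times change by less than 4%. The exception is matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling, which slows from 1388 us to 1615 us (about 16%), so it needs some investigation.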
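To make the size of each change easier to see, here is a minimal Python sketch that computes the relative change for every benchmark. The figures are copied from the two tables above. The names `TIMES_US`, `MEMORY_BYTES`, `percent_change`, and `summarize` are hypothetical helpers introduced for illustration and are not part of the project or its CI tooling.

```python
# (before, after) pairs copied from the tables above: O3 first, then O2.
TIMES_US = {
    "matmul_512_4096_512_bf16_f32_npu1_4col_outline": (2469.0, 2483.0),
    "matmul_transpose_b_512_4096_512_bf16_f32_npu1_4col_outline": (1948.0, 1950.0),
    "matmul_4096_512_512_bf16_f32_npu1_4col_outline": (1893.0, 1916.0),
    "matmul_transpose_a_4096_512_512_bf16_f32_npu1_4col_outline": (2422.0, 2332.0),
    "matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling": (1388.0, 1615.0),
}

MEMORY_BYTES = {
    "matmul_512_4096_512_bf16_f32_npu1_4col_outline": (8464, 3504),
    "matmul_transpose_b_512_4096_512_bf16_f32_npu1_4col_outline": (9216, 3536),
    "matmul_4096_512_512_bf16_f32_npu1_4col_outline": (8464, 3504),
    "matmul_transpose_a_4096_512_512_bf16_f32_npu1_4col_outline": (8656, 3520),
    "matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling": (14816, 3184),
}


def percent_change(before, after):
    """Relative change in percent; positive means the value grew."""
    return 100.0 * (after - before) / before


def summarize(table):
    """Map each benchmark name to its percent change."""
    return {name: percent_change(b, a) for name, (b, a) in table.items()}


time_change = summarize(TIMES_US)
memory_change = summarize(MEMORY_BYTES)

for name in TIMES_US:
    print(f"{name}: time {time_change[name]:+.1f}%, memory {memory_change[name]:+.1f}%")
```

A positive time change means the benchmark became slower after the switch, and a negative memory change means the memory figure shrank.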

yzhang93 (Contributor) commented

Maybe leave this test matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling using O3 for now?

newling (Contributor, Author) commented Mar 3, 2025

> Maybe leave this test matmul_512_4096_512_bf16_f32_npu1_4col_outline_4_level_tiling using O3 for now?

I think the O3 code is better than the O2 code for all the tests; this is just the only test where the data movement is fast enough for the difference to show up. So I'm going to dig deeper to see what else I can do to make O2 match O3. Ideally the code generated at the two levels should be identical.

newling marked this pull request as draft on April 14, 2025, 18:39