
refactor apply_w8a8_block_fp8_linear in fp8_utils #6545


Open
wants to merge 10 commits into main

Conversation

ChangyiYang

See also: #4353

Motivation

Refactor apply_w8a8_block_fp8_linear to make the logic clearer and more meaningful.

Modifications

Only refactors apply_w8a8_block_fp8_linear in fp8_utils.

Checklist

@ChangyiYang
Author

ChangyiYang commented May 25, 2025

Hi! I have created a new commit. Here are some bullet points (a rough sketch of the resulting dispatch flow follows the list below):

  • Refactored dispatch logic to return only the selected matmul implementation function at initialization time to avoid filtering overhead.
  • For implementations requiring runtime validation (e.g., shape or dtype constraints), added fallback to Triton when conditions are not met.
  • Moved runtime condition checks ahead of computation to avoid redundant operations and unnecessary overhead.
  • Extracted w8a8_block_fp8_matmul_deepgemm from w8a8_block_fp8_matmul to enable direct usage of DeepGEMM kernels when applicable.
  • Retained the function w8a8_block_fp8_matmul as the unified entry point for matrix multiplication, to avoid requiring users to manually check whether DeepGEMM is available each time they use it.
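To make the intent concrete, here is a minimal sketch of the dispatch shape described above. Apart from w8a8_block_fp8_matmul and w8a8_block_fp8_matmul_deepgemm, every name is an illustrative placeholder (assumed, not taken from the PR), and the bodies are stubs:

```python
def _deepgemm_is_available() -> bool:
    # Stand-in for a one-time availability probe done at initialization.
    return False


def _deepgemm_supports(A, B, block_size, output_dtype) -> bool:
    # Stand-in for the runtime shape/dtype constraint check; it runs before
    # any computation, so unsupported inputs cost nothing extra.
    return False


def _w8a8_block_fp8_matmul_triton_impl(A, B, As, Bs, block_size, output_dtype):
    raise NotImplementedError  # stands in for the existing Triton kernel path


def w8a8_block_fp8_matmul_deepgemm(A, B, As, Bs, block_size, output_dtype):
    raise NotImplementedError  # stands in for the extracted DeepGEMM kernel call


def w8a8_block_fp8_matmul(A, B, As, Bs, block_size, output_dtype):
    # Unified entry point: callers never probe DeepGEMM availability themselves.
    if _deepgemm_is_available() and _deepgemm_supports(A, B, block_size, output_dtype):
        return w8a8_block_fp8_matmul_deepgemm(A, B, As, Bs, block_size, output_dtype)
    return _w8a8_block_fp8_matmul_triton_impl(A, B, As, Bs, block_size, output_dtype)


def dispatch_w8a8_block_fp8_matmul():
    # Resolved once at initialization: returns only the selected implementation
    # instead of filtering the candidate backends on every call.
    if _deepgemm_is_available():
        return w8a8_block_fp8_matmul  # DeepGEMM path with Triton fallback
    return _w8a8_block_fp8_matmul_triton_impl
```

With this shape, the per-call cost is limited to the lightweight runtime constraint check; everything else is decided once at initialization.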

Some choices I made, which may still have room for improvement:

  • In w8a8_block_fp8_matmul_deepgemm, I reuse the same assertion checks as w8a8_block_fp8_matmul, which may include some unnecessary checks.

Please check whether any further modification is needed!

@ChangyiYang
Author

Hi! Here are the modifications in this commit (a sketch of the resulting layout follows the list below):

  • Split out w8a8_block_fp8_matmul_triton.
  • Retained w8a8_block_fp8_matmul only for testing purposes.
  • Benchmarks that use DeepGEMM now call w8a8_block_fp8_matmul_deepgemm directly.
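
For illustration only, the layout after this change could look roughly like the sketch below; the wrapper body and the availability probe are assumptions, not the PR's actual code:

```python
def _deepgemm_is_available() -> bool:
    return False  # hypothetical availability probe, not the PR's identifier


def w8a8_block_fp8_matmul_triton(A, B, As, Bs, block_size, output_dtype):
    raise NotImplementedError  # the Triton kernel, now a standalone function


def w8a8_block_fp8_matmul_deepgemm(A, B, As, Bs, block_size, output_dtype):
    raise NotImplementedError  # the DeepGEMM kernel, callable directly


def w8a8_block_fp8_matmul(A, B, As, Bs, block_size, output_dtype):
    # Retained only for testing: a thin wrapper that picks a backend so tests
    # do not have to check DeepGEMM availability themselves.
    if _deepgemm_is_available():
        return w8a8_block_fp8_matmul_deepgemm(A, B, As, Bs, block_size, output_dtype)
    return w8a8_block_fp8_matmul_triton(A, B, As, Bs, block_size, output_dtype)


# Benchmarks that target DeepGEMM skip the wrapper and call
# w8a8_block_fp8_matmul_deepgemm directly.
```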

Feel free to tell me if anything needs further modification!

@ChangyiYang
Author

Hi! Typo fixed. Feel free to tell me if there are any more issues!

@ChangyiYang ChangyiYang requested a review from BBuf May 27, 2025 03:09
@zhyncs
Member

zhyncs commented May 27, 2025

cc @HaiShaw

@ChangyiYang
Author

@Alcanderian hi, I clicked "Update branch" to merge from main, and it seems the workflows need to be approved again. Can you kindly approve them again?

@ChangyiYang ChangyiYang force-pushed the refactor_apply_w8a8_block_fp8_linear branch from d9da17a to 48e3781 Compare May 27, 2025 04:08
@zhyncs zhyncs added the ready-to-merge The PR is ready to merge after the CI is green. label May 27, 2025
@ChangyiYang
Author

@Alcanderian thanks for pointing that out! It is fixed now. Can you run CI again?

@ChangyiYang
Author

ChangyiYang commented May 28, 2025

Hi! I think I have fixed two minor bugs, and the CI looks correct (the failing check is not actually failing, and 9 pipelines are still blocking). Feel free to tell me if any more adjustments are needed before merging! Thank you all for the help :)

Labels
ready-to-merge The PR is ready to merge after the CI is green.

6 participants