This CL introduces several enhancements to the dot product scheduling and parallelism logic: #9916
Open
copybara-service[bot] wants to merge 1 commit into master
… and parallelism logic:

Cache-Aware Scheduling:
The `schedule_dot` function now utilizes `cpu_info` (including L1, L2, and L3 cache sizes and L3 sharing) to make more informed tiling decisions. Additions:
* A fast path for small matrices that fit entirely within the L2 cache, skipping K-tiling.
* Outer k-loop tiling sized to fit within the L2 cache. The smaller of matrix A or B is kept cache-resident.
* If both A and B are contiguous, we make use of L3 and effective prefetching to hide load latency.

Dynamic Tiling for Parallelism:
`choose_split_factors` now dynamically determines the 2D tiling (`m_split`, `n_split`) for parallel execution. Additions:
* Inclusion of element sizes (`elem_a`, `elem_b`, `elem_c`) for more accurate footprint calculations.
* A fast path for very small workloads to run on a single thread.
* Asymmetric matrix shape handling (M-heavy or N-heavy) through aspect ratios, and increase in target footprints to prevent inefficient slivers.

PiperOrigin-RevId: 896309637
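
For readers who want a concrete picture of the two heuristics described above, here is a minimal Python sketch. It is not the code in this CL: the `CpuInfo` dataclass, the `choose_k_tile` helper, the `l2_fraction` budget, and the `min_parallel_flops` threshold are all hypothetical, and the split search simply minimizes a per-thread footprint instead of using the aspect-ratio logic the description mentions. The L3 and prefetching path is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuInfo:
    """Hypothetical stand-in for the cache sizes read from cpu_info."""
    l1_bytes: int
    l2_bytes: int
    l3_bytes: int


def choose_k_tile(m, n, k, elem_a, elem_b, cpu, l2_fraction=0.5):
    """Return a K tile size; returning k itself means "no K-tiling"."""
    a_bytes = m * k * elem_a
    b_bytes = k * n * elem_b
    # Fast path: both operands fit in L2 together, so skip K-tiling.
    if a_bytes + b_bytes <= cpu.l2_bytes:
        return k
    # Otherwise size the K tile so a slice of the smaller operand stays
    # resident in an assumed fraction of L2.
    bytes_per_k = min(m * elem_a, n * elem_b)
    budget = int(cpu.l2_bytes * l2_fraction)
    return max(1, min(k, budget // bytes_per_k))


def choose_split_factors(m, n, k, elem_a, elem_b, elem_c, num_threads,
                         min_parallel_flops=1 << 16):
    """Return (m_split, n_split) with m_split * n_split == num_threads."""
    # Fast path: very small workloads run on a single thread.
    if num_threads <= 1 or m * n * k < min_parallel_flops:
        return (1, 1)
    best = None
    for m_split in range(1, num_threads + 1):
        if num_threads % m_split:
            continue
        n_split = num_threads // m_split
        if m_split > m or n_split > n:
            continue
        tile_m = -(-m // m_split)  # ceiling division
        tile_n = -(-n // n_split)
        # Per-thread footprint in bytes: A tile + B tile + C tile.
        footprint = (tile_m * k * elem_a + k * tile_n * elem_b
                     + tile_m * tile_n * elem_c)
        if best is None or footprint < best[0]:
            best = (footprint, m_split, n_split)
    return (1, 1) if best is None else best[1:]
```

Under these assumptions, an M-heavy float32 problem such as `choose_split_factors(4096, 256, 512, 4, 4, 4, 8)` puts every split along M, while a square 1024 x 1024 x 1024 problem on 4 threads gets a 2 x 2 grid, which is the sliver-avoiding behavior the description aims for.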
650eea9 to ee84c26