This CL introduces several enhancements to the dot product scheduling and parallelism logic: by copybara-service[bot] · Pull Request #9916 · google/XNNPACK

copybara-service · 2026-04-08T07:36:06Z

This CL introduces several enhancements to the dot product scheduling and parallelism logic:

Cache-Aware Scheduling:
The `schedule_dot` function now uses cpu_info (including L1, L2, and L3 cache sizes and L3 sharing) to make more informed tiling decisions. Additions:

* A fast path for small matrices that fit entirely within the L2 cache, skipping K-tiling.
* Outer k-loop tiling sized to fit within the L2 cache. The smaller of matrix A or B is kept cache-resident.
* If both A and B are contiguous, we make use of L3 and effective prefetching to hide load latency.
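
As a rough illustration of the first two points, the sketch below picks a K tile from an L2 budget. It is not the code in this CL: `CacheInfo`, `DotSchedule`, `sketch_schedule_dot`, and the "half of L2" budget are all hypothetical names and assumptions, and the L3/prefetch path is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the cache sizes carried by cpu_info; the real
// structure is not shown in this description.
struct CacheInfo {
  std::size_t l1_bytes;  // unused in this sketch
  std::size_t l2_bytes;
  std::size_t l3_bytes;  // unused in this sketch
};

struct DotSchedule {
  bool tile_k;         // false: fast path, the K loop is not tiled
  std::size_t k_tile;  // K extent processed per outer iteration
};

// A is m x k, B is k x n; element sizes are in bytes.
inline DotSchedule sketch_schedule_dot(std::size_t m, std::size_t n,
                                       std::size_t k, std::size_t elem_a,
                                       std::size_t elem_b,
                                       const CacheInfo& cache) {
  assert(m > 0 && n > 0 && k > 0 && elem_a > 0 && elem_b > 0);
  const std::size_t bytes_a = m * k * elem_a;
  const std::size_t bytes_b = k * n * elem_b;

  // Fast path: both operands fit in L2 together, so skip K-tiling.
  if (bytes_a + bytes_b <= cache.l2_bytes) {
    return {false, k};
  }

  // Otherwise size the K tile so that a slice of the smaller operand stays
  // resident. Reserving half of L2 for it is an assumed heuristic.
  const std::size_t budget = cache.l2_bytes / 2;
  const std::size_t bytes_per_k = std::min(m * elem_a, n * elem_b);
  const std::size_t k_tile =
      std::max<std::size_t>(1, std::min(budget / bytes_per_k, k));
  return {true, k_tile};
}
```

With a 1 MiB L2 and 4-byte elements, a 64x64x64 problem takes the fast path, while a 2048x4096 by 4096x512 problem gets a K tile sized from the narrower operand (B in this case).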

Dynamic Tiling for Parallelism:
`choose_split_factors` now dynamically determines the 2D tiling (m_split, n_split) for parallel execution. Additions:

* Inclusion of element sizes (elem_a, elem_b, elem_c) for more accurate footprint calculations.
* A fast path for very small workloads to run on a single thread.
* Asymmetric matrix shape handling (M-heavy or N-heavy) through aspect ratios, and increase in target footprints to prevent inefficient slivers.
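
The following sketch shows one way a 2D split could respond to footprint and shape. It is an assumption-laden illustration, not the implementation in this CL: `SplitFactors`, `footprint_bytes`, `sketch_choose_split_factors`, and the `min_bytes_per_task` parameter are hypothetical, and the brute-force search that minimizes tile perimeter is a swapped-in proxy for the aspect-ratio logic mentioned above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

struct SplitFactors {
  std::size_t m_split;
  std::size_t n_split;
};

// Approximate bytes touched by (m x k) * (k x n) -> (m x n), using the
// per-operand element sizes.
inline std::size_t footprint_bytes(std::size_t m, std::size_t n, std::size_t k,
                                   std::size_t elem_a, std::size_t elem_b,
                                   std::size_t elem_c) {
  return m * k * elem_a + k * n * elem_b + m * n * elem_c;
}

inline SplitFactors sketch_choose_split_factors(
    std::size_t m, std::size_t n, std::size_t k, std::size_t elem_a,
    std::size_t elem_b, std::size_t elem_c, std::size_t num_threads,
    std::size_t min_bytes_per_task) {
  assert(m > 0 && n > 0 && k > 0 && num_threads > 0 && min_bytes_per_task > 0);
  const std::size_t total = footprint_bytes(m, n, k, elem_a, elem_b, elem_c);

  // Fast path: very small workloads stay on a single thread.
  if (num_threads == 1 || total <= min_bytes_per_task) {
    return {1, 1};
  }

  // Cap the task count so each task keeps a minimum footprint; this is what
  // avoids thin slivers in this sketch.
  const std::size_t max_tasks =
      std::min(num_threads, total / min_bytes_per_task);

  // Try every m_split and keep the pair whose tile has the smallest
  // perimeter. M-heavy shapes end up split along M, N-heavy along N.
  SplitFactors best{1, 1};
  std::size_t best_cost = m + n;
  for (std::size_t ms = 1; ms <= std::min(max_tasks, m); ++ms) {
    const std::size_t ns = std::min(max_tasks / ms, n);
    const std::size_t tile_m = (m + ms - 1) / ms;  // ceiling division
    const std::size_t tile_n = (n + ns - 1) / ns;
    if (tile_m + tile_n < best_cost) {
      best_cost = tile_m + tile_n;
      best = {ms, ns};
    }
  }
  return best;
}
```

With 8 threads and a 64 KiB minimum per task, a square 1024x1024 output is split in both dimensions, while a tall 8192x64 output is split only along M.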

PiperOrigin-RevId: 896309637

copybara-service bot force-pushed the test_896309637 branch from 650eea9 to ee84c26 on April 9, 2026 23:27

copybara-service[bot] wants to merge 1 commit into master from test_896309637
