## Overview
Add a Conv2D to Img2Col + Matmul transformation pass in the global optimization phase to enable backends with optimized matmul implementations to leverage them for both regular and quantized convolution operations.
## Motivation
This proposal originated from discussions in PR #23278.
Convolution operations are critical for computer vision workloads. The img2col (image-to-column) transformation is a well-established technique that converts convolutions into matrix multiplications, enabling backends with highly optimized GEMM (General Matrix Multiply) implementations to leverage them for convolution operations.
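To make the technique concrete, here is a minimal NumPy sketch (illustrative only, not IREE code): it flattens each input patch into a row of a 2D matrix, performs a single matmul against the reshaped filter, and checks the result against a direct convolution. The function names and the stride-1/no-padding simplification are assumptions made for clarity.

```python
import numpy as np

def im2col_conv2d(x, w):
    """Stride-1, no-padding 2D convolution (NHWC input, HWCF filter)
    computed by flattening image patches into a matrix and doing one
    matmul -- the im2col technique."""
    n, h, wd, c = x.shape
    kh, kw, _, f = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    # Gather every kh x kw x c patch into a row of a 2D matrix.
    cols = np.empty((n * oh * ow, kh * kw * c))
    row = 0
    for b in range(n):
        for i in range(oh):
            for j in range(ow):
                cols[row] = x[b, i:i + kh, j:j + kw, :].ravel()
                row += 1
    # The convolution is now a single GEMM against the reshaped filter.
    out = cols @ w.reshape(kh * kw * c, f)
    return out.reshape(n, oh, ow, f)

def direct_conv2d(x, w):
    """Reference direct convolution for comparison."""
    n, h, wd, c = x.shape
    kh, kw, _, f = w.shape
    oh, ow = h - kh + 1, wd - kw + 1
    out = np.zeros((n, oh, ow, f))
    for b in range(n):
        for i in range(oh):
            for j in range(ow):
                # Contract the patch's kh, kw, c axes against the filter.
                out[b, i, j] = np.tensordot(x[b, i:i + kh, j:j + kw, :], w, axes=3)
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 5, 5, 3))
w = rng.standard_normal((3, 3, 3, 4))
assert np.allclose(im2col_conv2d(x, w), direct_conv2d(x, w))
```

The payoff is that the inner loop of the convolution becomes one large matrix multiply, which is exactly the shape backends with tuned GEMM kernels optimize best.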
## Current State and Problem
IREE currently has an img2col transformation pass in the preprocessing phase, but quantized convolution lowering happens in the global optimization phase via `LinalgQuantizedConvToConvPass`. This creates a pipeline ordering issue:

Preprocessing Phase:
- `ConvertConv2DToImg2ColPass` (if enabled)

Global Optimization Phase:
- `LinalgQuantizedConvToConvPass` ← converts quantized conv to regular conv
- `LinalgQuantizedMatmulToMatmulPass`
**The Problem:** When quantized convolutions are lowered to regular convolutions in global optimization, they cannot benefit from the img2col transformation, because that pass has already run in preprocessing. This means:
- Quantized convolutions (common in mobile/edge models) miss out on the img2col optimization
- The img2col transformation cannot be reused for both regular and quantized convolutions
- Duplicating the pass in both phases is not maintainable
**Alternative Workaround:** Users can manually specify a preprocessing pipeline:

```
iree-preprocessing-pass-pipeline="builtin.module(util.func(iree-global-opt-quantized-conv-to-conv, iree-preprocessing-convert-conv2d-to-img2col))"
```
While this is technically possible, it has some practical limitations:
- Requires understanding of internal pass dependencies and ordering
- Pass ordering needs careful maintenance (e.g., must run after quantized conv lowering)
- May miss necessary cleanup passes like canonicalization and CSE
- Can become outdated as the compiler evolves
- Not ideal for general users who may not be familiar with compiler internals
Integrating the pass into the standard pipeline provides a more robust and user-friendly solution.
## Proposal
Move `ConvertConv2DToImg2ColPass` from the preprocessing phase into the global optimization pipeline, placing it after quantized convolution lowering, so that it transforms both regular and (lowered) quantized linalg convolution operations into img2col + matmul form:
Global Optimization Phase:
- `LinalgQuantizedConvToConvPass` ← lowers quantized conv first
- `LinalgQuantizedMatmulToMatmulPass`
- `ConvertConv2DToImg2ColPass` ← now applies to both regular AND quantized convs
This enables:

- **Unified img2col transformation:** Both regular and quantized convolutions use the same img2col pass, improving code reuse and maintainability.
- **Better quantized model performance:** Quantized convolutions (after lowering) can now benefit from img2col + optimized matmul, which is critical for edge deployment scenarios.
- **Leverage of optimized matmul implementations:** Backends with highly tuned matmul kernels (vendor libraries, ukernels, etc.) benefit from converting convolutions to matmul operations.
- **Improved inference performance:** Particularly beneficial for:
  - Quantized models on edge devices (MobileNet, EfficientNet, etc.)
  - Deployments where matmul implementations are more optimized than direct convolution
  - Inference workloads (batch=1 or small batches are common in deployment)
  - Modern CNN architectures, which typically rely on small convolutional kernels as their core spatial operation
- **Better integration with the IREE pipeline:** Matmul operations integrate more naturally with dispatch formation and fusion heuristics in the global optimization phase.
- **No impact on other backends:** The transformation is opt-in via a command-line flag, allowing backends to preserve direct convolution form for their own specialized optimizations when img2col is not beneficial.
## Design
**Supported Operations:**

- `linalg.conv_2d_nhwc_hwcf` → img2col + `linalg.matmul`
- `linalg.conv_2d_nchw_fchw` → img2col + `linalg.matmul`
- `linalg.depthwise_conv_2d_nhwc_hwc` → img2col + depthwise matmul
- Quantized convolutions (after lowering via `LinalgQuantizedConvToConvPass`) → same transformation path
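As a shape-level illustration of what the `linalg.conv_2d_nhwc_hwcf` rewrite produces, the sketch below (a hypothetical helper, not part of the pass) computes the matmul dimensions. The actual pass may fold dimensions differently (for example, emitting a batched matmul rather than collapsing the batch into rows), so treat this as a sketch of the general shape arithmetic.

```python
def img2col_matmul_shapes(n, h, w, c, kh, kw, f, stride=1):
    """Shapes of the matmul produced by img2col for an NHWC/HWCF conv:
    an (N*OH*OW) x (KH*KW*C) patch matrix times a (KH*KW*C) x F
    filter matrix, ignoring padding for simplicity."""
    oh = (h - kh) // stride + 1
    ow = (w - kw) // stride + 1
    lhs = (n * oh * ow, kh * kw * c)   # im2col patch matrix
    rhs = (kh * kw * c, f)             # reshaped filter
    return lhs, rhs, (n, oh, ow, f)    # matmul operands + output tensor

# A 3x3 conv over a 1x56x56x64 NHWC tensor producing 128 output
# channels becomes a 2916x576 times 576x128 matmul.
lhs, rhs, out = img2col_matmul_shapes(1, 56, 56, 64, 3, 3, 128)
```

This also shows why the rewrite pays off on GEMM-oriented backends: a small-kernel convolution over a typical feature map turns into one large, regular matmul.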
**Pipeline Integration:**

The pass runs in `buildGlobalOptimizationPassPipeline()` after quantized convolution lowering:

```cpp
FunctionLikeNest(mainPassManager)
    .addPass(createLinalgQuantizedConvToConvPass)     // 1. Lower quantized conv
    .addPass(createLinalgQuantizedMatmulToMatmulPass) // 2. Lower quantized matmul
    .addPass(createConvertConv2DToImg2ColPass)        // 3. Apply img2col (NEW, opt-in)
    .addPass(IREE::Flow::createCanonicalizePass)
```
**Critical ordering:** The pass must run after `LinalgQuantizedConvToConvPass` so that:
- Quantized convolutions are first lowered to regular `linalg.conv_2d_*` operations
- The img2col transformation can then apply to both originally-regular and originally-quantized convolutions
- All convolutions benefit from the same img2col + matmul optimization path
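To illustrate why this ordering works, a zero-point-adjusted convolution `sum((x - zx) * (w - zw))` decomposes algebraically into a regular convolution plus zero-point correction terms, and that regular convolution is exactly what img2col can then rewrite into a matmul. Below is a minimal 1D NumPy sketch of the decomposition (illustrative only; the actual pass rewrites linalg named ops in MLIR, and the function name is made up):

```python
import numpy as np

def quantized_conv_via_regular_conv(x, w, zx, zw):
    """Rewrite sum_j (x[i+j] - zx) * (w[j] - zw) as a regular
    convolution plus zero-point corrections (1D, stride 1, no
    padding, for clarity):
      conv(x, w) - zw * sliding_sum(x) - zx * sum(w) + k * zx * zw
    """
    k = len(w)
    conv = np.correlate(x, w, mode="valid")             # regular conv
    x_sums = np.correlate(x, np.ones(k), mode="valid")  # sliding sums of x
    return conv - zw * x_sums - zx * w.sum() + k * zx * zw

x = np.array([3.0, 5.0, 7.0, 9.0])
w = np.array([2.0, 4.0])
zx, zw = 1.0, 2.0
# Reference: the quantized conv computed directly per window.
ref = np.array([sum((x[i + j] - zx) * (w[j] - zw) for j in range(2))
                for i in range(3)])
assert np.allclose(quantized_conv_via_regular_conv(x, w, zx, zw), ref)
```

After this lowering, the remaining `conv` term is an ordinary convolution, so placing img2col after the lowering lets quantized models reach the matmul path too.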
**Opt-in via flag:** The pass is controlled by a command-line option (e.g., `--iree-global-opt-enable-conv2d-to-img2col`) and is disabled by default. This allows:
- Backends to opt-in when img2col provides better performance for their target architecture
- Backends to preserve direct convolution form when they have specialized optimizations
- Flexibility for different deployment scenarios and hardware targets
This placement also ensures:
- Transformation occurs before dispatch region formation
- Downstream passes can optimize the resulting matmul operations
- Matmul fusion opportunities are preserved
- No impact on backends that prefer direct convolution lowering