[Performance] Performance regression in MatMul operator for int64 data type between v1.20.0 and v1.21.0 #27153

@junghyunpark2001

Describe the issue

Description

We observed a performance regression in the MatMul operator with int64 inputs between ONNX Runtime v1.20.0 and v1.21.0. The regression is specific to int64; other data types (e.g., float32, int32) are not affected.

Affected Operator

MatMul

  • Opset Version: 13
  • Data Type: int64 (regressed)
  • Regression: +72% to +144% kernel slowdown

Test Case Details

Test Case 1: matmul_13_v2_matmul_int64_mixed_rank_broadcast

Inputs:

  • input_0 tensor:

    • Data type: int64 (type=7)
    • Shape: [3, 128, 256]
  • input_1 tensor:

    • Data type: int64 (type=7)
    • Shape: [256, 64]

Output:

  • Data type: int64
  • Shape: [3, 128, 64]
  • Matrix multiplication with broadcast
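The output shape above follows from the NumPy-style batch broadcasting that MatMul applies to inputs of different rank. A minimal sketch of that shape rule in plain Python, where `matmul_output_shape` is a hypothetical helper written for illustration (it is not part of ONNX Runtime) and only inputs of rank 2 or higher are handled:

```python
from itertools import zip_longest


def matmul_output_shape(shape_a, shape_b):
    """Return the MatMul output shape for two inputs of rank >= 2."""
    if len(shape_a) < 2 or len(shape_b) < 2:
        raise ValueError("this sketch only handles inputs of rank 2 or higher")

    # Split each shape into leading batch dimensions and the trailing matrix.
    *batch_a, m, k_a = shape_a
    *batch_b, k_b, n = shape_b
    if k_a != k_b:
        raise ValueError(f"inner dimensions differ: {k_a} vs {k_b}")

    # Broadcast the batch dimensions from the right; a missing dimension acts as 1.
    batch = []
    for dim_a, dim_b in zip_longest(reversed(batch_a), reversed(batch_b), fillvalue=1):
        if dim_a != dim_b and 1 not in (dim_a, dim_b):
            raise ValueError(f"batch dimensions cannot broadcast: {dim_a} vs {dim_b}")
        batch.append(max(dim_a, dim_b))

    return list(reversed(batch)) + [m, n]


if __name__ == "__main__":
    # Test Case 1: the 2-D input_1 is reused for each of the 3 batches of input_0.
    print(matmul_output_shape([3, 128, 256], [256, 64]))
```

In Test Case 1 the second input has no batch dimension, so the same [256, 64] matrix is multiplied against each of the three [128, 256] slices.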

Performance:

  • v1.20.0: 4.54 ms (kernel time)
  • v1.21.0: 11.11 ms (kernel time)
  • Kernel regression: +144.4% slowdown
  • Total time regression: +144.7% slowdown

Test Case 2: matmul_13_v3_test_matmul_2d_int64

Inputs:

  • A tensor:

    • Data type: int64 (type=7)
    • Shape: [32, 24]
  • B tensor:

    • Data type: int64 (type=7)
    • Shape: [24, 10]

Performance:

  • v1.20.0: 0.011 ms (kernel time)
  • v1.21.0: 0.019 ms (kernel time)
  • Kernel regression: +72.7% slowdown

Test Case 3: matmul_13_v3_test_matmul_single_batch_edge_int64

Inputs:

  • A tensor:

    • Data type: int64 (type=7)
    • Shape: [1, 32, 64]
  • B tensor:

    • Data type: int64 (type=7)
    • Shape: [1, 64, 16]

Performance:

  • v1.20.0: 0.034 ms (kernel time)
  • v1.21.0: 0.060 ms (kernel time)
  • Kernel regression: +75.0% slowdown

Regression Characteristics

Type-Specific Regression

REGRESSED (int64):

  • matmul_13_v2_matmul_int64_mixed_rank_broadcast: +144.4% slowdown
  • matmul_13_v3_test_matmul_2d_int64: +72.7% slowdown
  • matmul_13_v3_test_matmul_single_batch_edge_int64: +75.0% slowdown

NOT REGRESSED (float32):

  • matmul_13_v2_matmul_float32_large_2d: +1.5% (stable)
  • matmul_13_v2_matmul_float32_batched_3d: No regression

NOT REGRESSED (int32):

  • matmul_13_v2_matmul_int32_batched: No regression
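The int64 percentages above can be recomputed from the kernel times listed under each test case. The sketch below assumes the usual relative-slowdown formula, (new - old) / old; `slowdown_pct` is a helper name chosen here for illustration. Because the timings in this report are rounded, the recomputed values can differ slightly from the reported percentages.

```python
def slowdown_pct(before_ms, after_ms):
    """Relative slowdown of after_ms versus before_ms, in percent."""
    return (after_ms - before_ms) / before_ms * 100.0


# Kernel times in ms, copied from the test cases in this report: (v1.20.0, v1.21.0).
kernel_times_ms = {
    "matmul_13_v2_matmul_int64_mixed_rank_broadcast": (4.54, 11.11),
    "matmul_13_v3_test_matmul_2d_int64": (0.011, 0.019),
    "matmul_13_v3_test_matmul_single_batch_edge_int64": (0.034, 0.060),
}

if __name__ == "__main__":
    for name, (old, new) in kernel_times_ms.items():
        print(f"{name}: {slowdown_pct(old, new):+.1f}%")
```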

To reproduce

  1. Download the attached zip file:

Archive.zip

  2. Run the benchmark using the provided script:
    python script_profiling.py matmul_13_v2_matmul_int64_mixed_rank_broadcast 1.20.0 1.21.0

Urgency

No response

Platform

Linux

OS Version

Ubuntu 24.04.3 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.21

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

No response

Is this a quantized model?

Yes

Labels

performance (issues related to performance regressions)
