Open
Labels
performance: issues related to performance regressions
Describe the issue
Description
We observed a performance regression in the Mod operator for float32 inputs with the fmod=1 attribute between ONNX Runtime v1.20.0 and v1.21.0. The regression is specific to this configuration: the integer cases we tested (int32 and int64, both with fmod=0) are not affected.
Affected Operator
Mod
- Opset Version: 13
- Data Type: float32 (regressed)
- Attribute: fmod=1 (regressed)
- Regression: +21% to +138% kernel slowdown across the test cases below
Test Case Details
Test Case 1: mod_13_v2_mod_float32_fmod_one_large_4d
Inputs:
- input_0 tensor:
  - Data type: float32 (type=1)
  - Shape: [2, 64, 56, 56]
- input_1 tensor:
  - Data type: float32 (type=1)
  - Shape: [2, 64, 56, 56]
Attributes:
- fmod: 1 (C-style fmod semantics)
Output:
- Data type: float32
- Shape: [2, 64, 56, 56]
- Element-wise modulo operation
Performance:
- v1.20.0: 3448.4 ms (kernel time)
- v1.21.0: 4184.8 ms (kernel time)
- Kernel regression: +21.4% slowdown
- Total time regression: +21.3% slowdown
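For readers unfamiliar with the attribute, the difference between fmod=1 and fmod=0 can be illustrated in plain Python. This is only a sketch of the semantics described in the ONNX Mod specification (C-style fmod takes the sign of the dividend, integer mod takes the sign of the divisor). It uses double-precision Python floats and says nothing about how the ONNX Runtime kernel is implemented; the function names are hypothetical.

```python
import math


def mod_fmod1(a, b):
    """fmod=1: C-style remainder, result has the sign of the dividend a."""
    return math.fmod(a, b)


def mod_fmod0(a, b):
    """fmod=0: integer modulo, result has the sign of the divisor b."""
    return a % b


# Negative dividend, positive divisor
print(mod_fmod1(-7.0, 3.0))  # -1.0
print(mod_fmod0(-7, 3))      # 2

# Positive dividend, negative divisor (compare Test Case 4)
print(mod_fmod1(7.0, -3.0))  # 1.0
print(mod_fmod0(7, -3))      # -2
```

The two modes only disagree when the operands have different signs, which is why a negative-divisor case is worth benchmarking separately.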
Test Case 2: mod_13_v3_test_mod_basic_float32_fmod
Inputs:
- A tensor:
  - Data type: float32 (type=1)
  - Shape: [2, 64, 56, 56]
- B tensor:
  - Data type: float32 (type=1)
  - Shape: [2, 64, 56, 56]
Attributes:
- fmod: 1
Performance:
- v1.20.0: 3453.2 ms (kernel time)
- v1.21.0: 4184.7 ms (kernel time)
- Kernel regression: +21.2% slowdown
Test Case 3: mod_13_v3_test_mod_mixed_shape_broadcast_float32
Inputs:
- A tensor:
  - Data type: float32 (type=1)
  - Shape: [1, 3, 32, 32]
- B tensor:
  - Data type: float32 (type=1)
  - Shape: [2, 3, 32, 32]
Attributes:
- fmod: 1
Output:
- Shape: [2, 3, 32, 32] (broadcast result)
Performance:
- v1.20.0: 0.126 ms (kernel time)
- v1.21.0: 0.266 ms (kernel time)
- Kernel regression: +110.2% slowdown
- Total time regression: +102.2% slowdown
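The broadcast result shape in Test Case 3 follows the usual NumPy-style multidirectional broadcasting rule. A minimal pure-Python sketch of that rule is given here for reference; `broadcast_shape` is a hypothetical helper, not part of ONNX Runtime.

```python
def broadcast_shape(shape_a, shape_b):
    """Return the NumPy-style broadcast of two shapes, or raise ValueError."""
    rank = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same rank.
    a = [1] * (rank - len(shape_a)) + list(shape_a)
    b = [1] * (rank - len(shape_b)) + list(shape_b)
    out = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            out.append(dim_a)
        elif dim_a == 1:
            out.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return out


print(broadcast_shape([1, 3, 32, 32], [2, 3, 32, 32]))  # [2, 3, 32, 32]
```

In this case only the leading dimension of A is expanded, so each element of A is read twice while B is read once.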
Test Case 4: mod_mod_13_mod_fmod1_float32_negative_divisor
Inputs:
- X tensor:
  - Data type: float32 (type=1)
  - Shape: [8, 128]
- Y tensor:
  - Data type: float32 (type=1)
  - Shape: [8, 128]
Attributes:
- fmod: 1
Performance:
- v1.20.0: 5.18 ms (kernel time)
- v1.21.0: 12.33 ms (kernel time)
- Kernel regression: +138.1% slowdown
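For clarity, the regression percentages above are relative slowdowns of the v1.21.0 kernel time over the v1.20.0 kernel time. The sketch below recomputes them from the rounded timings listed in the four test cases. Because those timings are rounded, the results agree with the reported figures only to within roughly one percentage point; the reported values were presumably computed from unrounded measurements.

```python
def slowdown_percent(baseline_ms, candidate_ms):
    """Relative slowdown of candidate over baseline, in percent."""
    return (candidate_ms / baseline_ms - 1.0) * 100.0


# (v1.20.0 kernel ms, v1.21.0 kernel ms), copied from the test cases above
kernel_times = {
    "mod_13_v2_mod_float32_fmod_one_large_4d": (3448.4, 4184.8),
    "mod_13_v3_test_mod_basic_float32_fmod": (3453.2, 4184.7),
    "mod_13_v3_test_mod_mixed_shape_broadcast_float32": (0.126, 0.266),
    "mod_mod_13_mod_fmod1_float32_negative_divisor": (5.18, 12.33),
}

for name, (old_ms, new_ms) in kernel_times.items():
    print(f"{name}: {slowdown_percent(old_ms, new_ms):+.1f}%")
```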
Regression Characteristics
Configuration-Specific Regression
REGRESSED (float32 + fmod=1):
- mod_13_v2_mod_float32_fmod_one_large_4d: +21.4% slowdown
- mod_13_v3_test_mod_basic_float32_fmod: +21.2% slowdown
- mod_13_v3_test_mod_mixed_shape_broadcast_float32: +110.2% slowdown (broadcast)
- mod_mod_13_mod_fmod1_float32_negative_divisor: +138.1% slowdown
NOT REGRESSED (int32 + fmod=0):
- mod_13_v2_mod_int32_default_attribute_large_4d: -2.9% (improved)
- mod_13_v2_mod_int32_mixed_signs_fmod_zero_2d: No regression
NOT REGRESSED (int64 + fmod=0):
- mod_13_v2_mod_int64_explicit_fmod_zero_3d: No regression
Key Characteristics
- Configuration-specific: Only float32 with fmod=1 affected
- Opset version: Version 13
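- Shape-dependent: The broadcast case ([1, 3, 32, 32] with [2, 3, 32, 32]) and the [8, 128] case show a much higher relative regression (+110% and +138%) than the large [2, 64, 56, 56] cases (+21%)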
- Persistence: Regression persists to latest version (1.23.0)
- Partial recovery: v1.23.0 shows 3-5% improvement over v1.21.0, but still regressed from v1.20.0
To reproduce
- Download zip file
- Run benchmark using the provided script:
python script_profiling.py mod_13_v3_test_mod_mixed_shape_broadcast_float32 1.20.0 1.21.0
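If the zip file is not at hand, a rough standalone check can be sketched as below. This is not the contents of script_profiling.py (which is not shown here) but a hypothetical stand-in: it builds a single-node Mod model (opset 13, fmod=1, float32) with `onnx.helper`, runs it on the CPU execution provider, and reports the median wall-clock time per call. Note the assumptions: wall-clock time is measured rather than the profiler kernel time quoted above, `ir_version = 8` is assumed to be accepted by both runtime versions, and the script has to be run once per installed ONNX Runtime version (for example in two separate virtual environments).

```python
import statistics
import time


def median_ms(fn, warmup=5, runs=50):
    """Median wall-clock time of fn() in milliseconds, after warm-up calls."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def make_mod_runner(shape_a, shape_b, shape_out):
    """Build a Mod(fmod=1) float32 model and return a zero-argument runner.

    Third-party imports are local so median_ms stays usable on its own.
    """
    import numpy as np
    import onnxruntime as ort
    from onnx import TensorProto, helper

    node = helper.make_node("Mod", ["A", "B"], ["C"], fmod=1)
    graph = helper.make_graph(
        [node],
        "mod_fmod_float32",
        [
            helper.make_tensor_value_info("A", TensorProto.FLOAT, list(shape_a)),
            helper.make_tensor_value_info("B", TensorProto.FLOAT, list(shape_b)),
        ],
        [helper.make_tensor_value_info("C", TensorProto.FLOAT, list(shape_out))],
    )
    model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
    model.ir_version = 8  # assumption: supported by both versions under test

    session = ort.InferenceSession(
        model.SerializeToString(), providers=["CPUExecutionProvider"]
    )
    rng = np.random.default_rng(0)
    feeds = {
        "A": rng.standard_normal(tuple(shape_a)).astype(np.float32),
        # Shift the divisor away from zero to avoid NaN results.
        "B": (rng.random(tuple(shape_b)) + 0.5).astype(np.float32),
    }
    return lambda: session.run(None, feeds)


if __name__ == "__main__":
    import onnxruntime as ort

    # Shapes from Test Case 3 (broadcast)
    run = make_mod_runner([1, 3, 32, 32], [2, 3, 32, 32], [2, 3, 32, 32])
    print(f"onnxruntime {ort.__version__}: {median_ms(run):.3f} ms (median)")
```

Comparing the printed medians from a v1.20.0 environment and a v1.21.0 environment gives an end-to-end view of the regression; for kernel-level numbers, session profiling would be needed instead.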
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04.3 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.21.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes