Open
Labels
performance (issues related to performance regressions)
Description
Describe the issue
Summary
- Operator: Where (opset 16, float32)
- Scope: Broadcasted inputs only (shapes differ)
- Impact: kernel time is ~4–11% slower in v1.21.0 than in v1.20.0 (validated; the regression persists through v1.23.0)
What regresses
- Case: condition [2,64,56,56], X/Y [1,64,1,1] → kernel time +3.97%
- Validation run: kernel_regression_pct ≈ 8.9% (confirmed; the regression persists)
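For anyone comparing their own timings against the percentages above, here is a minimal sketch of the arithmetic. It assumes kernel_regression_pct is the relative change of the newer kernel time against the older one; the profiling script's actual definition is not shown in this issue, and `regression_pct` and the timings below are hypothetical.

```python
def regression_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percent change of candidate relative to baseline; positive means slower.

    Assumed definition, not taken from script_profiling.py.
    """
    if baseline_ms <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate_ms - baseline_ms) / baseline_ms * 100.0


# Hypothetical timings, chosen only to illustrate the arithmetic.
baseline_ms = 1.0      # e.g. kernel time measured on v1.20.0
candidate_ms = 1.0397  # e.g. kernel time measured on v1.21.0
print(f"{regression_pct(baseline_ms, candidate_ms):.2f}%")
```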
What does not regress
- Same-shape inputs ([2,64,56,56] for condition/X/Y) improved in v1.21.0
- Mixed broadcast case (condition small, X/Y large) did not regress in our run
Takeaway
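- The regression is specific to broadcasting small X/Y tensors against a large condition tensor; same-shape paths improved, and the mixed case (small condition, large X/Y) was unaffected in our run.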
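To make the regressing shape combination concrete, the sketch below works out the broadcast output shape in plain Python. It assumes the usual NumPy-style multidirectional broadcasting rules that ONNX specifies for Where; `broadcast_shape` is a hypothetical helper and not part of ONNX Runtime.

```python
from itertools import zip_longest
from math import prod


def broadcast_shape(*shapes):
    """Right-align the shapes and apply NumPy-style broadcasting rules."""
    result = []
    # Walk dimensions from the trailing axis; missing leading axes count as 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        sizes = {d for d in dims if d != 1}
        if len(sizes) > 1:
            raise ValueError(f"incompatible dimensions: {dims}")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


condition = (2, 64, 56, 56)
x = y = (1, 64, 1, 1)

out = broadcast_shape(condition, x, y)
# Output shape, output element count, and element count of X (same as Y).
print(out, prod(out), prod(x))
```

In the regressing case each of the 64 X/Y values is reused across the batch and spatial axes of the output, so the kernel goes through its broadcast path rather than the same-shape path.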
To reproduce
- Download zip file
- Run benchmark using the provided script:
python script_profiling.py where_16_v2_where_float32_4d_broadcast_condition 1.20.0 1.21.0
Urgency
No response
Platform
Linux
OS Version
Ubuntu 24.04.3 LTS
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.21
ONNX Runtime API
Python
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
No response
Is this a quantized model?
Yes