
Onnx rebase from main - Update SimOp Structure, Implement Missing ONNX Ops, and Fix Check-in Tests#245

Draft
smistTT wants to merge 1 commit into main from onnx-rebase-from-main

Conversation

@smistTT
Contributor

@smistTT smistTT commented Dec 1, 2025

Description
This PR updates the SimOp infrastructure to support the new factory pattern, implements a comprehensive set of missing ONNX operators required for training and inference graphs, and resolves all functional and static analysis (MyPy) failures in the check-in test suite. The branch has been rebased onto the latest main.
Key Changes

  1. Core Operator Infrastructure (ttsim/ops/op.py)
     • SimOpFactory Refactor: Restructured SimOpFactory to explicitly map operator types to their classes, and moved the factory definition to the end of the file to resolve forward-reference NameError issues.
     • Helper Functions: Restored and integrated critical helper functions (check_io_counts, update_output_tensor, pooling_shape_inference, get_tensor_broadcast_shape).
     • Missing Operators Implemented: Added SimOp implementations for over 15 previously missing or commented-out operators, including:
       - Math/Activation: FastGelu, Softplus, HardSwish, Gemm, FusedMatMul
       - Training/Loss: SoftmaxCrossEntropyLoss, InPlaceAccumulatorV2, ConcatTraining
       - Reductions: ReduceSum, ReduceMean, ReduceProd, ReduceMin
       - Gradients: DropoutGrad, FastGeluGrad, SoftmaxGrad_13, GatherGrad, SoftmaxCrossEntropyLossGrad
     • Cleanup: Removed duplicate class definitions (SoftPlusOp, ReciprocalOp) and established proper aliases (e.g., GeluOp = ONNXGeluOp).
  2. Test Suite Updates
     • Import Fixes: Updated ~40 test files in tests/test_ops/ to correctly import SimOpFactory and operator classes.
     • Test Logic:
       - Refactored tests/test_ops/test_reductions.py to instantiate operators via SimOpFactory instead of the base class.
       - Corrected assertion messages in tests/test_ops/test_softplus.py.
     • Naming Conflicts: Renamed workloads/ttnn/llama3/test_attention.py → test_llama3_attention.py to fix pytest collection collisions.
  3. Static Analysis & Build Config
     • MyPy Configuration: Updated pyproject.toml to:
       - Enable enable_incomplete_feature = ["NewGenericSyntax"] to support PEP 695 type aliases (fixing ~40 static analysis errors).
       - Disable warn_unreachable to reduce false positives in op.py.
     • Strict Typing Fixes:
       - tools/ttsi_corr/chart_builder.py: Added assertions to handle Optional types (chart.x_axis, etc.).
       - workloads/LeViT/LeViT.py: Suppressed invalid type inference on callable objects.
       - ttsim/ops/__init__.py: Exported SimOpFactory to the package level.

Validation
  • Functional Tests: checkin_tests.py all passed (including the pytest suite and all simulation studies).
  • Static Analysis: checkin_tests.py static passed (MyPy).
  • Rebase: Verified a clean rebase onto origin/main (commit 6212425).
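The explicit operator-type-to-class mapping described under item 1 can be sketched roughly as follows. All names here (SimOp, GemmOp, _OP_CLASSES, and so on) are illustrative assumptions, not the actual ttsim/ops/op.py code; the sketch only shows why placing the mapping at the end of the module avoids forward-reference NameErrors.

```python
# Illustrative sketch of an explicit operator-type -> class factory.
# Names are hypothetical, not the real ttsim implementation.

class SimOp:
    def __init__(self, name: str):
        self.name = name

class GemmOp(SimOp):
    pass

class ReduceSumOp(SimOp):
    pass

class FastGeluOp(SimOp):
    pass

# Defined at the end of the module, after all operator classes, so every
# name referenced in the mapping is already bound (no forward references).
_OP_CLASSES: dict[str, type[SimOp]] = {
    "Gemm": GemmOp,
    "ReduceSum": ReduceSumOp,
    "FastGelu": FastGeluOp,
}

def SimOpFactory(op_type: str) -> type[SimOp]:
    """Return the SimOp subclass registered for op_type."""
    try:
        return _OP_CLASSES[op_type]
    except KeyError:
        raise ValueError(f"Unsupported operator type: {op_type}") from None
```

Tests can then instantiate operators via the factory, e.g. `SimOpFactory("Gemm")("my_gemm")`, rather than constructing the base class directly.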

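For reference, the MyPy configuration change described in item 3 corresponds to a pyproject.toml fragment roughly like the following; this is a sketch, and the repository's actual [tool.mypy] section likely carries additional options.

```toml
[tool.mypy]
# Allow PEP 695 type-alias syntax while the feature is still gated in MyPy.
enable_incomplete_feature = ["NewGenericSyntax"]
# Avoid false-positive unreachable-code warnings in ttsim/ops/op.py.
warn_unreachable = false
```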
Copilot AI review requested due to automatic review settings December 1, 2025 20:55
Contributor

Copilot AI left a comment


Pull request overview

This PR rebases the ONNX operator implementation work onto the latest main branch and implements a comprehensive set of missing ONNX operators required for training and inference graphs. The changes update the SimOp infrastructure to support a new factory pattern, add over 15 new operator implementations, and resolve all functional and static analysis failures in the check-in test suite.

Key Changes

  • SimOp Factory Refactor: Restructured SimOpFactory to explicitly map operator types to their classes and moved factory definition to resolve forward reference issues
  • Missing Operators: Implemented 15+ operators including FastGelu, SoftPlus, HardSwish, Gemm, FusedMatMul, SoftmaxCrossEntropyLoss, and various reduction/gradient operators
  • Test Infrastructure: Updated ~40 test files to use SimOpFactory instead of base class instantiation, added comprehensive test coverage for new operators
  • Static Analysis Fixes: Enabled PEP 695 type alias support in MyPy configuration and fixed type inference issues across the codebase
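The assertion-based handling of Optional types mentioned for tools/ttsi_corr/chart_builder.py typically looks like the pattern below. Chart and axis_title are illustrative stand-ins, not the real classes or functions from that file.

```python
from typing import Optional

class Chart:
    # Hypothetical stand-in for the chart object in chart_builder.py.
    def __init__(self, x_axis: Optional[str] = None):
        self.x_axis = x_axis

def axis_title(chart: Chart) -> str:
    # MyPy sees chart.x_axis as Optional[str]; the assert narrows the
    # type to str, so string operations type-check without a cast.
    assert chart.x_axis is not None, "chart.x_axis must be set before rendering"
    return chart.x_axis.upper()
```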

Reviewed changes

Copilot reviewed 54 out of 62 changed files in this pull request and generated no comments.

Show a summary per file
File | Description
tests/test_ops/test_space_to_depth.py | New comprehensive test suite for SpaceToDepth operation
tests/test_ops/test_softsign.py | New comprehensive test suite for SoftSign activation function
tests/test_ops/test_softplus.py | New comprehensive test suite for SoftPlus activation function
tests/test_ops/test_skip_layer_normalization.py | New test suite for SkipLayerNormalization transformer operation
tests/test_ops/test_simplified_layer_normalization.py | New test suite for SimplifiedLayerNormalization operation
tests/test_ops/test_shrink.py | New test suite for Shrink activation function
tests/test_ops/test_selu.py | New test suite for SELU activation function
tests/test_ops/test_scatter_nd.py | New test suite for ScatterND operation with reduction modes
tests/test_ops/test_scatter_elements.py | New test suite for ScatterElements operation
tests/test_ops/test_rotary_position_embedding.py | New test suite for RoPE used in modern transformer architectures
tests/test_ops/test_rms_normalization.py | New test suite for RMSNormalization operation
tests/test_ops/test_reshape_ext.py | New test suite for extended reshape functionality
tests/test_ops/test_reductions.py | Updated to use SimOpFactory instead of base SimOp class
tests/test_ops/test_reduce_*.py | New test suites for ReduceProd, ReduceMin, ReduceMean operations
tests/test_ops/test_quantize_linear.py | New test suite for quantization operation
tests/test_ops/test_qlinear_matmul.py | New test suite for quantized matrix multiplication
tests/test_ops/test_qattention.py | New test suite for quantized attention operation
tests/test_ops/test_pad.py | New test suite for Pad operation
pyproject.toml | Updated MyPy configuration to support PEP 695 and reduce false positives
data/metal/inf/15oct25/*.yaml | Added new performance metrics for vision and NLP models


ttsim/ops/op.py Outdated
clone = itensor
return clone

def build_tmp_data_tensor(data, name):
Contributor


Why do we redefine it? It is already defined in ops/desc/helpers.py

Contributor

@ramamalladiTT ramamalladiTT left a comment


Why this PR?

@ssapreTT ssapreTT marked this pull request as draft December 2, 2025 14:16
@ssapreTT
Contributor

ssapreTT commented Dec 2, 2025

@smistTT The operators in ops/op.py have been significantly refactored a month ago. When I rebased your earlier branch on main, that was the big change I had incorporated.

Your current op.py follows the old structure and needs to be revised significantly.

I have marked this PR as draft.

@ramamalladiTT
Contributor

@smistTT The operators in ops/op.py have been significantly refactored a month ago. When I rebased your earlier branch on main, that was the big change I had incorporated.

Your current op.py follows the old structure and needs to be revised significantly.

Is this PR still valid, or should we close it? What is the value? Thanks

@ssapreTT
Contributor

ssapreTT commented Dec 2, 2025

Shalva needs this PR for BevFormer

@ramamalladiTT
Contributor

Shalva needs this PR for BevFormer

I suggest the BevFormer PR be raised and rebased with main. The operators needed for BevFormer can be added to that PR.

@smistTT
Contributor Author

smistTT commented Dec 3, 2025

I have updated the repo to leverage the new SimOp.
I think having support for additional operations by default will enhance Polaris and will enable the team to ramp up models quickly. I don't mind removing the PR, but as mentioned by Shreeniwas, my BevFormer implementation required some of the new ops.

@ramamalladiTT
Contributor

I have updated the repo to leverage the new SimOp. I think having support for additional operations by default will enhance Polaris and will enable the team to ramp up models quickly. I don't mind removing the PR, but as mentioned by Shreeniwas, my BevFormer implementation required some of the new ops.

Yes, whichever new ops you need, please add them. Good that you have updated the PR to use new SimOp structure. I will review it today. Can you squash to 1 commit? Thanks

@ramamalladiTT
Contributor

Rebase to main. Redo this PR since it is stale (defines SimOpFactory and redefines Ops in ops/op.py).

@smistTT smistTT force-pushed the onnx-rebase-from-main branch 3 times, most recently from d46fa00 to a0b8cc2 Compare December 5, 2025 04:03
Contributor

@ssapreTT ssapreTT left a comment


  1. There are conflicts. Please rebase and resolve the conflicts.
  2. Squash these into preferably a single commit; Or maybe 2 or 3.

@smistTT smistTT force-pushed the onnx-rebase-from-main branch from a0b8cc2 to e1a0aac Compare December 10, 2025 05:38
This commit squashes 21 commits to implement ~40 ONNX operators, add comprehensive tests, and refactor operation descriptors.

Contributors: Samvit Kaul, Shreeniwas N Sapre, smistTT
@smistTT smistTT force-pushed the onnx-rebase-from-main branch from e1a0aac to a7353d7 Compare December 11, 2025 00:22
@ssapreTT
Contributor

ssapreTT commented Dec 18, 2025

@smistTT I compared the outputs of existing workloads before and after this change. I noticed that cycles change for the Add, Hardswish, and Matmul operators. Given the changes you have made, are these differences expected?

@smistTT
Contributor Author

smistTT commented Dec 19, 2025

Hi @ssapreTT ,

I ran a few models comparing the main branch and this PR; here are the results:

Workload | Main Cycles | Branch Cycles | Total Delta | Status
ViT B16 | 2,154,389 | 2,130,014 | -24,375 | Improved (-1.13%)
BERT Base | 163,103 | 162,937 | -166 | Improved (-0.10%)
BERT Large | 427,630 | 427,410 | -220 | Improved (-0.05%)
ResNet50 | 360,909 | 360,909 | 0 | ✅ Match
YOLOv7 | 947,923 | 947,923 | 0 | ✅ Match
YOLOv7 Tiny | 113,319 | 113,319 | 0 | ✅ Match
YOLOv8s | 302,940 | 302,940 | 0 | ✅ Match
UNet | 980,944 | 980,944 | 0 | ✅ Match
BEVDepth | 324,432 | 324,432 | 0 | ✅ Match
GPT Nano | 1,532 | 1,532 | 0 | ✅ Match
Llama2 | 5,655 | 5,655 | 0 | ✅ Match

The main reasons for the difference are:

  1. Optimization of Fused Operations (Affects BERT)
    The new code calculates execution time more efficiently for "fused" blocks of instructions (sequences of operations joined together). For example, in BERT models the Feed-Forward Network (FFN) block, specifically the sequence involving Add, LayerNorm, and Gelu, is now estimated to run slightly faster.

  2. Tuning of Matrix Multiplication (Affects ViT) - might be too optimistic?
    The performance descriptor for MatMul (Matrix Multiplication) was improved for specific large tensor shapes. More specifically:

  • Before (Main Branch): The simulator likely used a more generic or pessimistic efficiency factor for certain tensor shapes, possibly treating the large ViT matrices (e.g., 197x768 x 768x2304) with a standard utilization rate.
  • After (New Branch): The refactored descriptors often include more precise logic for "reduced dimensions" and broadcasting. For the specific shapes found in ViT-B/16 (batch size 8, sequence length 197, hidden size 768), the new logic calculates a slightly lower cycle count per operation, reflecting better expected hardware utilization (tiling/blocking).

Did you observe the same differences in perf projections?

@ssapreTT
Contributor

@smistTT Could you rebase this PR on the current main? It seems to be out of date.


