
[NNPA] Do not lower ONNX ops whose inputs are scalar to ZHigh #3393

Open
tungld wants to merge 7 commits into onnx:main from tungld:scalar-op-not-to-nnpa


tungld (Member) commented Feb 13, 2026

This PR stops lowering ONNX ops whose inputs are scalar to ZHigh, so that those ops run on the CPU instead. To add conditions other than "scalar", only the function isTooSmallOpForNNPA needs to be extended.
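To illustrate the kind of check described above, here is a minimal, self-contained sketch. It is not the PR's code: it assumes a simplified shape representation (a vector of dimensions, with negative values meaning "dynamic"). It also assumes that "scalar" means rank 0 or a statically known single element, and that an op counts as too small only when every input is scalar. The helpers `Shape`, `numElements`, and `isScalarLike` are hypothetical, and the real `isTooSmallOpForNNPA` in onnx-mlir operates on MLIR values and may have a different signature.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tensor type: only its shape matters here.
// A negative dimension marks a dynamic (unknown) size.
using Shape = std::vector<int64_t>;

// Element count of a shape, or -1 if any dimension is dynamic.
// A rank-0 shape (empty vector) has exactly one element.
int64_t numElements(const Shape &shape) {
  int64_t n = 1;
  for (int64_t d : shape) {
    if (d < 0)
      return -1;
    n *= d;
  }
  return n;
}

// Assumption: "scalar" means rank 0 or a statically known single element.
bool isScalarLike(const Shape &shape) { return numElements(shape) == 1; }

// Sketch of the predicate: true when every input is scalar-like, so the op
// is left on the CPU instead of being lowered to ZHigh. Further "too small"
// conditions would be added here.
bool isTooSmallOpForNNPA(const std::vector<Shape> &inputShapes) {
  if (inputShapes.empty())
    return false; // nothing to judge; keep the default lowering
  for (const Shape &s : inputShapes)
    if (!isScalarLike(s))
      return false;
  return true;
}
```

With this rule, requiring *all* inputs to be scalar keeps cases such as adding a scalar to a large tensor eligible for NNPA.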

Signed-off-by: Tung D. Le <tung@jp.ibm.com>
AlexandreEichenberger (Collaborator) left a comment

@tungld don't you think we need a more generic solution than just checking the scalar size for 2 operations? I.e. I would extend it to all ops. Maybe in the "isLegal"...

AlexandreEichenberger (Collaborator) commented

I made a new PR, #3396, that contains only the update of the cost/performance model (without changing the default). It also adds support for respecting CPU/NNPA placement decisions when transforming ONNX to NNPA, since some rules ignored CPU device placement.

I think you can then update the QualifyingOps default policy to assign scalar (or near-scalar, e.g., with fewer than X data points) ops to the CPU, and the lowering will then respect those decisions.
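The threshold-based policy suggested above could look roughly like the sketch below. It assumes a simplified shape representation (negative dimensions mean "dynamic") and a hypothetical `Device` enum. It places an op on the CPU when its largest input has fewer than `minElementsForNNPA` elements, where `minElementsForNNPA` plays the role of the unspecified "X". The names `placeOpByElementCount` and `minElementsForNNPA` are illustrative only. This is not the QualifyingOps code or the cost model from #3396.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical shape representation; a negative dimension is dynamic.
using Shape = std::vector<int64_t>;

// Hypothetical placement outcome (in onnx-mlir this would become a device attribute).
enum class Device { CPU, NNPA };

// Element count of a shape, or -1 if any dimension is dynamic.
int64_t numElements(const Shape &shape) {
  int64_t n = 1;
  for (int64_t d : shape) {
    if (d < 0)
      return -1;
    n *= d;
  }
  return n;
}

// Sketch: send "near-scalar" ops to the CPU. An op goes to the CPU when its
// largest input has fewer than minElementsForNNPA elements. Ops with no
// inputs or with dynamic shapes keep the default (NNPA) placement here.
Device placeOpByElementCount(const std::vector<Shape> &inputShapes,
                             int64_t minElementsForNNPA) {
  if (inputShapes.empty())
    return Device::NNPA;
  int64_t largest = 0;
  for (const Shape &s : inputShapes) {
    int64_t n = numElements(s);
    if (n < 0)
      return Device::NNPA; // size unknown at compile time
    largest = std::max(largest, n);
  }
  return largest < minElementsForNNPA ? Device::CPU : Device::NNPA;
}
```

Setting `minElementsForNNPA` to 2 sends only ops whose inputs all have at most one element to the CPU. This matches the scalar-only rule of this PR, except that zero-element inputs would also go to the CPU. Larger values extend the rule to near-scalar ops.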

tungld (Member, Author) commented Feb 16, 2026

> @tungld don't you think we need a more generic solution than just checking the scalar size for 2 operations? I.e. I would extend it to all ops. Maybe in the "isLegal"...

This PR does in fact support all ops. I just added lit tests for all ops to show this clearly. (I didn't add scalar tests for MatMul, RNN, LSTM, GRU, or Conv; the code supports them, but I don't think we have MatMul with scalar inputs.)

AlexandreEichenberger (Collaborator) commented

@tungld, what I meant is that ONNXToZHigh.td contains many more patterns that re-introduce ZHigh ops than just the sqrt and inv patterns. I already added a mechanism for all of them in my PR. If we keep using the existing approach, namely placing non-beneficial (here, scalar) ops on the CPU by labeling them with device=CPU, then my PR already covers these cases.
