Enabling of the Performance model for NNPA / z17#3383
AlexandreEichenberger wants to merge 29 commits into onnx:main
Conversation
Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>
Disabled the default cost benefit model for the affected tests. @tungld, let me know if there are others. Thanks.
Also ran Granite 3 and Granite Embedding with and without the new placement heuristics and verified that we have no regressions. It made a small change for embedding (a few scalar ops were assigned to CPU instead of NNPA; the perf difference is negligible, but it is a better use of resources, i.e., not using the NNPA for a scalar).
I'm a bit hesitant about disabling the default cost model. The current default is simple, which makes it useful for debugging and pinpointing issues. Unless there is a clear benefit to switching to another cost model (e.g., measurable performance improvement or reduced memory usage), I would prefer to keep the current one as the default and suggest that users opt into more advanced models only when necessary. Regarding the scalar sqrt issue, I am investigating whether I can make it work with minimal changes to the current cost model.
Fair point... note that currently we have NO cost model... I could split this PR into two: one that adds the new support (which should not be controversial), and one that switches the default policy. While I agree that sending a scalar to the zAIU or not is not a performance issue (both paths are very fast because the data is small), I still see it as wasting hardware to send a scalar to the zAIU. So maybe we can augment the default policy to disable NNPA for operations with fewer than X (say 100) data points.
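To make the suggestion concrete, here is a minimal sketch of such a size-based qualification check. The function name, the shape representation, and the 100-element threshold are illustrative only (the threshold would presumably be tuned per machine generation); this is not actual onnx-mlir code.

```cpp
#include <cstdint>
#include <vector>

// Illustrative threshold only, taken from the "say 100" suggestion above.
constexpr int64_t kMinNNPAElements = 100;

// Hypothetical helper: returns true when an op's tensor is large enough to
// amortize the cost of shipping data to the NNPA; `shape` holds the static
// dimensions of the tensor.
bool qualifiesForNNPA(const std::vector<int64_t> &shape) {
  int64_t numElements = 1;
  for (int64_t d : shape)
    numElements *= d;
  return numElements >= kMinNNPAElements;
}
```

Under this sketch, a scalar (one element) would stay on the CPU, while a 10x10 tensor would still qualify for the NNPA.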
@AlexandreEichenberger I created PR #3393 to not send scalar ops to NNPA. |
@tungld FYI, I recall that we got a 10% improvement on a fraud model that had small vector sizes when enabling the cost models.
Thanks. That looks good: the model has only one stickification for the input and one unstickification for the output, but the cost model can still find a better one. Could you add a lit test for this model?
tungld left a comment:
LGTM!
Could you change all the lit tests to use your new default cost model instead of adding --nnpa-placement-heuristic=QualifyingOps? Since --nnpa-placement-heuristic=QualifyingOps is no longer the default with your PR, testing with it does not capture the real onnx-mlir command; this way we would have lit tests for the default setting.
Also, could you please add lit tests for the boundary, say: with size XXX it runs on CPU, and with size XXX+1 it runs on NNPA? I think the boundaries for z16 and z17 are perhaps different, and they likely also differ when using multiple threads versus not.
Sorry for many requests.
In Granite Embedding, I noticed that we are sending a scalar SQRT to the NNPA, because the current policy is to send any legal operation to the NNPA.
After adding support for the new operations missing in z17 (sqrt & leaky relu), I noticed that some patterns transforming ONNX ops to ZHigh ops were firing without consideration for placement (i.e., the pass that determines where ops should run, NNPA vs. CPU). So I added some additional conditions to the rules for the ONNX -> ZHigh transition.
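The idea of such a condition can be illustrated with a plain C++ stand-in (this is not actual MLIR pattern code; the struct and function names are hypothetical): the ONNX -> ZHigh rewrite should only fire when the placement pass has marked the op for the NNPA, otherwise the op stays in ONNX form for the CPU path.

```cpp
#include <string>

// Hypothetical stand-in for an op carrying a placement decision, e.g. the
// device attribute assigned by the device-placement pass.
struct OpStub {
  std::string device; // e.g. "nnpa" or "cpu"
};

// Guard to add to a rewrite rule: lower to ZHigh only for NNPA-placed ops.
bool shouldLowerToZHigh(const OpStub &op) {
  return op.device == "nnpa";
}
```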
I also slightly modified the handling of the expected values of dynamic dimensions when they are plugged into the CPU and NNPA performance models.
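A minimal sketch of what "expected values for dynamic dimensions" could mean in a cost model (the function name, the -1 encoding, and the default extent of 32 are all assumptions for illustration, not onnx-mlir's actual values): dynamic dimensions are replaced by an assumed extent so a cost estimate can still be computed at compile time.

```cpp
#include <cstdint>
#include <vector>

// Assumed encoding: -1 marks a dynamic (unknown at compile time) dimension.
constexpr int64_t kDynamicDim = -1;
// Illustrative default extent substituted for dynamic dimensions.
constexpr int64_t kAssumedDynamicExtent = 32;

// Hypothetical helper feeding the performance models: estimate the element
// count of a tensor, substituting the expected value for dynamic dims.
int64_t estimatedNumElements(const std::vector<int64_t> &shape) {
  int64_t n = 1;
  for (int64_t d : shape)
    n *= (d == kDynamicDim) ? kAssumedDynamicExtent : d;
  return n;
}
```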