
Inference Result Discrepancy Caused by Explicit Graph Optimization Level in ONNX Runtime #23284

@Cookiee235

Description

This issue tracks the confirmed bug VULN-143221.


A vulnerability exists in ONNX Runtime (tested on versions 1.19.2 and 1.20.1), where setting an explicit graph_optimization_level in onnxruntime.SessionOptions can cause significant inconsistencies in inference results. When running the same model with identical inputs, the outputs differ depending on whether the optimization level is explicitly configured (e.g., ORT_DISABLE_ALL) or left as the default. Because the default enables all graph optimizations (ORT_ENABLE_ALL), comparing the default against ORT_DISABLE_ALL effectively compares the fully optimized graph with the unoptimized one.

This behavior undermines the reliability of ONNX Runtime, particularly in scenarios such as model validation, deployment, and production, where consistent outputs are critical.

Reproduction steps

  • Step 1. Download the ONNX model (i.e., model3.onnx) with crafted structures from this link.

  • Step 2. Run inference twice with the same input data:
    Once with default settings (no explicit graph_optimization_level).
    Once with sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL.

  • Step 3. Compare the results with np.testing.assert_allclose, as in the script below. The outputs differ significantly, raising an AssertionError when the discrepancy exceeds the specified tolerances (atol=1e-3, rtol=1e-3).

import onnxruntime as ort
import numpy as np


def test_graph_optimization_discrepancy(model_path):
    input_data = {"v10_0": np.random.rand(60).astype(np.float16)}

    # Default session
    session1 = ort.InferenceSession(model_path)
    output_names = [output.name for output in session1.get_outputs()]
    results1 = session1.run(output_names, input_data)

    # Session with explicitly disabled graph optimization
    sess_options = ort.SessionOptions()
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    session2 = ort.InferenceSession(model_path, sess_options)
    results2 = session2.run(output_names, input_data)

    # Compare results
    for r1, r2 in zip(results1, results2):
        np.testing.assert_allclose(r1, r2, atol=1e-3, rtol=1e-3)

test_graph_optimization_discrepancy("model3.onnx")
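
A follow-up sketch for narrowing the problem down, assuming the same model3.onnx and input name (v10_0) as above: run the model at each of the four optimization levels and report the maximum deviation from the ORT_DISABLE_ALL baseline, which shows the level at which the divergence first appears.

import onnxruntime as ort
import numpy as np


def bisect_optimization_levels(model_path):
    # Fixed seed so all four runs see identical inputs; the shape and
    # input name mirror the reproduction script above.
    np.random.seed(0)
    input_data = {"v10_0": np.random.rand(60).astype(np.float16)}

    levels = [
        ort.GraphOptimizationLevel.ORT_DISABLE_ALL,
        ort.GraphOptimizationLevel.ORT_ENABLE_BASIC,
        ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
        ort.GraphOptimizationLevel.ORT_ENABLE_ALL,
    ]

    baseline = None
    for level in levels:
        sess_options = ort.SessionOptions()
        sess_options.graph_optimization_level = level
        session = ort.InferenceSession(model_path, sess_options)
        outputs = session.run(None, input_data)  # None requests all outputs
        if baseline is None:
            baseline = outputs  # ORT_DISABLE_ALL serves as the reference
            continue
        diffs = [np.max(np.abs(np.asarray(r, np.float64) - np.asarray(o, np.float64)))
                 for r, o in zip(baseline, outputs)]
        print(f"{level}: max abs diff vs ORT_DISABLE_ALL = {max(diffs):.6f}")


bisect_optimization_levels("model3.onnx")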

Callstack


Traceback (most recent call last):
  File "D:/code/python/OPTFuzz/ONNX/bugs/bug3.py", line 24, in <module>
    test_graph_optimization_discrepancy("model3.onnx")
  File "D:/code/python/OPTFuzz/ONNX/bugs/bug3.py", line 21, in test_graph_optimization_discrepancy
    np.testing.assert_allclose(r1, r2, atol=1e-3, rtol=1e-3)
  File "C:\software\conda\envs\OPTFuzz\lib\site-packages\numpy\testing\_private\utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "C:\software\conda\envs\OPTFuzz\lib\contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "C:\software\conda\envs\OPTFuzz\lib\site-packages\numpy\testing\_private\utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.001, atol=0.001

Mismatched elements: 95 / 400 (23.8%)
Max absolute difference: 0.9433594
Max relative difference: 99.145134
 x: array([[[[ 6.557533e+00,  5.981650e+00,  7.282983e+00,  6.907324e+00,
           8.025227e+00],
         [ 7.683022e+00,  6.642226e+00,  6.947234e+00,  5.576146e+00,...
 y: array([[[[ 6.557533e+00,  6.666800e+00,  7.299584e+00,  6.799902e+00,
           7.440510e+00],
         [ 7.629311e+00,  6.917006e+00,  7.392470e+00,  5.630254e+00,...
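
As a further diagnostic (a minimal sketch; the output filename is arbitrary), ONNX Runtime can serialize the optimized graph to disk via SessionOptions.optimized_model_filepath, so the fused/folded model can be diffed against the original (e.g., in Netron) to locate the transform responsible for the drift:

import onnxruntime as ort

# Creating the session with optimized_model_filepath set writes out the
# graph as it looks after the requested optimization level is applied.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model3_optimized.onnx"  # arbitrary output path
ort.InferenceSession("model3.onnx", sess_options)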
