
Inference Result Discrepancy Caused by Explicit Graph Optimization Level in ONNX Runtime #23284

@Cookiee235

Description

This issue tracks the confirmed bug VULN-143221.


A vulnerability exists in ONNX Runtime (tested on versions 1.19.2 and 1.20.1), where setting an explicit graph_optimization_level in onnxruntime.SessionOptions can cause significant inconsistencies in inference results. When running the same model with identical inputs, the outputs differ depending on whether the optimization level is explicitly configured (e.g., ORT_DISABLE_ALL) or left as the default. Because the default enables all graph optimizations (ORT_ENABLE_ALL), comparing the default against ORT_DISABLE_ALL effectively compares the fully optimized graph with the unoptimized one.

This behavior undermines the reliability of ONNX Runtime, particularly in scenarios such as model validation, deployment, and production, where consistent outputs are critical.

Reproduction steps

  • Step 1. Download the ONNX model (i.e., model3.onnx) with crafted structures from this link.

  • Step 2. Run inference twice with the same input data:
    Once with default settings (no explicit graph_optimization_level).
    Once with sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL.

  • Step 3. Compare the results with np.testing.assert_allclose, as in the script below. The outputs differ significantly, raising an AssertionError when the discrepancy exceeds the specified tolerances (atol=1e-3, rtol=1e-3).

import onnxruntime as ort
import numpy as np


def test_graph_optimization_discrepancy(model_path):
    input_data = {"v10_0": np.random.rand(60).astype(np.float16)}

    # Default session
    session1 = ort.InferenceSession(model_path)
    output_names = [output.name for output in session1.get_outputs()]
    results1 = session1.run(output_names, input_data)

    # Session with explicitly disabled graph optimization
    sess_options = ort.SessionOptions()
    sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
    session2 = ort.InferenceSession(model_path, sess_options)
    results2 = session2.run(output_names, input_data)

    # Compare results
    for r1, r2 in zip(results1, results2):
        np.testing.assert_allclose(r1, r2, atol=1e-3, rtol=1e-3)

test_graph_optimization_discrepancy("model3.onnx")
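
A follow-up sketch for narrowing the problem down, assuming the same model3.onnx and input name (v10_0) as above: run the model at each of the four optimization levels and report the maximum deviation from the ORT_DISABLE_ALL baseline, which shows the level at which the divergence first appears.

import onnxruntime as ort
import numpy as np


def bisect_optimization_levels(model_path):
    # Fixed seed so all four runs see identical inputs; the shape and
    # input name mirror the reproduction script above.
    np.random.seed(0)
    input_data = {"v10_0": np.random.rand(60).astype(np.float16)}

    levels = [
        ort.GraphOptimizationLevel.ORT_DISABLE_ALL,
        ort.GraphOptimizationLevel.ORT_ENABLE_BASIC,
        ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED,
        ort.GraphOptimizationLevel.ORT_ENABLE_ALL,
    ]

    baseline = None
    for level in levels:
        sess_options = ort.SessionOptions()
        sess_options.graph_optimization_level = level
        session = ort.InferenceSession(model_path, sess_options)
        outputs = session.run(None, input_data)  # None requests all outputs
        if baseline is None:
            baseline = outputs  # ORT_DISABLE_ALL serves as the reference
            continue
        diffs = [np.max(np.abs(np.asarray(r, np.float64) - np.asarray(o, np.float64)))
                 for r, o in zip(baseline, outputs)]
        print(f"{level}: max abs diff vs ORT_DISABLE_ALL = {max(diffs):.6f}")


bisect_optimization_levels("model3.onnx")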

Callstack


Traceback (most recent call last):
  File "D:/code/python/OPTFuzz/ONNX/bugs/bug3.py", line 24, in <module>
    test_graph_optimization_discrepancy("model3.onnx")
  File "D:/code/python/OPTFuzz/ONNX/bugs/bug3.py", line 21, in test_graph_optimization_discrepancy
    np.testing.assert_allclose(r1, r2, atol=1e-3, rtol=1e-3)
  File "C:\software\conda\envs\OPTFuzz\lib\site-packages\numpy\testing\_private\utils.py", line 1592, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "C:\software\conda\envs\OPTFuzz\lib\contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "C:\software\conda\envs\OPTFuzz\lib\site-packages\numpy\testing\_private\utils.py", line 862, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.001, atol=0.001

Mismatched elements: 95 / 400 (23.8%)
Max absolute difference: 0.9433594
Max relative difference: 99.145134
 x: array([[[[ 6.557533e+00,  5.981650e+00,  7.282983e+00,  6.907324e+00,
           8.025227e+00],
         [ 7.683022e+00,  6.642226e+00,  6.947234e+00,  5.576146e+00,...
 y: array([[[[ 6.557533e+00,  6.666800e+00,  7.299584e+00,  6.799902e+00,
           7.440510e+00],
         [ 7.629311e+00,  6.917006e+00,  7.392470e+00,  5.630254e+00,...
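
As a further diagnostic (a minimal sketch; the output filename is arbitrary), ONNX Runtime can serialize the optimized graph to disk via SessionOptions.optimized_model_filepath, so the fused/folded model can be diffed against the original (e.g., in Netron) to locate the transform responsible for the drift:

import onnxruntime as ort

# Creating the session with optimized_model_filepath set writes out the
# graph as it looks after the requested optimization level is applied.
sess_options = ort.SessionOptions()
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
sess_options.optimized_model_filepath = "model3_optimized.onnx"  # arbitrary output path
ort.InferenceSession("model3.onnx", sess_options)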
