Incorrect overflow in TensorRT 10.16.1.11 when running ONNX ReduceLogSumExp on GPU #4772

@ALinrunrun

Description

TensorRT appears to overflow for ONNX ReduceLogSumExp on large but finite float32 inputs.

ONNX Runtime returns a finite result, while TensorRT returns inf for the same model and input. This suggests TensorRT may be computing log(sum(exp(x))) directly without a numerically stable max-subtraction implementation.

This appears to be a TensorRT numerical stability issue for ONNX ReduceLogSumExp.
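For reference, the standard numerically stable formulation subtracts the maximum before exponentiating. A minimal NumPy sketch of the contrast (an illustration of the expected numerics, not of TensorRT's actual kernel):

import numpy as np

def naive_lse(x):
    # Direct form: exp(250) already exceeds the float32 max (~3.4e38), so the sum is inf.
    return np.log(np.sum(np.exp(x)))

def stable_lse(x):
    # Max-subtraction: every exponent is <= 0, so exp() stays within [0, 1].
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)
print(naive_lse(x))   # inf (with a float32 overflow warning)
print(stable_lse(x))  # ~255.0256, matching ONNX Runtime below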

Environment

TensorRT Version: 10.16.1.11

NVIDIA GPU: N/A / not detected by nvidia-smi

NVIDIA Driver Version: N/A / nvidia-smi failed

CUDA Version: N/A / nvcc not found

CUDNN Version: N/A / torch.backends.cudnn.version() returned None

Operating System: Linux 6.17.0-20-generic x86_64, glibc 2.39

Python Version (if applicable): Python 3.11.15

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): N/A

Baremetal or Container (if so, version): Baremetal / non-Docker environment (/proc/1/cgroup: 0::/init.scope)

Additional package versions:

ONNX Version: 1.21.0
ONNX Runtime Version: 1.25.1

Relevant Files

Model link: N/A

The ONNX model is generated inline by the minimal reproducible script below.

Steps To Reproduce

Commands or scripts:

import numpy as np
import onnx
import onnxruntime as ort
from onnx import helper, TensorProto
from _trt_helper import build_engine_from_onnx, run_engine

# Single ReduceLogSumExp node that reduces the whole input to a scalar (keepdims=0).
n = helper.make_node("ReduceLogSumExp", ["x"], ["y"], keepdims=0)
g = helper.make_graph(
    [n],
    "g",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [])],
)

m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 18)])
m.ir_version = 10
ob = m.SerializeToString()

# Large but finite float32 values: exp(x) alone overflows float32, yet log(sum(exp(x))) is finite.
x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)

# Reference result from ONNX Runtime on CPU.
ort_y = float(
    ort.InferenceSession(
        ob,
        providers=["CPUExecutionProvider"],
    ).run(["y"], {"x": x})[0]
)

# TensorRT result via the local helper, which builds an engine from the serialized model.
eng, _ = build_engine_from_onnx(ob)
trt_y = float(
    run_engine(
        eng,
        {"x": x},
        ["y"],
        [()],
        [np.float32],
    )["y"]
)

print("ORT:", ort_y)
print("TRT:", trt_y)

# ORT stays finite while TensorRT overflows to inf.
assert np.isfinite(ort_y) and not np.isfinite(trt_y)

Have you tried the latest release?: Yes, reproduced with TensorRT 10.16.1.11.

Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system: Not attached. The issue is reproducible from the self-contained Python script above.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Yes. ONNX Runtime runs the same model and returns a finite result.
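For completeness, the equivalent polygraphy check, assuming the inline model is first written to disk (reducelogsumexp.onnx is a hypothetical filename):

with open("reducelogsumexp.onnx", "wb") as f:
    f.write(ob)

# Then, from a shell:
#   polygraphy run reducelogsumexp.onnx --onnxrt

Note that polygraphy generates random input data by default, which will not exercise the overflow; reproducing the inf requires feeding the large inputs above (e.g. via polygraphy's --load-inputs).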

Actual output:

ORT: 255.025634765625
TRT: inf

TensorRT returns inf even though the mathematically expected result is finite.
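For reference, the expected value under the max-subtraction formulation (my arithmetic, not a TensorRT internal): float32 overflows past roughly 3.4e38, while exp(250) is already about 3.7e108, so exponentiating any of these inputs directly yields inf. Subtracting the max of 255 first gives 255 + log(e^0 + e^-4 + e^-5 + e^-7) = 255 + log(1.02597...) ≈ 255.0256, which matches the ONNX Runtime output above.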
