Incorrect overflow in TensorRT 10.16.1.11 when running ONNX ReduceLogSumExp on GPU #4772

@ALinrunrun

Description

TensorRT appears to overflow for ONNX ReduceLogSumExp on large but finite float32 inputs.

ONNX Runtime returns a finite result, while TensorRT returns inf for the same model and input. This suggests TensorRT may be computing log(sum(exp(x))) directly without a numerically stable max-subtraction implementation.

This appears to be a TensorRT numerical stability issue for ONNX ReduceLogSumExp.
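For reference, the standard numerically stable formulation subtracts the maximum before exponentiating. A minimal NumPy sketch of the contrast (an illustration of the expected numerics, not of TensorRT's actual kernel):

import numpy as np

def naive_lse(x):
    # Direct form: exp(250) already exceeds the float32 max (~3.4e38), so the sum is inf.
    return np.log(np.sum(np.exp(x)))

def stable_lse(x):
    # Max-subtraction: every exponent is <= 0, so exp() stays within [0, 1].
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)
print(naive_lse(x))   # inf (with a float32 overflow warning)
print(stable_lse(x))  # ~255.0256, matching ONNX Runtime below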

Environment

TensorRT Version: 10.16.1.11

NVIDIA GPU: N/A / not detected by nvidia-smi

NVIDIA Driver Version: N/A / nvidia-smi failed

CUDA Version: N/A / nvcc not found

CUDNN Version: N/A / torch.backends.cudnn.version() returned None

Operating System: Linux 6.17.0-20-generic x86_64, glibc 2.39

Python Version (if applicable): Python 3.11.15

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): N/A

Baremetal or Container (if so, version): Baremetal / non-Docker environment (/proc/1/cgroup: 0::/init.scope)

Additional package versions:

ONNX Version: 1.21.0
ONNX Runtime Version: 1.25.1

Relevant Files

Model link: N/A

The ONNX model is generated inline by the minimal reproducible script below.

Steps To Reproduce

Commands or scripts:

import numpy as np
import onnx
import onnxruntime as ort
from onnx import helper, TensorProto
from _trt_helper import build_engine_from_onnx, run_engine

# Single ReduceLogSumExp node that reduces the whole input to a scalar (keepdims=0).
n = helper.make_node("ReduceLogSumExp", ["x"], ["y"], keepdims=0)
g = helper.make_graph(
    [n],
    "g",
    [helper.make_tensor_value_info("x", TensorProto.FLOAT, [4])],
    [helper.make_tensor_value_info("y", TensorProto.FLOAT, [])],
)

m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 18)])
m.ir_version = 10
ob = m.SerializeToString()

# Large but finite float32 values: exp(x) alone overflows float32, yet log(sum(exp(x))) is finite.
x = np.array([250.0, 248.0, 255.0, 251.0], dtype=np.float32)

# Reference result from ONNX Runtime on CPU.
ort_y = float(
    ort.InferenceSession(
        ob,
        providers=["CPUExecutionProvider"],
    ).run(["y"], {"x": x})[0]
)

# TensorRT result via the local helper, which builds an engine from the serialized model.
eng, _ = build_engine_from_onnx(ob)
trt_y = float(
    run_engine(
        eng,
        {"x": x},
        ["y"],
        [()],
        [np.float32],
    )["y"]
)

print("ORT:", ort_y)
print("TRT:", trt_y)

# ORT stays finite while TensorRT overflows to inf.
assert np.isfinite(ort_y) and not np.isfinite(trt_y)

Have you tried the latest release?: Yes, reproduced with TensorRT 10.16.1.11.

Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system: Not attached. The issue is reproducible from the self-contained Python script above.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

Yes. ONNX Runtime runs the same model and returns a finite result.
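For completeness, the equivalent polygraphy check, assuming the inline model is first written to disk (reducelogsumexp.onnx is a hypothetical filename):

with open("reducelogsumexp.onnx", "wb") as f:
    f.write(ob)

# Then, from a shell:
#   polygraphy run reducelogsumexp.onnx --onnxrt

Note that polygraphy generates random input data by default, which will not exercise the overflow; reproducing the inf requires feeding the large inputs above (e.g. via polygraphy's --load-inputs).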

Actual output:

ORT: 255.025634765625
TRT: inf

TensorRT returns inf even though the mathematically expected result is finite.
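For reference, the expected value under the max-subtraction formulation (my arithmetic, not a TensorRT internal): float32 overflows past roughly 3.4e38, while exp(250) is already about 3.7e108, so exponentiating any of these inputs directly yields inf. Subtracting the max of 255 first gives 255 + log(e^0 + e^-4 + e^-5 + e^-7) = 255 + log(1.02597...) ≈ 255.0256, which matches the ONNX Runtime output above.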
