Skip to content

Fix ch_axis bounds check in fused_moving_avg_obs_fake_quant_xpu#3964

Open
Copilot wants to merge 3 commits into
mainfrom
copilot/bound-check-fused-moving-avg
Open

Fix ch_axis bounds check in fused_moving_avg_obs_fake_quant_xpu#3964
Copilot wants to merge 3 commits into
mainfrom
copilot/bound-check-fused-moving-avg

Conversation

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

fused_moving_avg_obs_fake_quant_xpu only validated the upper bound of ch_axis (ch_axis < x.dim()), so any negative value passed the check and was used as a raw index into a DimVector, causing an out-of-bounds access and segfault reachable from public Python APIs.

  • Added full range validation: ch_axis >= -x_dim && ch_axis < x_dim
  • Canonicalize negative values via dimension wrapping (e.g. -1 → last dim), matching standard PyTorch convention used throughout the codebase
  • All downstream uses of ch_axis in the function now use the wrapped value
const auto x_dim = x.dim();
TORCH_CHECK(
    ch_axis >= -x_dim && ch_axis < x_dim,
    "Error in fused_moving_avg_obs_fq_helper: ch_axis ",
    ch_axis,
    " is out of range for tensor with ",
    x_dim,
    " dimensions");
const auto wrapped_ch_axis =
    ch_axis < 0 ? ch_axis + x_dim : ch_axis;

Note: the upstream CUDA implementation (aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu) has the same bug.

Copilot AI linked an issue Jun 11, 2026 that may be closed by this pull request
Copilot AI changed the title [WIP] Fix bound check in fused_moving_avg_obs_fake_quant_xpu Fix ch_axis bounds check in fused_moving_avg_obs_fake_quant_xpu Jun 11, 2026
Copilot AI requested a review from chuanqi129 June 11, 2026 02:03
@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot please use pr-create skill to update the PR

@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot The Python lint (flake8/ruff) and Clang format check failed. Please fix the lint errors in this PR.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Lint job log: https://github.com/intel/torch-xpu-ops/actions/runs/27319132554

Instructions:

  1. Review the lint failure log via the link above
  2. Fix all lint errors in the relevant source files
  3. Ensure both Python lint and Clang format checks pass
  4. Use reply comments to report progress - do NOT modify the PR description plan directly

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@copilot please use pr-create skill to update the PR

Updated the PR following the pr-creation skill:

  • Added reproducer test at test/repro/test_fused_obs_fake_quant_ch_axis_bounds.py covering out-of-range negative, valid negative (-1 wrapping), and out-of-range positive ch_axis values
  • Updated PR body with the required format including issue link, CUDA comparison notes, and Test: line

Commit: see latest push.

@github-actions github-actions Bot added disable_e2e Disable all e2e test jobs for the PR disable_distributed Disable distributed UT test jobs for the PR labels Jun 11, 2026

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@copilot The Python lint (flake8/ruff) and Clang format check failed. Please fix the lint errors in this PR.

**Important: Do NOT update the plan in the PR description directly. Use reply comments to ...

Fixed all lint errors:

  1. CLANGFORMAT: Collapsed wrapped_ch_axis assignment to single line in FusedObsFakeQuant.cpp
  2. TESTOWNERS: Added # Owner(s): ["module: intel"] header to test file
  3. TEST_HAS_MAIN: Converted from bare pytest functions to TestCase class with run_tests() main block
  4. PYFMT: Removed extra blank line between imports

See commit Fix lint: clang-format, test ownership header, and main block.

@chuanqi129 chuanqi129 marked this pull request as ready for review June 11, 2026 08:52
@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot please review this PR in deep

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@copilot please review this PR in deep

PR Review: Fix ch_axis bounds check in fused_moving_avg_obs_fake_quant_xpu

Summary

Adds lower-bound validation and negative-index wrapping for ch_axis in fused_moving_avg_obs_fake_quant_xpu, preventing a segfault from out-of-bounds DimVector access reachable via public Python APIs. Overall this is a correct, well-scoped fix.

Correctness

maybe_wrap_dim vs manual wrapping — The standard PyTorch pattern for dimension canonicalization is at::maybe_wrap_dim(ch_axis, x.dim()), which handles range validation and wrapping in one call and is used extensively throughout the codebase (e.g., Sorting.cpp, Indexing.cpp, TensorTopK.cpp, SoftMaxKernels.cpp all in this repo). The manual TORCH_CHECK + ternary is functionally equivalent but diverges from the established convention. Consider replacing lines 42-50 with:

const auto wrapped_ch_axis = at::maybe_wrap_dim(ch_axis, x.dim());

This produces the same range check and wrapping behavior, provides a consistent error message format, and correctly handles 0-dim tensors (rejects all axes with wrap_scalar=false default). It also reduces the code delta. However, the current manual approach is not wrong — this is a "should fix" for consistency, not a correctness bug.

Upstream CUDA parity — Verified. The upstream CUDA implementation (aten/src/ATen/native/quantized/cuda/FusedObsFakeQuant.cu) has the identical incomplete check (ch_axis < x.dim() only, no lower bound, no wrapping). This PR intentionally improves on the upstream, which is appropriate for a security/crash fix. The PR description correctly notes this.

0-dim tensor edge case — The manual check creates an empty valid range [0, 0) for 0-dim tensors, correctly rejecting all ch_axis values. This matches maybe_wrap_dim(wrap_scalar=false) semantics. No issue here.

Testing

Missing negative wrapping equivalence assertiontest_negative_one_ch_axis_wraps only checks that ch_axis=-1 doesn't raise, but doesn't verify it produces the same result as ch_axis=2 (the last dim for a shape [2,3,4] tensor). Adding an assertEqual between the two outputs would confirm the wrapping logic is correct end-to-end, not just that it doesn't crash.

Missing boundary value -x.dim() test — The test covers -1 (valid) and -1250999896764 (invalid), but doesn't test ch_axis=-3 which is the minimum valid value for a 3-dim tensor. This boundary is where off-by-one errors in the range check would surface.

Backward Compatibility

The original code only accepted non-negative ch_axis < x.dim(). This PR now also accepts negative values via wrapping. This is strictly additive — previously valid inputs produce the same behavior, and previously crashing inputs now either work (valid negative) or raise a clear error (invalid negative). No BC concern.

Recommendation

Approve — The fix is correct, well-scoped, and addresses a real crash. The suggestion to use maybe_wrap_dim and the test coverage gaps are improvements but don't block merging.

@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in basic tests (0 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/27335414788
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/27335414788/artifacts/7563662388

New failures:

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@copilot New UT failures detected in basic tests (0 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

Analyzed the CI failures:

  1. Windows build — Build infrastructure failure (dependency resolution), not related to PR changes.
  2. test_compare_cpu_where_xpu_bfloat16 — This tests the where operator with bfloat16 dtype. This PR only modifies src/ATen/native/quantized/FusedObsFakeQuant.cpp (bounds check for ch_axis) and adds a new test file. The failing test is completely unrelated to quantization or ch_axis. This is a pre-existing/flaky failure, not caused by this PR.

@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (2 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/27335414788
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/27335414788/artifacts/7568712799

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.export.test_passes_xpu.TestPasses test_views_op_having_view_copy failed AttributeError: '_OpNamespace' 'aten' object has no attribute 'cholesky'
op_ut third_party.torch-xpu-ops.test.xpu.test_decomp_xpu.HasDecompTest test_has_decomposition failed AttributeError: '_OpNamespace' 'aten' object has no attribute 'cholesky'

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot AI commented Jun 11, 2026

Copy link
Copy Markdown
Contributor Author

@copilot New UT failures detected in op_ut tests (2 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

Both failures error with AttributeError: '_OpNamespace' 'aten' object has no attribute 'cholesky' — the cholesky op is missing from the ATen namespace. This PR only modifies src/ATen/native/quantized/FusedObsFakeQuant.cpp and adds a test file. These failures are unrelated to quantization or ch_axis bounds checking and are not caused by this PR.

@chuanqi129 chuanqi129 force-pushed the copilot/bound-check-fused-moving-avg branch from 3747880 to f9b8e71 Compare June 12, 2026 03:12
@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in basic tests (0 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/27392036929
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/27392036929/artifacts/7583477959

New failures:

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot AI commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@copilot New UT failures detected in basic tests (0 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

The only failure is test_compare_cpu_where_xpu_bfloat16 (same as the previous run) — a pre-existing where operator test unrelated to this PR's quantization changes. No new failures caused by this PR.

@chuanqi129

Copy link
Copy Markdown
Contributor

@copilot New UT failures detected in op_ut tests (3 total). Please check these new failures and analyze whether they are caused by the PR changes.

Important: Do NOT update the plan in the PR description directly. Use reply comments to update the status.

Job log: https://github.com/intel/torch-xpu-ops/actions/runs/27392036929
Full new failure report: https://github.com/intel/torch-xpu-ops/actions/runs/27392036929/artifacts/7590411935

New failures:

Category Class name Test name Status Message
op_ut third_party.torch-xpu-ops.test.xpu.dynamo.test_wrap_inductor_compiled_regions_xpu.TestWrapInductorCompiledRegions test_wrap_with_cache failed OSError: [Errno 39] Directory not empty: '/tmp/torchinductor_root/fxgraph'
op_ut third_party.torch-xpu-ops.test.xpu.dynamo.test_wrap_inductor_compiled_regions_xpu.TestWrapInductorCompiledRegions test_wrap_config_affects_cache_key failed AssertionError: Scalars are not equal!
op_ut third_party.torch-xpu-ops.test.xpu.export.test_retraceability_xpu.RetraceExportTestExport test_dynamic_lstm_retraceability_strict failed torch._dynamo.exc.UserError: Constraints violated (L['x'].size()[0])! For more information, run with TORCH_LOGS="+dynamic".

Instructions:

  1. Download the full new failure report artifact for complete error messages
  2. Analyze whether the failures are caused by PR changes
  3. If caused by PR changes, fix those failures on this PR branch directly
  4. Add failing test cases with pytest commands in the PR description for the reproducer job to use
  5. Use reply comments to report analysis status - do NOT modify the PR description plan directly

Copilot AI commented Jun 12, 2026

Copy link
Copy Markdown
Contributor Author

@copilot New UT failures detected in op_ut tests (3 total). Please check these new failures and analyze whether they are caused by the PR changes.

**Important: Do NOT update the plan in the PR de...

All three failures are in dynamo/inductor/export tests unrelated to this PR:

  1. test_wrap_with_cache — torchinductor cache directory cleanup issue (OSError: Directory not empty)
  2. test_wrap_config_affects_cache_key — inductor cache key assertion failure
  3. test_dynamic_lstm_retraceability_strict — dynamo constraint violation in export retracing

This PR only modifies src/ATen/native/quantized/FusedObsFakeQuant.cpp (bounds check) and adds a test file. None of these failures are caused by this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ai_generated disable_distributed Disable distributed UT test jobs for the PR disable_e2e Disable all e2e test jobs for the PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bound check in fused_moving_avg_obs_fake_quant_xpu

2 participants