[BMG] test_tril got UR_RESULT_ERROR_DEVICE_LOST #2757

@mengfei25

🐛 Describe the bug

test_tril fails with UR_RESULT_ERROR_DEVICE_LOST, and once that happens every subsequent test case in the run fails as well.

Cases:
op_regression,third_party.torch-xpu-ops.test.regressions.test_tril.TestSimpleBinary,test_tril

__________________________ TestSimpleBinary.test_tril __________________________
[gw7] linux -- Python 3.12.12 /__w/torch-xpu-ops/torch-xpu-ops/.venv/bin/python
Traceback (most recent call last):
  File "/__w/torch-xpu-ops/torch-xpu-ops/pytorch/third_party/torch-xpu-ops/test/regressions/test_tril.py", line 13, in test_tril
    torch.xpu.synchronize()
  File "/__w/torch-xpu-ops/torch-xpu-ops/.venv/lib/python3.12/site-packages/torch/xpu/__init__.py", line 451, in synchronize
    return torch._C._xpu_synchronize(device)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)

To execute this test, run the following from the base repo dir:
    python test/regressions/test_tril.py TestSimpleBinary.test_tril

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0
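
For triage, it may help to run the repro command from the log above in its own process and check whether the device-lost error shows up there too. The following is only a minimal sketch: `run_isolated` and `DEVICE_LOST_MARKER` are hypothetical names introduced for illustration, not part of torch-xpu-ops, and the sketch assumes the error string appears in the test's stdout or stderr as it does in the traceback.

```python
import subprocess
import sys

# Error string as it appears in the traceback; matching on it is an assumption.
DEVICE_LOST_MARKER = "UR_RESULT_ERROR_DEVICE_LOST"


def run_isolated(cmd, timeout=600):
    """Run one test command in a separate process.

    Returns (exit_code, device_lost), where device_lost is True if the
    marker string appears anywhere in the captured output.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    output = proc.stdout + proc.stderr
    return proc.returncode, DEVICE_LOST_MARKER in output


if __name__ == "__main__":
    # Repro command taken from the log; run from the base repo dir.
    cmd = [sys.executable, "test/regressions/test_tril.py", "TestSimpleBinary.test_tril"]
    code, lost = run_isolated(cmd)
    print(f"exit code: {code}, device lost: {lost}")
```

If the test passes when run alone, the failure may depend on what ran before it in the same worker. If it still fails, or if later isolated runs of other tests also fail, the device itself may be staying unavailable after the first error. Neither outcome is confirmed by this report.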

Versions

BMG B60
