[Qwen3][MXFP8][LLMC]: RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST) #1835

### Problem Description

```
Loading checkpoint shards:   0%|          | 0/16 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 16/16 [00:01<00:00, 15.41it/s]
2026-05-21 09:56:26 INFO calib_dataset.py L977: Preprocessing calibration dataset in a subprocess to avoid memory leaks...
Map:   0%|          | 0/10000 [00:00<?, ? examples/s]
Map: 100%|██████████| 10000/10000 [00:09<00:00, 1036.44 examples/s]
Filter:   0%|          | 0/10000 [00:00<?, ? examples/s]
Filter: 100%|██████████| 10000/10000 [00:03<00:00, 3028.36 examples/s]
Casting the dataset:   0%|          | 0/2301 [00:00<?, ? examples/s]
Casting the dataset: 100%|██████████| 2301/2301 [00:03<00:00, 649.01 examples/s]
2026-05-21T09:56:46.2089 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
2026-05-21T09:56:47.5298 | reset | INFO - Compression lifecycle reset
2026-05-21T09:56:47.6216 | moe_calibration_context | INFO - Found 48 MoE modules to replace
Replacing MoE modules for calibration:   0%|          | 0/48 [00:00<?, ?it/s]
Replacing MoE modules for calibration: 100%|██████████| 48/48 [00:00<00:00, 2018.92it/s]
2026-05-21T09:56:47.6464 | moe_calibration_context | INFO - Replaced 48 MoE modules for calibration
2026-05-21T09:56:47.6465 | moe_calibration_context | INFO - 48/48 modules will be restored after calibration
2026-05-21T09:56:47.6473 | from_modifiers | INFO - Creating recipe from modifiers
2026-05-21T09:56:52.1717 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2026-05-21T09:56:52.1720 | IndependentPipeline | INFO - Inferred `SequentialPipeline` for `AutoRoundModifier`
W0521 09:56:53.964000 5819 torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile.
Preparing cache:   0%|          | 0/128 [00:00<?, ?it/s]
Preparing cache: 100%|██████████| 128/128 [00:00<00:00, 3418.69it/s]
(1/49): Calibrating:   0%|          | 0/128 [00:00<?, ?it/s]
(1/49): Calibrating: 100%|██████████| 128/128 [00:00<00:00, 199.70it/s]
(1/49): Propagating:   0%|          | 0/128 [00:00<?, ?it/s]
(1/49): Propagating: 100%|██████████| 128/128 [00:00<00:00, 193.08it/s]
(2/49): Calibrating:   0%|          | 0/128 [00:00<?, ?it/s]
(2/49): Calibrating: 100%|██████████| 128/128 [00:11<00:00, 10.86it/s]
2026-05-21T09:57:07.7937 | apply_autoround | INFO - Applying AutoRound on layer model.layers.0
2026-05-21 09:57:08 WARNING logging.py L328: Using LLM mode (new architecture).
2026-05-21 09:57:08 INFO device.py L287: torch.use_deterministic_algorithms(False) is set for XPU.
2026-05-21 09:57:08 INFO device.py L288: Patched torch SDPA on XPU to use is_causal=True for pure causal masks (avoids ~10x peak-VRAM blow-up from MATH backend).
2026-05-21 09:57:08 WARNING logging.py L328: reset enable_torch_compile to `False` as fp8 is enabled
2026-05-21 09:57:08 INFO base.py L565: Using predefined ignore_layers: model.layers.0.mlp.gate
2026-05-21 09:57:08 INFO base.py L565: Using predefined ignore_layers: model.layers.0.mlp.gate
Traceback (most recent call last):
  File "/data/jenkins/816609/workspace/AutoRound_LLMC_example_test/llm-compressor/examples/autoround/quantization_w8a8_mxfp8/qwen3_example.py", line 37, in <module>
    oneshot(
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 412, in oneshot
    one_shot()
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 189, in __call__
    self.apply_recipe_modifiers(
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 241, in apply_recipe_modifiers
    pipeline(
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 475, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 154, in __call__
    LifecycleCallbacks.sequential_epoch_end(modules)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 165, in sequential_epoch_end
    return cls.event(EventType.SEQUENTIAL_EPOCH_END, modules=modules, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 91, in event
    return active_session().event(event_type, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session.py", line 181, in event
    mod_data = self._lifecycle.event(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/lifecycle.py", line 204, in event
    data = mod.update_event(state=self.state, event=event, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/modifier.py", line 122, in update_event
    self.on_event(state, event, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/autoround/base.py", line 215, in on_event
    self.apply_autoround(state, modules)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/autoround/base.py", line 297, in apply_autoround
    q_input, _ = ar.quantize_block(
                 ^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/compressors/data_driven.py", line 392, in quantize_block
    self.quantizer.quantize_block(
  File "/opt/venv/lib/python3.12/site-packages/auto_round/algorithms/quantization/sign_round/quantizer.py", line 266, in quantize_block
    output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/algorithms/quantization/base.py", line 544, in _get_current_q_output
    output_q = _bf(
               ^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/compressors/utils.py", line 182, in block_forward
    output = block(input_ids, *input_tuple, **input_others)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1885, in _call_impl
    return inner()
           ^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1833, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py", line 359, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modeling/qwen3_moe.py", line 84, in forward
    expert_out = expert_layer(hidden_states[top_x])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py", line 209, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/wrapper.py", line 506, in forward
    weight_q, *_ = self._qdq_weight(self.value, self.min_scale, self.max_scale)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/wrapper.py", line 257, in _qdq_weight
    weight_q, scale, zp = self.weight_quant_func(
                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/data_type/mxfp.py", line 176, in quant_mx
    tensor = quant_element(tensor, ebits, mbits, max_norm, mantissa_rounding)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/data_type/mxfp.py", line 81, in quant_element
    else tensor / (2.0 ** float(mbits - 2)) * (2.0 ** private_exp.float())
                                               ~~~~^^~~~~~~~~~~~~~~~~~~~~
  File "/opt/venv/lib/python3.12/site-packages/torch/_tensor.py", line 47, in wrapped
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_tensor.py", line 1155, in __rpow__
    return torch.pow(other, self)
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
```

### Reproduction Steps

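Run the AutoRound MXFP8 (W8A8) Qwen3 example from llm-compressor on an XPU device (Level Zero backend). The run fails while applying AutoRound to `model.layers.0`, in the first quantized block forward:

https://github.com/vllm-project/llm-compressor/blob/main/examples/autoround/quantization_w8a8_mxfp8/qwen3_example.py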
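
To narrow down whether the failure comes from the last operation in the traceback (`2.0 ** private_exp.float()`, which dispatches to `torch.pow(scalar, tensor)`), that expression can be run on its own. The following is only a minimal sketch, not auto_round's code: the helper names `pick_device` and `pow2_scale`, the tensor values, and `mbits = 5` are assumptions chosen for illustration, and the script falls back to CPU when no XPU is reported. A device loss can also surface at a later op than the one that caused it, so a passing run here does not rule out the XPU path.

```python
import torch


def pick_device() -> str:
    # Use the XPU when this PyTorch build reports one; otherwise fall back to CPU.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return "xpu"
    return "cpu"


def pow2_scale(tensor: torch.Tensor, private_exp: torch.Tensor, mbits: int) -> torch.Tensor:
    # Same arithmetic shape as the expression shown in the traceback
    # (auto_round/data_type/mxfp.py, quant_element); this is not the library code.
    return tensor / (2.0 ** float(mbits - 2)) * (2.0 ** private_exp.float())


device = pick_device()

# Hypothetical inputs: small values with hand-picked per-element exponents.
tensor = torch.tensor([0.5, 1.0, 3.0, 8.0], device=device)
private_exp = torch.tensor([-1.0, 0.0, 1.0, 3.0], device=device)
mbits = 5  # assumed value, for illustration only

out = pow2_scale(tensor, private_exp, mbits)
print(device, out.cpu().tolist())
```

If this small case passes on the XPU, scaling the inputs up to the shape and dtype of an expert weight would be the next thing to try.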

### Environment Information

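Not provided. The log and traceback indicate Python 3.12 and a PyTorch XPU device using the Level Zero backend; package versions for torch, transformers, llmcompressor, and auto_round are not stated.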
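
The versions of the packages that appear in the traceback can be collected with a short standard-library script. This is a minimal sketch; the distribution names in `PACKAGES` are assumed PyPI names for the modules seen in the traceback, and `collect_versions` is a hypothetical helper.

```python
import platform
from importlib import metadata

# Assumed PyPI distribution names for the packages seen in the traceback.
PACKAGES = ["torch", "transformers", "llmcompressor", "auto-round"]


def collect_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


report = collect_versions(PACKAGES)
print(f"python: {platform.python_version()}")
for name, version in report.items():
    print(f"{name}: {version}")
```

Pasting the printed lines into this section, together with the GPU model and driver version, would make the report easier to reproduce.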

### Error Logs

See the log and traceback under Problem Description.

### Additional Context

_No response_
