[Qwen3][MXFP8][LLMC]: RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST) #1835

@XuehaoSun

Problem Description

Loading checkpoint shards:   0%|          | 0/16 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 16/16 [00:01<00:00, 15.41it/s]
2026-05-21 09:56:26 INFO calib_dataset.py L977: Preprocessing calibration dataset in a subprocess to avoid memory leaks...


Map:   0%|          | 0/10000 [00:00<?, ? examples/s]
Map: 100%|██████████| 10000/10000 [00:09<00:00, 1036.44 examples/s]


Filter:   0%|          | 0/10000 [00:00<?, ? examples/s]
Filter: 100%|██████████| 10000/10000 [00:03<00:00, 3028.36 examples/s]


Casting the dataset:   0%|          | 0/2301 [00:00<?, ? examples/s]
Casting the dataset: 100%|██████████| 2301/2301 [00:03<00:00, 649.01 examples/s]
2026-05-21T09:56:46.2089 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
2026-05-21T09:56:47.5298 | reset | INFO - Compression lifecycle reset
2026-05-21T09:56:47.6216 | moe_calibration_context | INFO - Found 48 MoE modules to replace


Replacing MoE modules for calibration:   0%|          | 0/48 [00:00<?, ?it/s]
Replacing MoE modules for calibration: 100%|██████████| 48/48 [00:00<00:00, 2018.92it/s]
2026-05-21T09:56:47.6464 | moe_calibration_context | INFO - Replaced 48 MoE modules for calibration
2026-05-21T09:56:47.6465 | moe_calibration_context | INFO - 48/48 modules will be restored after calibration
2026-05-21T09:56:47.6473 | from_modifiers | INFO - Creating recipe from modifiers
2026-05-21T09:56:52.1717 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2026-05-21T09:56:52.1720 | IndependentPipeline | INFO - Inferred `SequentialPipeline` for `AutoRoundModifier`
W0521 09:56:53.964000 5819 torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile.


Preparing cache:   0%|          | 0/128 [00:00<?, ?it/s]
Preparing cache: 100%|██████████| 128/128 [00:00<00:00, 3418.69it/s]


(1/49): Calibrating:   0%|          | 0/128 [00:00<?, ?it/s]
(1/49): Calibrating: 100%|██████████| 128/128 [00:00<00:00, 199.70it/s]


(1/49): Propagating:   0%|          | 0/128 [00:00<?, ?it/s]
(1/49): Propagating: 100%|██████████| 128/128 [00:00<00:00, 193.08it/s]


(2/49): Calibrating:   0%|          | 0/128 [00:00<?, ?it/s]
(2/49): Calibrating: 100%|██████████| 128/128 [00:11<00:00, 10.86it/s]
2026-05-21T09:57:07.7937 | apply_autoround | INFO - Applying AutoRound on layer model.layers.0
2026-05-21 09:57:08 WARNING logging.py L328: Using LLM mode (new architecture).
2026-05-21 09:57:08 INFO device.py L287: torch.use_deterministic_algorithms(False) is set for XPU.
2026-05-21 09:57:08 INFO device.py L288: Patched torch SDPA on XPU to use is_causal=True for pure causal masks (avoids ~10x peak-VRAM blow-up from MATH backend).
2026-05-21 09:57:08 WARNING logging.py L328: reset enable_torch_compile to `False` as fp8 is enabled
2026-05-21 09:57:08 INFO base.py L565: Using predefined ignore_layers: model.layers.0.mlp.gate
2026-05-21 09:57:08 INFO base.py L565: Using predefined ignore_layers: model.layers.0.mlp.gate
Traceback (most recent call last):
  File "/data/jenkins/816609/workspace/AutoRound_LLMC_example_test/llm-compressor/examples/autoround/quantization_w8a8_mxfp8/qwen3_example.py", line 37, in <module>
    oneshot(
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 412, in oneshot
    one_shot()
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 189, in __call__
    self.apply_recipe_modifiers(
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 241, in apply_recipe_modifiers
    pipeline(
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
    pipeline(model, dataloader, dataset_args)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 475, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 154, in __call__
    LifecycleCallbacks.sequential_epoch_end(modules)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 165, in sequential_epoch_end
    return cls.event(EventType.SEQUENTIAL_EPOCH_END, modules=modules, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 91, in event
    return active_session().event(event_type, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session.py", line 181, in event
    mod_data = self._lifecycle.event(
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/lifecycle.py", line 204, in event
    data = mod.update_event(state=self.state, event=event, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/modifier.py", line 122, in update_event
    self.on_event(state, event, **kwargs)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/autoround/base.py", line 215, in on_event
    self.apply_autoround(state, modules)
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/autoround/base.py", line 297, in apply_autoround
    q_input, _ = ar.quantize_block(
                 ^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/compressors/data_driven.py", line 392, in quantize_block
    self.quantizer.quantize_block(
  File "/opt/venv/lib/python3.12/site-packages/auto_round/algorithms/quantization/sign_round/quantizer.py", line 266, in quantize_block
    output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/algorithms/quantization/base.py", line 544, in _get_current_q_output
    output_q = _bf(
               ^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/compressors/utils.py", line 182, in block_forward
    output = block(input_ids, *input_tuple, **input_others)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1885, in _call_impl
    return inner()
           ^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1833, in inner
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py", line 359, in forward
    hidden_states = self.mlp(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modeling/qwen3_moe.py", line 84, in forward
    expert_out = expert_layer(hidden_states[top_x])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py", line 209, in forward
    down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/wrapper.py", line 506, in forward
    weight_q, *_ = self._qdq_weight(self.value, self.min_scale, self.max_scale)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/wrapper.py", line 257, in _qdq_weight
    weight_q, scale, zp = self.weight_quant_func(
                          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/data_type/mxfp.py", line 176, in quant_mx
    tensor = quant_element(tensor, ebits, mbits, max_norm, mantissa_rounding)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/auto_round/data_type/mxfp.py", line 81, in quant_element
    else tensor / (2.0 ** float(mbits - 2)) * (2.0 ** private_exp.float())
                                               ~~~~^^~~~~~~~~~~~~~~~~~~~~
  File "/opt/venv/lib/python3.12/site-packages/torch/_tensor.py", line 47, in wrapped
    return f(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.12/site-packages/torch/_tensor.py", line 1155, in __rpow__
    return torch.pow(other, self)
           ^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)

Reproduction Steps

https://github.com/vllm-project/llm-compressor/blob/main/examples/autoround/quantization_w8a8_mxfp8/qwen3_example.py
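
The traceback ends in `torch.pow(other, self)`, reached from the expression `2.0 ** private_exp.float()` in `auto_round/data_type/mxfp.py`. The sketch below evaluates that same expression on its own, which may help tell apart a problem in this one kernel from a device that was already in a bad state (for example from memory pressure earlier in the block forward). It has not been confirmed to reproduce the error. The helper names `pick_device` and `scalar_pow`, and the exponent values and tensor size, are hypothetical; the real `private_exp` tensor is not shown in the log.

```python
import torch


def pick_device() -> str:
    """Return "xpu" if this PyTorch build exposes an available XPU, else "cpu"."""
    xpu = getattr(torch, "xpu", None)
    return "xpu" if xpu is not None and xpu.is_available() else "cpu"


def scalar_pow(exponents: torch.Tensor, device: str) -> torch.Tensor:
    """Evaluate 2.0 ** exponents.float() on `device` and return the result on CPU."""
    # A Python float raised to a tensor dispatches to Tensor.__rpow__,
    # which calls torch.pow(other, self), the last frame in the traceback.
    result = 2.0 ** exponents.to(device).float()
    # Copying back to the host waits for the device computation to finish,
    # so a device-side failure would surface here at the latest.
    return result.cpu()


if __name__ == "__main__":
    # Assumed values: small integer exponents repeated to a moderately large
    # tensor. The shapes used by the real quantizer are unknown.
    exponents = torch.arange(-8, 8, dtype=torch.float32).repeat(1024)
    device = pick_device()
    out = scalar_pow(exponents, device)
    ref = scalar_pow(exponents, "cpu")
    print(f"device={device} matches_cpu={torch.allclose(out, ref)}")
```

If this runs cleanly on the XPU, the device loss is more likely tied to the state built up by the full example script than to the power operation itself.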

Environment Information

No response
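
Since no environment details were provided, a small script along these lines could collect the Python-side versions for the packages that appear in the traceback. This is only a sketch: the distribution names in `PACKAGES` are assumptions inferred from the import paths, the helper `collect_versions` is hypothetical, and the script does not report the GPU driver or Level Zero runtime version, which would have to be gathered separately.

```python
import platform
from importlib import metadata

# Assumed distribution names, inferred from the module paths in the traceback.
PACKAGES = ("torch", "transformers", "llmcompressor", "auto-round")


def collect_versions(packages=PACKAGES) -> dict:
    """Map each package name to its installed version, or "not installed"."""
    versions = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```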

Error Logs

Additional Context

No response
