Loading checkpoint shards: 0%| | 0/16 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|██████████| 16/16 [00:01<00:00, 15.41it/s]
2026-05-21 09:56:26 INFO calib_dataset.py L977: Preprocessing calibration dataset in a subprocess to avoid memory leaks...
Map: 0%| | 0/10000 [00:00<?, ? examples/s]
Map: 100%|██████████| 10000/10000 [00:09<00:00, 1036.44 examples/s]
Filter: 0%| | 0/10000 [00:00<?, ? examples/s]
Filter: 100%|██████████| 10000/10000 [00:03<00:00, 3028.36 examples/s]
Casting the dataset: 0%| | 0/2301 [00:00<?, ? examples/s]
Casting the dataset: 100%|██████████| 2301/2301 [00:03<00:00, 649.01 examples/s]
2026-05-21T09:56:46.2089 | __init__ | WARNING - Disabling tokenizer parallelism due to threading conflict between FastTokenizer and Datasets. Set TOKENIZERS_PARALLELISM=false to suppress this warning.
2026-05-21T09:56:47.5298 | reset | INFO - Compression lifecycle reset
2026-05-21T09:56:47.6216 | moe_calibration_context | INFO - Found 48 MoE modules to replace
Replacing MoE modules for calibration: 0%| | 0/48 [00:00<?, ?it/s]
Replacing MoE modules for calibration: 100%|██████████| 48/48 [00:00<00:00, 2018.92it/s]
2026-05-21T09:56:47.6464 | moe_calibration_context | INFO - Replaced 48 MoE modules for calibration
2026-05-21T09:56:47.6465 | moe_calibration_context | INFO - 48/48 modules will be restored after calibration
2026-05-21T09:56:47.6473 | from_modifiers | INFO - Creating recipe from modifiers
2026-05-21T09:56:52.1717 | initialize | INFO - Compression lifecycle initialized for 1 modifiers
2026-05-21T09:56:52.1720 | IndependentPipeline | INFO - Inferred `SequentialPipeline` for `AutoRoundModifier`
W0521 09:56:53.964000 5819 torch/fx/_symbolic_trace.py:53] is_fx_tracing will return true for both fx.symbolic_trace and torch.export. Please use is_fx_tracing_symbolic_tracing() for specifically fx.symbolic_trace or torch.compiler.is_compiling() for specifically torch.export/compile.
Preparing cache: 0%| | 0/128 [00:00<?, ?it/s]
Preparing cache: 100%|██████████| 128/128 [00:00<00:00, 3418.69it/s]
(1/49): Calibrating: 0%| | 0/128 [00:00<?, ?it/s]
(1/49): Calibrating: 100%|██████████| 128/128 [00:00<00:00, 199.70it/s]
(1/49): Propagating: 0%| | 0/128 [00:00<?, ?it/s]
(1/49): Propagating: 100%|██████████| 128/128 [00:00<00:00, 193.08it/s]
(2/49): Calibrating: 0%| | 0/128 [00:00<?, ?it/s]
(2/49): Calibrating: 100%|██████████| 128/128 [00:11<00:00, 10.86it/s]
2026-05-21T09:57:07.7937 | apply_autoround | INFO - Applying AutoRound on layer model.layers.0
2026-05-21 09:57:08 WARNING logging.py L328: Using LLM mode (new architecture).
2026-05-21 09:57:08 INFO device.py L287: torch.use_deterministic_algorithms(False) is set for XPU.
2026-05-21 09:57:08 INFO device.py L288: Patched torch SDPA on XPU to use is_causal=True for pure causal masks (avoids ~10x peak-VRAM blow-up from MATH backend).
2026-05-21 09:57:08 WARNING logging.py L328: reset enable_torch_compile to `False` as fp8 is enabled
2026-05-21 09:57:08 INFO base.py L565: Using predefined ignore_layers: model.layers.0.mlp.gate
2026-05-21 09:57:08 INFO base.py L565: Using predefined ignore_layers: model.layers.0.mlp.gate
Traceback (most recent call last):
File "/data/jenkins/816609/workspace/AutoRound_LLMC_example_test/llm-compressor/examples/autoround/quantization_w8a8_mxfp8/qwen3_example.py", line 37, in <module>
oneshot(
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 412, in oneshot
one_shot()
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 189, in __call__
self.apply_recipe_modifiers(
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/entrypoints/oneshot.py", line 241, in apply_recipe_modifiers
pipeline(
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/independent/pipeline.py", line 45, in __call__
pipeline(model, dataloader, dataset_args)
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/helpers.py", line 475, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/pipelines/sequential/pipeline.py", line 154, in __call__
LifecycleCallbacks.sequential_epoch_end(modules)
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 165, in sequential_epoch_end
return cls.event(EventType.SEQUENTIAL_EPOCH_END, modules=modules, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session_functions.py", line 91, in event
return active_session().event(event_type, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/session.py", line 181, in event
mod_data = self._lifecycle.event(
^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/core/lifecycle.py", line 204, in event
data = mod.update_event(state=self.state, event=event, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/modifier.py", line 122, in update_event
self.on_event(state, event, **kwargs)
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/autoround/base.py", line 215, in on_event
self.apply_autoround(state, modules)
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modifiers/autoround/base.py", line 297, in apply_autoround
q_input, _ = ar.quantize_block(
^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/compressors/data_driven.py", line 392, in quantize_block
self.quantizer.quantize_block(
File "/opt/venv/lib/python3.12/site-packages/auto_round/algorithms/quantization/sign_round/quantizer.py", line 266, in quantize_block
output_q = self._get_current_q_output(block, input_ids, input_others, indices, device, loss_device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/algorithms/quantization/base.py", line 544, in _get_current_q_output
output_q = _bf(
^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/compressors/utils.py", line 182, in block_forward
output = block(input_ids, *input_tuple, **input_others)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1885, in _call_impl
return inner()
^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1833, in inner
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py", line 359, in forward
hidden_states = self.mlp(hidden_states)
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/llmcompressor/modeling/qwen3_moe.py", line 84, in forward
expert_out = expert_layer(hidden_states[top_x])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/models/qwen3_moe/modeling_qwen3_moe.py", line 209, in forward
down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/wrapper.py", line 506, in forward
weight_q, *_ = self._qdq_weight(self.value, self.min_scale, self.max_scale)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/wrapper.py", line 257, in _qdq_weight
weight_q, scale, zp = self.weight_quant_func(
^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/data_type/mxfp.py", line 176, in quant_mx
tensor = quant_element(tensor, ebits, mbits, max_norm, mantissa_rounding)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/auto_round/data_type/mxfp.py", line 81, in quant_element
else tensor / (2.0 ** float(mbits - 2)) * (2.0 ** private_exp.float())
~~~~^^~~~~~~~~~~~~~~~~~~~~
File "/opt/venv/lib/python3.12/site-packages/torch/_tensor.py", line 47, in wrapped
return f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/_tensor.py", line 1155, in __rpow__
return torch.pow(other, self)
^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: level_zero backend failed with error: 20 (UR_RESULT_ERROR_DEVICE_LOST)
Problem Description
Reproduction Steps
https://github.com/vllm-project/llm-compressor/blob/main/examples/autoround/quantization_w8a8_mxfp8/qwen3_example.py
Environment Information
No response
Error Logs
Additional Context
No response