olive auto-opt for CPU INT4 fails without --use_model_builder: DynamicCache.from_legacy_cache AttributeError with transformers 5.x #2335

@HamidOna

Description

Describe the bug

Running olive auto-opt to optimize a model for CPU with INT4 precision fails when --use_model_builder is not specified. The default ONNX export path in olive/passes/onnx/conversion.py calls DynamicCache.from_legacy_cache(), which was removed in transformers 5.x, causing an AttributeError.
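
This is easy to confirm in isolation, independent of Olive; the snippet below prints True on transformers 4.x and False on 5.x, where the classmethod was removed:

# Standalone check that the installed transformers still exposes the
# legacy-cache constructor Olive's ONNX conversion pass relies on.
from transformers import DynamicCache

print(hasattr(DynamicCache, "from_legacy_cache"))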

Adding --use_model_builder (and --use_ort_genai) bypasses this by using the onnxruntime-genai model builder instead of torch.onnx.export; with those flags, both optimization and inference complete successfully.

The --use_model_builder flag is documented as optional, but omitting it when targeting CPU with INT4 precision on transformers 5.x results in a crash. The official quickstart example in the README omits this flag, which may lead users to the same failure.

This was discovered while investigating a related issue where, on older package versions (transformers 4.x, onnxruntime-genai 0.5.0), the model builds successfully without --use_model_builder but fails at inference time with an OrtException related to GatherBlockQuantized and uint8 tensors. On current package versions, the failure occurs earlier — at the conversion stage itself.

To Reproduce

  1. Install current packages:

    • olive-ai[all] (0.11.0)
    • onnxruntime-genai==0.11.4
    • transformers==5.1.0
    • torch==2.10.0
    • Python 3.13
  2. Run optimization without --use_model_builder:

python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path models/qwen3-cpu-int4 \
  --device cpu \
  --provider CPUExecutionProvider \
  --precision int4 \
  --log_level 1
  3. Observe the crash at the ONNX conversion stage.

Expected behavior

olive auto-opt should either:

  1. Default to --use_model_builder when targeting CPU with INT4 precision, or
  2. Be compatible with transformers 5.x on the standard ONNX export path, or
  3. Surface a clear, actionable error message directing users to --use_model_builder (see the sketch after this list).
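
As an illustration of option 3, a guard along the following lines in the conversion pass would turn the deep AttributeError into an actionable message. This is a hypothetical sketch, not the actual Olive source; the module is named only because the traceback below points at it.

# Hypothetical guard sketch for olive/passes/onnx/conversion.py:
# fail fast before torch.onnx.export instead of raising an AttributeError
# from inside the traced forward pass.
from transformers import DynamicCache

if not hasattr(DynamicCache, "from_legacy_cache"):
    raise RuntimeError(
        "transformers>=5 removed DynamicCache.from_legacy_cache, which the default "
        "ONNX export path relies on. Re-run olive auto-opt with --use_model_builder "
        "(and --use_ort_genai), or pin transformers<5."
    )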

Olive config

No JSON config — reproduced via CLI.

Working command:

python -m olive auto-opt \
  --model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
  --output_path models/qwen3-cpu-int4 \
  --device cpu \
  --provider CPUExecutionProvider \
  --use_model_builder \
  --use_ort_genai \
  --precision int4 \
  --log_level 1
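
For reference, the model produced by the working command can be smoke-tested with the onnxruntime-genai Python API. The snippet below is a rough sketch: the output folder layout under models/qwen3-cpu-int4 is an assumption, and method names can vary between onnxruntime-genai releases.

# Rough inference smoke test for the optimized model (paths assumed; point
# og.Model at the folder containing genai_config.json).
import onnxruntime_genai as og

model = og.Model("models/qwen3-cpu-int4/model")
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=64)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))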

Olive logs

Traceback (most recent call last):
  File "...\olive\engine\engine.py", line 732, in _run_pass
    output_model_config = host.run_pass(p, input_model_config, output_model_path)
  File "...\olive\systems\local.py", line 52, in run_pass
    return the_pass.run(model_config, output_model_path)
  File "...\olive\passes\onnx\conversion.py", line 196, in run
    return self._run_for_config(model_config, config, output_model_path)
  File "...\olive\passes\onnx\conversion.py", line 390, in _run_for_config
    return OnnxConversion._convert_model_on_device(...)
  File "...\olive\passes\onnx\conversion.py", line 596, in _convert_model_on_device
    ir_model = _export_pytorch_model(...)
  File "...\torch\utils\_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "...\olive\passes\onnx\conversion.py", line 267, in _export_pytorch_model
    torch.onnx.export(...)
  File "...\torch\onnx\__init__.py", line 341, in export
    export(...)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 552, in export
    _export(...)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 1513, in _export
    graph, params_dict, torch_out = _model_to_graph(...)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 1112, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 996, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
  File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 903, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(...)
  File "...\torch\jit\_trace.py", line 1432, in _get_trace_graph
    outs = ONNXTracedModule(...)(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "...\torch\jit\_trace.py", line 140, in forward
    graph, _out = torch._C._create_graph_by_tracing(...)
  File "...\torch\jit\_trace.py", line 131, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "...\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
  File "...\torch\nn\modules\module.py", line 1766, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "...\olive\passes\onnx\conversion.py", line 104, in patched_forward
    args[pkv_index] = DynamicCache.from_legacy_cache(args[pkv_index])
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'DynamicCache' has no attribute 'from_legacy_cache'

Other information

  • OS: Windows 11
  • Olive version: 0.11.0
  • ONNXRuntime package and version: onnxruntime-genai==0.11.4
  • Transformers package version: transformers==5.1.0
  • Torch version: 2.10.0
  • Python version: 3.13

Additional context

  • The root cause is in olive/passes/onnx/conversion.py line 104, which calls DynamicCache.from_legacy_cache() — a method that was removed in transformers 5.x.
  • This likely affects all models optimized via olive auto-opt without --use_model_builder on transformers 5.x, not just Qwen2.5.
  • The olive-recipes repo currently has no CPU recipe for Qwen2.5-0.5B-Instruct — all existing recipes target GPU/NPU runtimes. Happy to contribute a CPU recipe PR.
  • Related: an earlier report of the same underlying issue (missing --use_model_builder) on older packages (transformers 4.x, onnxruntime-genai 0.5.0) manifested as a GatherBlockQuantized / uint8 OrtException at inference time rather than at conversion time.
