Describe the bug
Running olive auto-opt to optimize a model for CPU with INT4 precision fails when --use_model_builder is not specified. The default ONNX export path in olive/passes/onnx/conversion.py calls DynamicCache.from_legacy_cache(), which was removed in transformers 5.x, causing an AttributeError.
Adding --use_model_builder (and --use_ort_genai) bypasses this by using the onnxruntime-genai model builder instead of torch.onnx.export, and the optimization + inference completes successfully.
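For completeness, inference on the optimized model was checked with an onnxruntime-genai loop along these lines (a rough sketch: the output subfolder and the generator API names, such as append_tokens and generate_next_token, are assumptions based on recent onnxruntime-genai releases):

import onnxruntime_genai as og

# Load the genai model produced by the working command below (path layout assumed).
model = og.Model("models/qwen3-cpu-int4/model")
tokenizer = og.Tokenizer(model)
params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

# Feed a prompt and greedily generate until the model stops.
generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is ONNX Runtime?"))
while not generator.is_done():
    generator.generate_next_token()
print(tokenizer.decode(generator.get_sequence(0)))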
The --use_model_builder flag is documented as optional, but omitting it when targeting CPU with INT4 precision on transformers 5.x results in a crash. The official quickstart example in the README omits this flag, which may lead users to the same failure.
This was discovered while investigating a related issue where, on older package versions (transformers 4.x, onnxruntime-genai 0.5.0), the model builds successfully without --use_model_builder but fails at inference time with an OrtException related to GatherBlockQuantized and uint8 tensors. On current package versions, the failure occurs earlier — at the conversion stage itself.
To Reproduce
- Install current packages (a sample pip command follows these steps):
  - olive-ai[all]==0.11.0
  - onnxruntime-genai==0.11.4
  - transformers==5.1.0
  - torch==2.10.0
  - Python 3.13
- Run optimization without --use_model_builder:
python -m olive auto-opt \
--model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
--output_path models/qwen3-cpu-int4 \
--device cpu \
--provider CPUExecutionProvider \
--precision int4 \
--log_level 1
- Observe crash at the ONNX conversion stage.
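For reference, the packages above can be installed in one step (a sketch; assumes the PyPI package names match the versions listed):

pip install "olive-ai[all]==0.11.0" onnxruntime-genai==0.11.4 transformers==5.1.0 torch==2.10.0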
Expected behavior
olive auto-opt should either:
- Default to --use_model_builder when targeting CPU with INT4 precision, or
- Be compatible with transformers 5.x on the standard ONNX export path, or
- Surface a clear error message directing users to use --use_model_builder (see the sketch after this list)
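A minimal sketch of what the third option could look like, assuming a version check early in the conversion pass (the exact location and wording are illustrative, not Olive's actual code):

import transformers
from packaging import version

# Fail fast with an actionable message instead of an AttributeError mid-export.
if version.parse(transformers.__version__) >= version.parse("5.0"):
    raise RuntimeError(
        "The default ONNX export path relies on DynamicCache.from_legacy_cache(), "
        "which was removed in transformers 5.x. Re-run olive auto-opt with "
        "--use_model_builder (and --use_ort_genai), or pin transformers<5."
    )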
Olive config
No JSON config — reproduced via CLI.
Working command:
python -m olive auto-opt \
--model_name_or_path Qwen/Qwen2.5-0.5B-Instruct \
--output_path models/qwen3-cpu-int4 \
--device cpu \
--provider CPUExecutionProvider \
--use_model_builder \
--use_ort_genai \
--precision int4 \
--log_level 1
Olive logs
Traceback (most recent call last):
File "...\olive\engine\engine.py", line 732, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path)
File "...\olive\systems\local.py", line 52, in run_pass
return the_pass.run(model_config, output_model_path)
File "...\olive\passes\onnx\conversion.py", line 196, in run
return self._run_for_config(model_config, config, output_model_path)
File "...\olive\passes\onnx\conversion.py", line 390, in _run_for_config
return OnnxConversion._convert_model_on_device(...)
File "...\olive\passes\onnx\conversion.py", line 596, in _convert_model_on_device
ir_model = _export_pytorch_model(...)
File "...\torch\utils\_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "...\olive\passes\onnx\conversion.py", line 267, in _export_pytorch_model
torch.onnx.export(...)
File "...\torch\onnx\__init__.py", line 341, in export
export(...)
File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 552, in export
_export(...)
File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 1513, in _export
graph, params_dict, torch_out = _model_to_graph(...)
File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 1112, in _model_to_graph
graph, params, torch_out, module = _create_jit_graph(model, args)
File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 996, in _create_jit_graph
graph, torch_out = _trace_and_get_graph_from_model(model, args)
File "...\torch\onnx\_internal\torchscript_exporter\utils.py", line 903, in _trace_and_get_graph_from_model
trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(...)
File "...\torch\jit\_trace.py", line 1432, in _get_trace_graph
outs = ONNXTracedModule(...)(*args, **kwargs)
File "...\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "...\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "...\torch\jit\_trace.py", line 140, in forward
graph, _out = torch._C._create_graph_by_tracing(...)
File "...\torch\jit\_trace.py", line 131, in wrapper
outs.append(self.inner(*trace_inputs))
File "...\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "...\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "...\torch\nn\modules\module.py", line 1766, in _slow_forward
result = self.forward(*input, **kwargs)
File "...\olive\passes\onnx\conversion.py", line 104, in patched_forward
args[pkv_index] = DynamicCache.from_legacy_cache(args[pkv_index])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: type object 'DynamicCache' has no attribute 'from_legacy_cache'
Other information
- OS: Windows 11
- Olive version: 0.11.0
- ONNXRuntime package and version: onnxruntime-genai==0.11.4
- Transformers package version: transformers==5.1.0
- Torch version: 2.10.0
- Python version: 3.13
Additional context
- The root cause is in olive/passes/onnx/conversion.py line 104, which calls DynamicCache.from_legacy_cache(), a method that was removed in transformers 5.x (see the compatibility sketch after this list).
- This likely affects all models optimized via olive auto-opt without --use_model_builder on transformers 5.x, not just Qwen2.5.
- The olive-recipes repo currently has no CPU recipe for Qwen2.5-0.5B-Instruct; all existing recipes target GPU/NPU runtimes. Happy to contribute a CPU recipe PR.
- Related: an earlier report of the same underlying issue (missing --use_model_builder) on older packages (transformers 4.x, onnxruntime-genai 0.5.0) manifested as a GatherBlockQuantized/uint8 OrtException at inference time rather than at conversion time.
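A minimal sketch of a compatibility shim for the patched_forward path, assuming DynamicCache() and its update(key, value, layer_idx) method are still available in transformers 5.x (not verified against the 5.x API):

from transformers.cache_utils import DynamicCache

def legacy_to_dynamic_cache(past_key_values):
    # Convert legacy tuple-of-tuples past_key_values into a DynamicCache,
    # working both before and after the removal of from_legacy_cache().
    if hasattr(DynamicCache, "from_legacy_cache"):
        return DynamicCache.from_legacy_cache(past_key_values)
    cache = DynamicCache()
    for layer_idx, (key_states, value_states) in enumerate(past_key_values):
        cache.update(key_states, value_states, layer_idx)
    return cache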