feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406

justinchuby wants to merge 24 commits into main
Conversation
Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.
- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
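The EP auto-detection described in this commit can be sketched as a plain lookup from Olive execution-provider names to mobius device strings. The names and function below are illustrative stand-ins, not the actual pass internals:

```python
# Hypothetical sketch of EP auto-detection: map Olive execution-provider
# names to mobius device strings; anything unmapped fails fast.
_EP_TO_DEVICE = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "cuda",
    "DmlExecutionProvider": "dml",
    "WebGpuExecutionProvider": "webgpu",
}


def resolve_device(execution_provider: str) -> str:
    """Return the mobius device string for an Olive EP, or raise ValueError."""
    try:
        return _EP_TO_DEVICE[execution_provider]
    except KeyError:
        raise ValueError(f"Unsupported execution provider: {execution_provider}") from None
```

Failing fast on unknown providers mirrors the error-case tests the commit mentions.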
- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:
- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text-to-Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs. image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model with correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
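The corrected field placement can be illustrated with a minimal run-config fragment, written as a Python dict for readability. The field layout (engine.target, log_severity_level) comes from the commit; every value here, including "local_system", is a placeholder:

```python
import json

# Minimal sketch of the corrected run-config shape: the target reference
# lives under engine.target, and output_name/system are no longer engine
# fields. All concrete values are placeholders, not the PR's configs.
run_config = {
    "engine": {
        "target": "local_system",   # moved here from the top level (placeholder name)
        "log_severity_level": 1,    # per the commit
    },
    "passes": {
        "builder": {"type": "MobiusModelBuilder", "precision": "fp32"},
    },
}

print(json.dumps(run_config, indent=2))
```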
Pull request overview
Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.
Changes:
- Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
- Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
- Adds unit tests for single-component, multi-component, EP selection, and error paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing-dependency behavior. |
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (transitive dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001) — safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into a single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
(keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
the pass extra_dependencies and the top-level extra_dependencies map;
was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
gemma4_int4_cuda.json so both example configs follow the same
{precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
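The post-save verification this commit adds can be sketched with two small helpers. The function names are hypothetical; only the behavior (single vs. per-component model.onnx paths, RuntimeError on a missing file) is taken from the commit:

```python
from pathlib import Path


def expected_model_paths(output_dir, components=None):
    """Expected model.onnx locations: one at the root for a single-component
    export, or one per <component>/model.onnx for multi-component exports."""
    root = Path(output_dir)
    if not components:
        return [root / "model.onnx"]
    return [root / name / "model.onnx" for name in components]


def verify_outputs(output_dir, components=None):
    """Raise RuntimeError if any expected model.onnx is missing after save."""
    for path in expected_model_paths(output_dir, components):
        if not path.exists():
            raise RuntimeError(f"mobius did not produce expected model at {path}")
```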
- Add module-scoped _stub_mobius_module fixture that injects a fake
'mobius' stub into sys.modules when the package is not installed,
ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
(PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
_run_for_config — import is intentionally deferred to surface a clear
ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Change all references from 'mobius-genai' to 'mobius-ai':
- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
```python
    ),
),
"execution_provider": PassConfigParam(
    type_=str,
```
We could create an enum of the supported EPs for automatic validation, like in olive/passes/pytorch/autoawq.py (line 27 at 8b1957e), unless you think the options might keep growing and it would be hard to keep them in sync across versions.
Replace GptqQuantizer (requires auto_gptq) with the built-in OnnxBlockWiseRtnQuantization pass, which works out of the box.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Replace the free-form string with a StrEnumBase enum matching the pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
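An equivalent enum can be sketched with the standard-library str/Enum mixin standing in for Olive's StrEnumBase (which this sketch does not import); the member values are the ones listed in the commit:

```python
from enum import Enum


class MobiusEP(str, Enum):
    """Sketch of the supported-EP enum; a plain str+Enum stands in for
    Olive's StrEnumBase. Values mirror the options the commit lists."""
    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"
```

Because the enum mixes in str, config values like "cuda" validate by value lookup, and an unsupported string raises ValueError automatically.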
Force-pushed 21ab3e2 to 2af889f
Configs moved to microsoft/olive-recipes per repo convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support customized naming for the different components of a multi-component model? I can see all component models are named "model.onnx".
Add 'runtime' config param (default: ort-genai) that generates genai_config.json, tokenizer files, and processor configs alongside ONNX models via write_ort_genai_config(). Set to 'none' to skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
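The runtime switch this commit describes can be sketched as a tiny predicate (a hypothetical helper, not the pass's actual code): artifacts are written for the default 'ort-genai' runtime and skipped for 'none':

```python
def should_write_genai_artifacts(runtime: str = "ort-genai") -> bool:
    """Sketch of the 'runtime' config switch: genai_config.json, tokenizer
    files, and processor configs are written only for 'ort-genai' (the
    default); 'none' skips them; anything else is rejected."""
    if runtime not in ("ort-genai", "none"):
        raise ValueError(f"Unknown runtime: {runtime!r}")
    return runtime == "ort-genai"
```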
Force-pushed dcce655 to 68ed349
@xiaoyu-work The models are
```
pip install mobius-ai
See https://github.com/microsoft/mobius
```
```python
    ),
),
"execution_provider": PassConfigParam(
    type_=MobiusModelBuilder.MobiusEP,
```
We should not offer pass-level execution-provider choices to the user. The Olive engine runs a series of passes, and the user selects the EP for a given engine run.
See another relevant comment below, on _run_for_config.
The execution_provider pass config param has been removed. The pass now uses only the Olive accelerator spec EP, and raises a ValueError if that EP is not supported by mobius. This was addressed in 8efcec5.
Force-pushed a834c8a to 8efcec5
titaiwangms left a comment:
Nice work! A few suggestions from our experience building the Ministral-3 VLM recipe on top of mobius:
1. Per-component VLM quantization guidance
For VLMs, downstream quantization needs differ per component: decoder → INT4, vision → INT8, embedding → none (ORT has a Gather rank-1 bug with GatherBlockQuantized that prevents embedding quantization). It would be helpful to document how recipe authors should wire per-component quantization after CompositeModelHandler. Even a brief note in the docstring would save people time.
2. context_length default may be too small
write_ort_genai_config() defaults context_length to 4096, but models like Ministral-3 need 262144 for full YaRN RoPE cache. Consider reading max_position_embeddings from the HF config automatically, or exposing it as a pass config parameter.
3. Minimum mobius version check (minor)
If a user has an older mobius without support for newer model types (e.g., Pixtral), they would get confusing errors from mobius.build(). A version check or clearer error message would help.
4. Migration example (nice-to-have)
Existing recipes manually call mobius.build() + write_ort_genai_config(). A brief before/after example showing how to replace that with MobiusModelBuilder pass would help adoption.
Overall this is clean and well-tested — excited to migrate our recipe to use this once it lands!
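The context_length suggestion in point 2 above can be sketched as a small fallback helper that prefers the model's own max_position_embeddings over the fixed 4096 default. The function name is hypothetical and not part of mobius or Olive:

```python
def resolve_context_length(hf_config: dict, default: int = 4096) -> int:
    """Sketch: read max_position_embeddings from a HuggingFace config dict
    when present (e.g. 262144 for Ministral-3's full YaRN RoPE cache),
    falling back to write_ort_genai_config()'s 4096 default otherwise."""
    return int(hf_config.get("max_position_embeddings", default))
```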
@xiaoyu-work how should we handle per-component quantization with Olive?
@xiaoyu-work Thanks for the question!

genai_config.json generation: Yes, we've just added support for this. The pass now calls `_write_genai_config()`, which generates all ORT GenAI artifacts (genai_config.json, tokenizer files, processor configs) and includes them in the model handler's `additional_files`. This happens automatically when `runtime='ort-genai'` (the default).

Multi-component naming: Correct, mobius uses the flat `<component_name>/model.onnx` layout (e.g., `decoder/model.onnx`, `vision_encoder/model.onnx`). This is what ORT GenAI expects. The pass returns a `CompositeModelHandler` with each component's directory as the model path, so the structure is preserved.
@titaiwangms Great feedback! Addressing each point:

1. Per-component VLM quantization guidance
2. context_length / max_position_embeddings
3. Minimum mobius version check
4. Migration example

```
Before: manual mobius.build()
from mobius import build

After: MobiusModelBuilder pass
Just put {'type': 'MobiusModelBuilder', 'precision': 'fp32'} in your Olive config!
```

Thanks for the thorough review — these notes will help adoption!
@xiaoyu-work @justinchuby Regarding per-component quantization: with `CompositeModelHandler`, you can wire different passes per component by targeting them in the pipeline config.

For selective per-component quantization (e.g., skip embedding), you'd use `nodes_to_exclude` or create separate configs per component. We can document this pattern in the pass docstring to make it clearer. The architecture here is that each pass is component-agnostic; the engine applies it to all components in the composite. For more granular control, recipes can create separate Olive config variants (one per component, or one per quantization strategy).
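A minimal sketch of that pipeline wiring, written as a Python dict for illustration. The two pass types appear in this PR; the pass aliases and the `nodes_to_exclude` value are placeholders:

```python
import json

# Hypothetical Olive pipeline sketch: the builder emits a
# CompositeModelHandler, and the engine then applies the quantization
# pass to each component in turn. nodes_to_exclude (empty placeholder)
# is the knob mentioned above for skipping e.g. embedding-related ops.
pipeline = {
    "passes": {
        "builder": {"type": "MobiusModelBuilder", "precision": "fp16"},
        "quantize": {
            "type": "OnnxBlockWiseRtnQuantization",
            "nodes_to_exclude": [],
        },
    }
}

print(json.dumps(pipeline, indent=2))
```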
Summary

Adds a new Olive pass (`MobiusModelBuilder`) that wraps mobius `build()` to produce ONNX models from HuggingFace model IDs.
- Single-component models → `ONNXModelHandler`
- Multi-component models → `CompositeModelHandler`
- Registered in `olive_config.json` as `MobiusModelBuilder`
- Example config: `examples/gemma4/gemma4_int4_cuda.json`

Validated: Gemma4 INT4 Quantization Pipeline

Successfully tested `MobiusModelBuilder` → `OnnxBlockWiseRtnQuantization` on `google/gemma-4-E2B-it`:
- Quantized ops per component
- Weight quantization coverage
- Output structure (2.8GB total, down from ~5GB fp16)
- Pipeline timing: `MobiusModelBuilder` (fp16 build) → `OnnxBlockWiseRtnQuantization` (int4)