feat: add MobiusModelBuilder Olive pass for mobius-backed ONNX export #2406

justinchuby wants to merge 24 commits into main
Conversation
Adds a new Olive pass that wraps mobius's build() function to produce ONNX models directly from HuggingFace model IDs.
- Single-component models (LLMs) → ONNXModelHandler
- Multi-component models (VLMs, encoder-decoders) → CompositeModelHandler
- EP auto-detected from Olive accelerator spec (cpu/cuda/dml/webgpu)
- Precision: fp32 (default), fp16, bf16
- Registered in olive_config.json as 'MobiusModelBuilder'
- Example pipeline config: examples/gemma4/gemma4_int4_pipeline.json
- 10 unit tests covering single/multi-component, EP detection, and error cases

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
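The EP auto-detection described in this commit can be sketched as a plain lookup from Olive execution-provider names to mobius device strings. The names and function below are illustrative stand-ins, not the actual pass internals:

```python
# Hypothetical sketch of EP auto-detection: map Olive execution-provider
# names to mobius device strings; anything unmapped fails fast.
_EP_TO_DEVICE = {
    "CPUExecutionProvider": "cpu",
    "CUDAExecutionProvider": "cuda",
    "DmlExecutionProvider": "dml",
    "WebGpuExecutionProvider": "webgpu",
}


def resolve_device(execution_provider: str) -> str:
    """Return the mobius device string for an Olive EP, or raise ValueError."""
    try:
        return _EP_TO_DEVICE[execution_provider]
    except KeyError:
        raise ValueError(f"Unsupported execution provider: {execution_provider}") from None
```

Failing fast on unknown providers mirrors the error-case tests the commit mentions.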
- test_ep_map_covers_common_providers now asserts DML and WebGPU in addition to CPU and CUDA, verifying full EP coverage
- Add examples/gemma4/gemma4_fp32_cpu.json showing CPU/fp32 deployment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Use official model IDs:
- google/gemma-4-E2B-it and google/gemma-4-E4B-it: Any-to-Any (vision + audio + text)
- google/gemma-4-26B-A4B-it and google/gemma-4-31B-it: Image-Text-to-Text only (no audio encoder)

Updated both example configs to use google/gemma-4-E2B-it and added comment strings documenting the audio-capable vs. image-only distinction.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…lds)

Fix invalid RunConfig fields in both example configs:
- Remove output_name and system (not valid engine fields)
- Move target reference to engine.target
- Use log_severity_level=1

Verified E2E with HuggingFaceTB/SmolLM2-135M-Instruct:
- olive run completed successfully
- model.onnx + model.onnx.data produced
- ORT loaded the model with correct causal-LM I/O (input_ids -> logits + KV cache)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
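The corrected field placement can be illustrated with a minimal run-config fragment, written as a Python dict for readability. The field layout (engine.target, log_severity_level) comes from the commit; every value here, including "local_system", is a placeholder:

```python
import json

# Minimal sketch of the corrected run-config shape: the target reference
# lives under engine.target, and output_name/system are no longer engine
# fields. All concrete values are placeholders, not the PR's configs.
run_config = {
    "engine": {
        "target": "local_system",   # moved here from the top level (placeholder name)
        "log_severity_level": 1,    # per the commit
    },
    "passes": {
        "builder": {"type": "MobiusModelBuilder", "precision": "fp32"},
    },
}

print(json.dumps(run_config, indent=2))
```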
Pull request overview
Adds a new ONNX pass (MobiusModelBuilder) that uses the mobius package to build ONNX models directly from HuggingFace model IDs, returning either a single ONNXModelHandler or a CompositeModelHandler for multi-component exports.
Changes:
- Introduces olive/passes/onnx/mobius_model_builder.py implementing the new pass (EP mapping, precision mapping, trust_remote_code passthrough).
- Registers the pass in olive/olive_config.json and adds two Gemma4 example run configs.
- Adds unit tests for single-component, multi-component, EP selection, and error paths.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| olive/passes/onnx/mobius_model_builder.py | New pass wrapping mobius.build() and emitting Olive model handlers. |
| olive/olive_config.json | Registers MobiusModelBuilder and declares extras for its dependencies. |
| examples/gemma4/gemma4_int4_pipeline.json | Example pipeline: mobius export (fp16 CUDA) then INT4 quantization. |
| examples/gemma4/gemma4_fp32_cpu.json | Example pipeline: mobius export (fp32 CPU). |
| test/passes/onnx/test_mobius_model_builder.py | New unit tests for config, handler types, EP mapping, and missing-dependency behavior. |
- _PRECISION_TO_DTYPE: add inline comments explaining each dtype string (f32 = float32, f16 = float16, bf16 = bfloat16) and when to use a downstream quantization pass for INT4/INT8 instead
- Remove explicit execution_provider from CUDA example config so both gemma4 configs consistently rely on auto-detection from the accelerator spec; the CPU config already did this
- olive_config.json: add mobius-genai to top-level extra_dependencies map so 'olive run' can surface the install hint; remove onnx_ir (transitive dep of mobius-genai) from the pass entry
- Move AcceleratorSpec import to TYPE_CHECKING block (RUFF TC001) — safe because the file already has 'from __future__ import annotations'
- Use X | Y union syntax instead of Union[X, Y] (RUFF UP007)
- Remove redundant 'import onnx_ir' check; ImportError message now correctly says 'pip install mobius-genai' (PYLINT W0611)
- Rename unused _fake_pkg 'output_dir' param to '_output_dir' to suppress lint warning (PYLINT W0613)
- Wrap long AcceleratorSpec(…) lines to stay under 120 chars (RUFF format)
- Collapse nested 'with' into a single 'with' (RUFF SIM117)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
- EP_MAP: tighten annotation to ClassVar[dict[ExecutionProvider, str]]
(keys are enum instances, not plain strings)
- olive_config.json: add onnx-ir (correct pip hyphenated name) to both
the pass extra_dependencies and the top-level extra_dependencies map;
was previously using wrong underscore spelling 'onnx_ir'
- Rename examples/gemma4/gemma4_int4_pipeline.json ->
gemma4_int4_cuda.json so both example configs follow the same
{precision}_{device}.json naming pattern
- _patch_build: expand docstring explaining why 'mobius.build' is the
correct patch target (lazy import inside function body, not module-level)
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
…delBuilder

- After pkg.save(), verify each expected model.onnx exists and raise RuntimeError with a clear message if missing (single-component and per-component in multi-component paths)
- Log a WARNING when trust_remote_code=True is passed so users are reminded to only use this with trusted model sources
- Add 4 new tests: missing output raises RuntimeError (single and multi-component), trust_remote_code warning emitted, no warning when False (14/14 passing)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
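The post-save verification this commit adds can be sketched with two small helpers. The function names are hypothetical; only the behavior (single vs. per-component model.onnx paths, RuntimeError on a missing file) is taken from the commit:

```python
from pathlib import Path


def expected_model_paths(output_dir, components=None):
    """Expected model.onnx locations: one at the root for a single-component
    export, or one per <component>/model.onnx for multi-component exports."""
    root = Path(output_dir)
    if not components:
        return [root / "model.onnx"]
    return [root / name / "model.onnx" for name in components]


def verify_outputs(output_dir, components=None):
    """Raise RuntimeError if any expected model.onnx is missing after save."""
    for path in expected_model_paths(output_dir, components):
        if not path.exists():
            raise RuntimeError(f"mobius did not produce expected model at {path}")
```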
- Add module-scoped _stub_mobius_module fixture that injects a fake
'mobius' stub into sys.modules when the package is not installed,
ensuring patch('mobius.build') works in Olive CI without mobius-genai
- Add '# pylint: disable=protected-access' on _default_config test line
(PYLINT W0212 — intentional test access to a pass internals method)
- Add '# noqa: PLC0415' on lazy 'from mobius import build' inside
_run_for_config — import is intentionally deferred to surface a clear
ImportError only when the pass actually runs
- Run 'lintrunner -a' to auto-apply RUFF-FORMAT and FORMAT-JSON patches
on mobius_model_builder.py, test file, and both example configs
- 14/14 tests pass
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
Change all references from 'mobius-genai' to 'mobius-ai':
- olive_config.json: extra_dependencies key/value and top-level mapping
- mobius_model_builder.py: docstring install snippet and ImportError message
- test file: fixture docstring comment

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
lintrunner auto-fixed RUF100 (unused noqa directive) across 15 files. The PLC0415 noqa in mobius_model_builder.py was stale — ruff does not enable PLC0415 in this repo, so the directive was unused.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchuby@noreply.github.com>
```python
    ),
),
"execution_provider": PassConfigParam(
    type_=str,
```
We could create an enum of the supported EPs for automatic validation, like in olive/passes/pytorch/autoawq.py (line 27 at 8b1957e), unless you think the options might keep growing and it would be hard to keep them in sync across versions.
Replace GptqQuantizer (requires auto_gptq) with the built-in OnnxBlockWiseRtnQuantization pass, which works out of the box.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Replace the free-form string with a StrEnumBase enum matching the pattern from AutoAWQQuantizer.ModelDtype. Supports: default, cpu, cuda, dml, webgpu, trt-rtx, onnx-standard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
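An equivalent enum can be sketched with the standard-library str/Enum mixin standing in for Olive's StrEnumBase (which this sketch does not import); the member values are the ones listed in the commit:

```python
from enum import Enum


class MobiusEP(str, Enum):
    """Sketch of the supported-EP enum; a plain str+Enum stands in for
    Olive's StrEnumBase. Values mirror the options the commit lists."""
    DEFAULT = "default"
    CPU = "cpu"
    CUDA = "cuda"
    DML = "dml"
    WEBGPU = "webgpu"
    TRT_RTX = "trt-rtx"
    ONNX_STANDARD = "onnx-standard"
```

Because the enum mixes in str, config values like "cuda" validate by value lookup, and an unsupported string raises ValueError automatically.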
Force-pushed 21ab3e2 to 2af889f
Configs moved to microsoft/olive-recipes per repo convention.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
Will mobius generate genai_config.json and related files for ORT GenAI? Also, does mobius support customized naming for the different components of a multi-component model? I can see all component models are named "model.onnx".
Add 'runtime' config param (default: ort-genai) that generates genai_config.json, tokenizer files, and processor configs alongside ONNX models via write_ort_genai_config(). Set to 'none' to skip.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Justin Chu <justinchu@microsoft.com>
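The runtime switch this commit describes can be sketched as a tiny predicate (a hypothetical helper, not the pass's actual code): artifacts are written for the default 'ort-genai' runtime and skipped for 'none':

```python
def should_write_genai_artifacts(runtime: str = "ort-genai") -> bool:
    """Sketch of the 'runtime' config switch: genai_config.json, tokenizer
    files, and processor configs are written only for 'ort-genai' (the
    default); 'none' skips them; anything else is rejected."""
    if runtime not in ("ort-genai", "none"):
        raise ValueError(f"Unknown runtime: {runtime!r}")
    return runtime == "ort-genai"
```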
Force-pushed dcce655 to 68ed349
@xiaoyu-work The models are
```
pip install mobius-ai
See https://github.com/microsoft/mobius
```
```python
    ),
),
"execution_provider": PassConfigParam(
    type_=MobiusModelBuilder.MobiusEP,
```
We should not offer pass-level execution-provider choices to the user. The Olive engine runs a series of passes, and the user selects the EP for a given engine run.
See another relevant comment below, on _run_for_config.
The execution_provider pass config param has been removed. The pass now uses only the Olive accelerator spec EP, and raises a ValueError if that EP is not supported by mobius. This was addressed in 8efcec5.
Force-pushed a834c8a to 8efcec5
titaiwangms left a comment:
Nice work! A few suggestions from our experience building the Ministral-3 VLM recipe on top of mobius:
1. Per-component VLM quantization guidance
For VLMs, downstream quantization needs differ per component: decoder → INT4, vision → INT8, embedding → none (ORT has a Gather rank-1 bug with GatherBlockQuantized that prevents embedding quantization). It would be helpful to document how recipe authors should wire per-component quantization after CompositeModelHandler. Even a brief note in the docstring would save people time.
2. context_length default may be too small
write_ort_genai_config() defaults context_length to 4096, but models like Ministral-3 need 262144 for full YaRN RoPE cache. Consider reading max_position_embeddings from the HF config automatically, or exposing it as a pass config parameter.
3. Minimum mobius version check (minor)
If a user has an older mobius without support for newer model types (e.g., Pixtral), they would get confusing errors from mobius.build(). A version check or clearer error message would help.
4. Migration example (nice-to-have)
Existing recipes manually call mobius.build() + write_ort_genai_config(). A brief before/after example showing how to replace that with MobiusModelBuilder pass would help adoption.
Overall this is clean and well-tested — excited to migrate our recipe to use this once it lands!
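The context_length suggestion in point 2 above can be sketched as a small fallback helper that prefers the model's own max_position_embeddings over the fixed 4096 default. The function name is hypothetical and not part of mobius or Olive:

```python
def resolve_context_length(hf_config: dict, default: int = 4096) -> int:
    """Sketch: read max_position_embeddings from a HuggingFace config dict
    when present (e.g. 262144 for Ministral-3's full YaRN RoPE cache),
    falling back to write_ort_genai_config()'s 4096 default otherwise."""
    return int(hf_config.get("max_position_embeddings", default))
```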
@xiaoyu-work how should we handle per-component quantization with Olive?
@xiaoyu-work Thanks for the question!

genai_config.json generation: Yes, we've just added support for this. The pass now calls `_write_genai_config()`, which generates all ORT GenAI artifacts (genai_config.json, tokenizer files, processor configs) and includes them in the model handler's `additional_files`. This happens automatically when `runtime='ort-genai'` (the default).

Multi-component naming: Correct, mobius uses the flat `<component_name>/model.onnx` layout (e.g., `decoder/model.onnx`, `vision_encoder/model.onnx`). This is what ORT GenAI expects. The pass returns a `CompositeModelHandler` with each component's directory as the model path, so the structure is preserved.
@titaiwangms Great feedback! Addressing each point:

1. Per-component VLM quantization guidance
2. context_length / max_position_embeddings
3. Minimum mobius version check
4. Migration example

```
Before: manual mobius.build()
from mobius import build

After: MobiusModelBuilder pass
Just put {'type': 'MobiusModelBuilder', 'precision': 'fp32'} in your Olive config!
```

Thanks for the thorough review — these notes will help adoption!
@xiaoyu-work @justinchuby Regarding per-component quantization: with `CompositeModelHandler`, you can wire different passes per component by targeting them in the pipeline config.

For selective per-component quantization (e.g., skip embedding), you'd use `nodes_to_exclude` or create separate configs per component. We can document this pattern in the pass docstring to make it clearer. The architecture here is that each pass is component-agnostic; the engine applies it to all components in the composite. For more granular control, recipes can create separate Olive config variants (one per component, or one per quantization strategy).
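A minimal sketch of that pipeline wiring, written as a Python dict for illustration. The two pass types appear in this PR; the pass aliases and the `nodes_to_exclude` value are placeholders:

```python
import json

# Hypothetical Olive pipeline sketch: the builder emits a
# CompositeModelHandler, and the engine then applies the quantization
# pass to each component in turn. nodes_to_exclude (empty placeholder)
# is the knob mentioned above for skipping e.g. embedding-related ops.
pipeline = {
    "passes": {
        "builder": {"type": "MobiusModelBuilder", "precision": "fp16"},
        "quantize": {
            "type": "OnnxBlockWiseRtnQuantization",
            "nodes_to_exclude": [],
        },
    }
}

print(json.dumps(pipeline, indent=2))
```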
Summary

Adds a new Olive pass (`MobiusModelBuilder`) that wraps mobius `build()` to produce ONNX models from HuggingFace model IDs.
- Single-component models → `ONNXModelHandler`
- Multi-component models → `CompositeModelHandler`
- Registered in `olive_config.json` as `MobiusModelBuilder`
- Example config: `examples/gemma4/gemma4_int4_cuda.json`

Validated: Gemma4 INT4 Quantization Pipeline

Successfully tested `MobiusModelBuilder` → `OnnxBlockWiseRtnQuantization` on `google/gemma-4-E2B-it`:
- Quantized ops per component
- Weight quantization coverage
- Output structure (2.8GB total, down from ~5GB fp16)
- Pipeline timing: `MobiusModelBuilder` (fp16 build) → `OnnxBlockWiseRtnQuantization` (int4)