16 May 05:59

shaahji

de1da79

Olive-ai 0.9.1 Latest

Minor release to fix the following issues:

OpenVINO Encapsulation pad_token_id fix (#1847)
Add support for Nvidia TensorRT RTX execution provider in Olive (#1852)
Basic support for ONNX auto EP selection introduced in onnxruntime v1.22.0 (#1854, #1863)
Add Nvidia TensorRT-RTX Olive recipe for vit, clip and bert examples (#1858)
Gate optimum[openvino] version to <=1.24 (#1864)

12 May 16:43

shaahji

v0.9.0

4e2f0ec

Olive-ai 0.9.0

Feature Updates

Implement lm-eval-harness based LLM quality evaluator for ONNX GenAI models #1720
Update minimum supported target opset for ONNX to 17. #1741
QDQ support for ModelBuilder pass #1736
Refactor OnnxOpVersionConversion to conditionally use onnxscript version converter #1784
HQQ Quantizer Pass #1799, #1835
Introducing global definitions for Precision & PrecisionBits #1808
Improvements in PeepholeOptimizer #1697, #1698

New Passes

OnnxScriptFusion: ONNX script fusion
OpenVINOEncapsulation, OpenVINOReshape, OpenVINOIoUpdate: OpenVINO encapsulation #1754
TrtMatMulToConvTransform: Convert non-4D MatMul to Transpose-Conv-Transpose sequence
OpenVINOOptimumConversion: Add optimum Intel® pass for converting a Huggingface Model to an OpenVINO Model
Graph Surgeries
- MatMulAddGemm: Graph surgery to fuse a MatMul op followed by an Add op into a Gemm op
- PowReduceSumPowDiv2LpNorm: Graph surgery to merge Pow ReduceSum Pow Div pattern to L2Norm
OnnxHqqQuantization: Implements 4-bit HQQ quantization
VitisAIAddMetaData: Adds metadata to an ONNX model based on specified model attributes.
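
As a rough illustration of the two graph surgeries listed above, the sketch below checks in plain Python that the unfused and fused forms compute the same values. It is not Olive code: all helper names are hypothetical, and it assumes Gemm is used with alpha = beta = 1 and that the two Pow exponents in the pattern are 2 and 0.5.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two row-major nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_bias(m, bias):
    """Add op: broadcast a 1-D bias over every row of m."""
    return [[v + c for v, c in zip(row, bias)] for row in m]


def gemm(a, b, c, alpha=1.0, beta=1.0):
    """Gemm: alpha * (A @ B) + beta * C, with C broadcast over rows."""
    return [[alpha * v + beta * ci for v, ci in zip(row, c)] for row in matmul(a, b)]


def pow_reducesum_pow_div(x):
    """The unfused pattern: Pow(x, 2) -> ReduceSum -> Pow(., 0.5) -> Div."""
    squared = [v ** 2 for v in x]
    total = sum(squared)
    norm = total ** 0.5
    return [v / norm for v in x]


def l2_normalize(x):
    """The fused form: divide x by its L2 norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = [0.5, -0.5]

unfused = add_bias(matmul(A, B), C)  # MatMul followed by Add
fused = gemm(A, B, C)                # single Gemm

x = [3.0, 4.0]
pattern_out = pow_reducesum_pow_div(x)
norm_out = l2_normalize(x)
```

Both pairs agree on these inputs, which is the property a pattern-replacing surgery relies on.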

New/Updated Examples

Alibaba-NLP/gte #1695
DeepSeek
- OpenVINO #1786
Google BERT
- QDQ #1701, #1718, #1733, #1797, #1817
- QNN #1764
- VitisAI #1728
Google VIT
- QDQ #1701, #1733, #1797, #1817
- QNN #1701, #1749
- VitisAI #1728
- OpenVINO #1757, #1767
Intel BERT
- QDQ #1797, #1817
- QNN #1749
- OpenVINO #1767, #1768, #1777, #1822
Laion Clip
- QDQ #1701, #1733, #1797
- QNN #1701, #1749
- VitisAI #1728
- OpenVINO #1793
Llama3
- OpenVINO #1786
Meta Llama3
- QDQ #1707
OpenAI Clip (16 and 32)
- QDQ #1701, #1733, #1797, #1817
- QNN #1701, #1764
- VitisAI #1728
- OpenVINO #1793
Phi3.5
- QDQ #1707, #1733, #1817
- VitisAI #1707, #1728
- OpenVINO #1786
Phi4
- OpenVINO #1828
Qwen
- QNN #1699
- OpenVINO #1786, #1828
Resnet50
- QDQ #1701, #1749, #1817
- QNN #1701, #1749
- OpenVINO #1757, #1767, #1786
Sentence Transformers CLIP
- QDQ #1797
- QNN #1694, #1797
Stable Diffusion
- QDQ #1730

Deprecated Examples

Mobilenet QNN #1743
Inception #1743

Deprecated Passes

InsertBeamSearchOp #1805

17 Mar 22:14

jambayk

v0.8.0

6ab9d8b

Olive-ai 0.8.0

New Features (Passes)

QuaRot performs offline weight rotation
SpinQuant performs offline weight rotation
StaticLLM converts a dynamic-shaped LLM into a static-shaped LLM for NPUs.
GraphSurgeries applies surgeries to ONNX model. Surgeries are modular and individually configurable.
LoHa, LoKr and DoRA finetuning
OnnxQuantizationPreprocess applies quantization preprocessing.
EPContextBinaryGenerator creates EP specific context binary onnx models.
ComposeOnnxModels composes split onnx models.
OnnxIOFloat16ToFloat32 replaced with more generic OnnxIODataTypeConverter

Command Line Interface

New command line tools have been added and existing tools have been improved.

generate_config_file option to save the workflow config file.
extract-adapters command to extract multiple adapters from a PyTorch model.
Simplified quantize command

Improvements

Better output model structure for workflow and CLI runs.
- New no_artifacts options in workflow config to disable saving run artifacts such as footprints.
Hf data preprocessing:
- Dataset is truncated if max_samples is set.
- Empty texts are filtered out.
- padding_side is configurable and defaults to "right".
SplitModel pass keeps QDQ nodes together in the same split.
OnnxPeepholeOptimizer: constant folding + onnxoptimizer added.
CaptureSplitInfo: Separate split for memory intensive module.
OnnxConversion:
- Dynamic shapes for dynamo export.
- optimize option to perform constant folding and redundancies elimination on dynamo exported model.
GPTQ: Default wikitext calibration dataset. Patch to support newer versions of transformers.
MatMulNBitsToQDQ: nodes_to_exclude option.
SplitModel: split_assignments option to provide custom split assignments.
CaptureSplitInfo: block_to_split can be a single block (str) or multiple blocks (list).
OnnxMatMul4Quantizer: Support onnxruntime 1.18+
OnnxQuantization:
- Support onnxruntime 1.18+.
- op_types_to_exclude option.
- LLMAugmentedDataLoader augments the calibration data for llms with kv cache and other missing inputs.
New document theme and organization.
Reimplement search logic to include passes in search space.
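
The Hf data preprocessing behaviour listed above (truncation to max_samples, filtering of empty texts, configurable padding_side) can be sketched in plain Python. This is only an illustration, not Olive's implementation: the function name, the whitespace tokenizer, the pad_token and max_length parameters, and the choice to filter before truncating are all assumptions.

```python
def preprocess_texts(texts, max_samples=None, padding_side="right",
                     pad_token="<pad>", max_length=4):
    """Hypothetical sketch of the preprocessing steps described in the notes."""
    if padding_side not in ("left", "right"):
        raise ValueError(f"unsupported padding_side: {padding_side!r}")

    # Drop empty (or whitespace-only) texts.
    kept = [t for t in texts if t and t.strip()]

    # Truncate the dataset only when max_samples is set (assumed to happen after filtering).
    if max_samples is not None:
        kept = kept[:max_samples]

    rows = []
    for text in kept:
        tokens = text.split()[:max_length]  # toy whitespace tokenizer
        padding = [pad_token] * (max_length - len(tokens))
        rows.append(tokens + padding if padding_side == "right" else padding + tokens)
    return rows


right_padded = preprocess_texts(["a b", "", "c", "d e f"], max_samples=2, max_length=3)
left_padded = preprocess_texts(["a b"], padding_side="left", max_length=3)
```

With the default padding_side of "right", pad tokens are appended after the real tokens; "left" places them before.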

Examples:

New QNN EP examples:
- SLMs:
  - Phi-3.5
  - Deepseek R1 Distill
  - Llama 3.2
- MobileNet
- ResNet
- CLIP VIT
- BAAI/bge-small-en-v1.5
- Table Transformer Detection
- adetailer
Deepseek R1 Distill Finetuning
timm MobileNet

14 Nov 19:39

jambayk

v0.7.1.1

a2d32aa

Olive-ai 0.7.1.1

Same as 0.7.1 with updated dependencies for nvmo extra and NVIDIA TensorRT Model Optimizer example doc.

Refer to the 0.7.1 release notes for other details.

12 Nov 20:57

jambayk

v.0.7.1

9885cee

Olive-ai 0.7.1

Command Line Interface

New command line tools have been added and existing tools have been improved.

olive --help works as expected.
auto-opt:
- The command chooses a set of passes compatible with the provided model type, precision and accelerator information.
- New options to split a model, either using --num-splits or --cost-model.

Improvements

ExtractAdapters:
- Support lora adapter nodes in Stable Diffusion unet or text-embedding models.
- Default initializers for quantized adapter to run the model without adapter inputs.
GPTQ:
- Avoid saving unused bias weights (all zeros).
- Set use_exllama to False by default to allow exporting and fine-tuning external GPTQ checkpoints.
AWQ: Patch autoawq to run quantization on newer transformers versions.
Atomic SharedCache operations
New CaptureSplitInfo and Split passes to split models into components. Number of splits can be user provided or inferred from a cost model.
disable_search is deprecated from pass configuration in an olive workflow config.
OrtSessionParamsTuning redone to use olive search features.
OrtModelOptimizer renamed to OrtPeepholeOptimizer and some bug fixes.

Examples:

Stable Diffusion: New MultiLora Example
Phi3: New int quantization example using nvidia-modelopt

16 Oct 23:00

shaahji

v0.7.0

2e77d71

Olive-ai 0.7.0

Command Line Interface (CLI)

Olive now has a command line interface that runs well-defined workflows without requiring the user to create or edit a config file manually. CLI workflow commands can be chained, i.e. the output of one command can be fed as input to the next, which simplifies running an entire pipeline. Some of the available CLI workflow commands:

finetune: Fine-tune a model on a dataset using peft and optimize the model for ONNX Runtime.
capture-onnx-graph: Capture ONNX graph for a Huggingface model.
auto-opt: Automatically optimize a model for performance.
quantize: Quantize model using given algorithm for desired precision and target.
tune-session-params: Automatically tune the session parameters for an ONNX model.
generate-adapter: Generate ONNX model with adapters as inputs.

Improvements

Added support for yaml based workflow config
Streamlined DataConfig management
Simplified workflow configuration
Added shared cache support for intermediate models and supporting data files
Added QuaRot quantization pass for PyTorch models
Added support to evaluate generative PyTorch models
Streamlined support for user-defined evaluators
Enabled use of llm-evaluation-harness for generative model evaluations

Examples

Llama
- Updated multi-lora example to use ORT generate() API
- Updated to demonstrate use of shared cache
Phi3
- Updated to demonstrate evaluation using lm-eval harness
- Updated to showcase search across three different QLoRA ranks
- Added Vision tutorial

11 Jun 06:52

trajepl

v0.6.2

331f519

Olive-ai 0.6.2

Workflow config

Support YAML files as workflow config file. #1191
The workflow ID feature is a prerequisite for running workflows on a remote VM. With this feature (#1179):
- The cache dir becomes <cache_dir>/<workflow_id>.
- The Olive config is automatically saved to the cache dir.
- User can specify workflow_id in config file.
- The default workflow_id is default_workflow.
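
A minimal sketch of how the cache location described above could be derived from a workflow config. The function name and the `.olive-cache` base directory are placeholders chosen for illustration, not a statement of Olive's internals; only the `workflow_id` key and the `default_workflow` fallback come from the notes.

```python
from pathlib import Path

DEFAULT_WORKFLOW_ID = "default_workflow"  # default named in the release notes


def resolve_cache_dir(config: dict, cache_dir: str = ".olive-cache") -> Path:
    """Return <cache_dir>/<workflow_id>, falling back to the default workflow ID.

    `cache_dir` is a placeholder base directory used only for this example.
    """
    workflow_id = config.get("workflow_id") or DEFAULT_WORKFLOW_ID
    return Path(cache_dir) / workflow_id


default_location = resolve_cache_dir({})
custom_location = resolve_cache_dir({"workflow_id": "bert_qdq_run"}, cache_dir="cache")
```

Keeping each workflow under its own subdirectory means runs with different IDs do not overwrite each other's cached files.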

Passes (optimization techniques)

Accept SNPE DLC model for QNN context binary generator #1188

Data

Remove params_config, components/component_args. All components specific parameters are now grouped in four separate objects: #1187
- load_dataset_config
- pre_process_data_config
- post_process_data_config
- dataloader_config
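
To make the new grouping concrete, here is a sketch of a data config written as a Python dict. Only the four `*_config` key names and the removed `params_config` / `components` keys come from the notes above; every value and nested parameter name is a hypothetical placeholder, and the helper is not part of Olive.

```python
# The four component-specific objects named in the release notes.
COMPONENT_KEYS = (
    "load_dataset_config",
    "pre_process_data_config",
    "post_process_data_config",
    "dataloader_config",
)
# Keys removed by this release.
LEGACY_KEYS = ("params_config", "components")

data_config = {
    "name": "example_data",  # hypothetical config name
    "load_dataset_config": {"data_name": "example/dataset", "split": "train"},  # placeholder values
    "pre_process_data_config": {"max_length": 128},                             # placeholder values
    "post_process_data_config": {},
    "dataloader_config": {"batch_size": 8},                                     # placeholder values
}


def find_legacy_keys(config: dict) -> list:
    """Return any removed keys still present in a data config."""
    return [key for key in LEGACY_KEYS if key in config]


missing = [key for key in COMPONENT_KEYS if key not in data_config]
legacy_in_new = find_legacy_keys(data_config)
legacy_in_old = find_legacy_keys({"name": "old", "params_config": {}, "components": {}})
```

A check like `find_legacy_keys` is one way to spot configs that still need migrating to the new layout.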

Docs

Add olive workflow schema to doc website. This schema file can be used in IDEs when writing workflow configs. #1190

30 May 06:49

trajepl

v0.6.1

dde5e02

Olive-ai 0.6.1

Example

Phi3 AzureML example. #1171

Passes (optimization techniques)

Pytorch
- OnnxQuantization: Complete the QNN EP related config items to support new features from onnxruntime 1.18

Data

Deprecate unused field DataComponentConfig::name #1178

15 May 11:13

trajepl

v0.6.0

868d12d

Olive-ai 0.6.0

Examples

The following examples are added:

Add LLM sample for DirectML #1082 #1106
- This adds an LLM sample for DirectML that can convert and quantize many LLMs from HuggingFace. The Dolly, Phi and LLaMA 2 folders were removed and replaced with a more generic LLM example that supports a large number of LLMs, including but not limited to Phi-2, Mistral and LLaMA 2.
- Add Gemma to DML LLM sample #1138
Llama2 optimization with multi-ep managed env #1087
Llama2: Multi-lora example notebook, Custom generator #1114
Search Optimal optimization among multiple EPs #1092

Olive CLI updates

Previous commands python -m olive.workflows.run and python -m olive.platform_sdk.qualcomm.configure are deprecated. Use olive run or python -m olive instead. #1129

Passes (optimization techniques)

Pytorch
- AutoAWQQuantizer: Enables AutoAWQ in Olive and provides the capability for ONNX conversion. #1080
- SliceGPT: Add support for generic data sets to SliceGPT pass #1145
ONNXRuntime
- ExtractAdapters pass supports int4 quantized models and expose the external data config options to users. #1083
- ModelBuilder: Converts a Huggingface/AML generative PyTorch model to ONNX model using the ONNX Runtime Generative AI >= 0.2.0. #1089 #1073 #1110 #1112 #1118 #1130 #1131 #1141 #1146 #1147 #1154
- OnnxFloatToFloat16: Use ort float16 converter #1132
- NVModelOptQuantization: Quantizes an ONNX model with Nvidia-ModelOpt. #1135
- OnnxIOFloat16ToFloat32: Converts float16 model inputs/outputs to float32. #1149
- [Vitis AI] Make Vitis AI techniques compatible with ORT 1.18 #1140

Data Config

Remove name ambiguity in dataset configuration #1111
Remove HfConfig::dataset references in examples and tests #1113

Engine

Add aml deployment packaging. #1090

System

Make the accelerator EP optional in Olive systems for non-ONNX passes. #1072

Data

Add AML resource support for data configs.
Add audio classification data preprocess function.

Model

Provide built-in kv_cache_config for generative model's io_config #1121
Convert MLflow transformers models to Hugging Face format so that they can be consumed by passes that require that format. #1150

Metrics

Dependencies:

Support onnxruntime 1.17.3

Issues

Fix code scanning issues. #1078 #1081 #1084 #1085 #1091 #1094 #1103 #1104 #1107 #1126 #1124 #1128

11 Apr 05:56

trajepl

v0.5.2

1d268da

Olive-ai 0.5.2

Examples

The following examples are added

Phi2 SliceGPT example #1052
Phi2 Genai example. #1061
Llama ExtractAdapters example. #1064

Passes (optimization techniques)

SliceGPT: SliceGPT is a post-training sparsification scheme that makes transformer networks smaller. It applies an orthogonal transformation to each transformer layer and then slices off the least-significant rows and columns of the weight matrices, which reduces model size, speeds up inference and lowers the memory footprint.
ExtractAdapters: Extracts the LoRA adapter weights (float or statically quantized) and saves them in a separate file.

Engine

Simplify the engine config

Fix

GenAIModelExporter: On Windows, the cache_dir path of the GenAI model exporter could exceed the 260-character path limit.

Releases: microsoft/Olive
