
Commit 3733e39

samuel100 and MaanavD authored
olive shared cache blog init (#22642)
Added blog post on Olive's shared cache feature. --------- Co-authored-by: Maanav Dalal <maanavdalal@gmail.com>
1 parent 40a53ee commit 3733e39

File tree: 5 files changed, +177 −3 lines

(Binary image file added: 143 KB.)

src/routes/blogs/+page.svelte

Lines changed: 15 additions & 3 deletions
@@ -18,6 +18,7 @@
  import Phi3OnDeviceImage from '../../images/blogs/phi-3-on-device_blog_thumbnail.png';
  import Phi3SmallMediumImage from '../../images/blogs/accelerating-phi-3-medium-thumbnail.png';
  import LightGlueImage from '../../images/blogs/lightglue-community-blog.png';
+ import OliveSharedCache from '../../images/blogs/olive-shared-cache-user-flow.png';
  onMount(() => {
  anime({
  targets: '.border-primary',
@@ -45,6 +46,16 @@
  dispatch('switchTab', tab);
  }
  let featuredblog = [
+ {
+   title: 'Enhancing team collaboration during AI model optimization with the Olive Shared Cache',
+   date: 'October 30th, 2024',
+   blurb:
+     "Learn how to use Olive's shared cache to enhance team collaboration when optimizing AI models",
+   link: 'blogs/olive-shared-cache',
+   image: OliveSharedCache,
+   imgalt:
+     'Team Flow for Olive shared cache'
+ },
  {
  title: 'Accelerating LightGlue Inference with ONNX Runtime and TensorRT',
  date: 'July 17th, 2024',
@@ -65,6 +76,10 @@
  imgalt:
  'Image of the different steps of an ML pipeline on a mobile device, running using NimbleEdge and ONNX Runtime.'
  },
+
+
+ ];
+ let blogs = [
  {
  title: 'Background Removal in the Browser Using ONNX Runtime with WebGPU',
  date: 'June 12th, 2024',
@@ -75,9 +90,6 @@
  imgalt:
  'Image of a skateboarder with a sky background, with half of the background being alternating grey and white squares indicating it has been removed.'
  },
-
- ];
- let blogs = [
  {
  title: 'Phi-3 Small and Medium Models are now Optimized with ONNX Runtime and DirectML',
  date: 'May 21th, 2024',
Lines changed: 162 additions & 0 deletions
@@ -0,0 +1,162 @@
---
title: 'Enhance team collaboration during AI model optimization with the Olive Shared Cache feature'
date: '30th October, 2024'
description: 'Learn how to use the shared cache feature in Olive to enhance team collaboration when optimizing AI models'
keywords: 'GenAI, LLM, ONNXRuntime, ORT, Phi, DirectML, Windows, phi3, phi-3, llama-3.2, ONNX, SLM, edge, gpu'
authors:
  [
    'Xiaoyu Zhang',
    'Devang Patel',
    'Sam Kemp'
  ]
authorsLink:
  [
    'https://www.linkedin.com/in/xiaoyu-zhang/',
    'https://www.linkedin.com/in/devangpatel/',
    'https://www.linkedin.com/in/samuel-kemp-a9253724/'
  ]
image: 'https://iili.io/2nxtC57.png'
imageSquare: 'https://iili.io/2nxtC57.png'
url: 'https://onnxruntime.ai/blogs/olive-shared-cache'
---

## 👋 Introduction

In the ever-evolving realm of machine learning, optimization stands as a crucial pillar for enhancing model performance, reducing latency, and cutting down costs. Enter Olive, a powerful tool designed to streamline the optimization process through its innovative shared cache feature.

Efficiency in machine learning relies not only on the effectiveness of algorithms but also on the efficiency of the processes around them. Olive's shared cache feature, backed by Azure Storage, embodies this principle by allowing intermediate models to be stored and reused within a team, avoiding redundant computation.

This blog post delves into how Olive's shared cache feature can help you save time and costs, illustrated with practical examples.

### Prerequisites

- An Azure Storage Account. For details on how to create one, read [Create an Azure Storage Account](https://learn.microsoft.com/azure/storage/common/storage-account-create?tabs=azure-portal).
- Once you have created your Azure Storage Account, you'll need to create a storage container (a container organizes a set of blobs, similar to a directory in a file system). For more details, read [Create a container](https://learn.microsoft.com/azure/storage/blobs/blob-containers-portal#create-a-container). If you prefer the command line to the portal, see the sketch below.

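A minimal Azure CLI sketch covering both prerequisite steps. The account name, resource group, container name, and region below are hypothetical placeholders; substitute your own.

```bash
# Hypothetical names: replace olivesharedcache, olive-rg, and olive-cache with your own.
az login

# Create the storage account that will back the Olive shared cache.
az storage account create \
    --name olivesharedcache \
    --resource-group olive-rg \
    --location westus2 \
    --sku Standard_LRS

# Create the blob container that Olive will write cached models into.
az storage container create \
    --name olive-cache \
    --account-name olivesharedcache \
    --auth-mode login
```
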
## 🤝 Team collaboration during the optimization process

User A begins the optimization process by using Olive's quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model with the AWQ algorithm:

<pre><code>olive quantize \
    --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
    --algorithm awq \
    --account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
    --container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
    --log_level 1
</code></pre>

> **Note:**
> - The `--account_name` should be set to your Azure Storage Account name.
> - The `--container_name` should be set to the container name in the Azure Storage Account.

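One convenient way to supply these two values is through environment variables. A minimal sketch, where the account and container names are hypothetical placeholders for your own resources:

```bash
# Hypothetical values; replace with your storage account and container names.
export AZURE_STORAGE_ACCOUNT=olivesharedcache
export STORAGE_CONTAINER_NAME=olive-cache

# Same quantize command as above, reading the shared cache settings from the environment.
olive quantize \
    --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
    --algorithm awq \
    --account_name "$AZURE_STORAGE_ACCOUNT" \
    --container_name "$STORAGE_CONTAINER_NAME" \
    --log_level 1
```
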
The optimization process generates a log that confirms the cache has been saved in a shared location in Azure:

<div class="m-auto w50">
<img src="./upload-quant-model.png" alt="Uploading a quantized model to the cloud">

<i>Olive log output from User A: The quantized model from User A's workflow is uploaded to the shared cache in the cloud.</i>
</div>
<br/>

This shared cache is a pivotal element, as it stores the optimized model, making it accessible for future use by other users or processes.

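If you want to confirm what Olive has uploaded, you can list the blobs in the shared cache container. A minimal Azure CLI sketch, reusing the placeholder names from the prerequisites and assuming you are signed in with `az login`:

```bash
# List the cached artifacts Olive wrote to the shared cache container.
az storage blob list \
    --account-name "$AZURE_STORAGE_ACCOUNT" \
    --container-name "$STORAGE_CONTAINER_NAME" \
    --auth-mode login \
    --output table
```
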
### Leveraging the shared cache

User B, another active team member in the optimization project, reaps the benefits of User A's efforts. By using the same quantize command to optimize [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) with the AWQ algorithm, User B's process is significantly expedited. The command is identical, and User B leverages the same Azure Storage account and container:

<pre><code>olive quantize \
    --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
    --algorithm awq \
    --account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
    --container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
    --log_level 1
</code></pre>

A critical part of this step is the log output below, which shows the quantized model being retrieved from the shared cache rather than the AWQ quantization being re-computed.

<div class="m-auto w50">
<img src="./retrieve-quant-model.png" alt="Retrieving a quantized model from the cloud">

<i>Olive log output from User B: The quantized model from User A's workflow is downloaded and consumed in User B's workflow without having to re-compute.</i>
</div>
<br/>

This mechanism not only saves computational resources but also slashes the time required for optimization. **The shared cache in Azure serves as a repository of pre-optimized models, ready for reuse and thus enhancing efficiency.**

## 🪄 Shared cache + Automatic optimizer

Optimization is not limited to quantization alone. Olive's Automatic optimizer extends these capabilities by running further pre-processing and optimization tasks in a single command to find the best model in terms of quality and performance. Typical tasks run by the Automatic optimizer are:

- Download the model from Hugging Face
- Capture the model structure in an ONNX graph and convert the weights to ONNX format
- Optimize the ONNX graph (for example, fusion and compression)
- Apply specific kernel optimizations for the target hardware
- Quantize the model weights

User A leverages the Automatic optimizer to optimize [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) for CPU. The command line instruction for this task is:

<pre><code>olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4 \
    --account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
    --container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
    --log_level 1
</code></pre>

For each task executed by the Automatic optimizer (for example, model download, ONNX conversion, ONNX graph optimization, and quantization), the intermediate model is stored in the shared cache for reuse on different hardware targets. For example, if User B later wants to optimize the same model for a different target (say, the GPU of a Windows device), they would execute the following command:

<pre><code>olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device gpu \
    --provider DmlExecutionProvider \
    --precision int4 \
    --account_name &lbrace;AZURE_STORAGE_ACCOUNT&rbrace; \
    --container_name &lbrace;STORAGE_CONTAINER_NAME&rbrace; \
    --log_level 1
</code></pre>

The intermediate steps shared with User A's CPU optimization, such as ONNX conversion and ONNX graph optimization, will be reused, saving User B time and cost.

This underscores Olive's versatility, not only in optimizing different models but also in applying a variety of algorithms and exporters. The shared cache again plays a critical role by storing these optimized intermediate models for subsequent use.

## ➕ Benefits of the Olive shared cache feature

The examples above showcase Olive's shared cache as a game-changer in model optimization. Here are the key benefits:

- **Time Efficiency:** By storing optimized models, the shared cache eliminates the need for repetitive optimizations, drastically reducing time consumption.
- **Cost Reduction:** Computational resources are expensive. By minimizing redundant processes, the shared cache cuts down on the associated costs, making machine learning more affordable.
- **Resource Optimization:** Efficient use of computational power leads to better resource management, ensuring that resources are available for other critical tasks.
- **Collaboration:** The shared cache fosters a collaborative environment where different users can benefit from each other's optimization efforts, promoting knowledge sharing and teamwork.

## Conclusion

By saving and reusing optimized models, Olive's shared cache feature paves the way for a more efficient, cost-effective, and collaborative environment. As AI continues to grow and evolve, tools like Olive will be instrumental in driving innovation and efficiency.

Whether you are a seasoned data scientist or a newcomer to the field, embracing Olive can significantly enhance your workflow. By reducing the time and costs associated with model optimization, you can focus on what truly matters: developing groundbreaking AI models that push the boundaries of what is possible.

Embark on your optimization journey with Olive today and experience the future of machine learning efficiency.

## ⏭️ Try Olive

To try the quantize and auto-opt commands with the shared cache feature, execute the following pip install:

```bash
pip install olive-ai[auto-opt,shared-cache] autoawq
```

Quantizing a model with the AWQ algorithm requires a CUDA GPU device. If you only have access to a CPU device and do not have an Azure subscription, you can run the Automatic optimizer on CPU and use the local disk as the cache:

<pre><code>olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4 \
    --log_level 1
</code></pre>
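
Once the run completes, you can take a quick look at what was produced. A minimal sketch, assuming the output lands in the `optimized-model` directory passed to `--output_path` above (the exact directory layout may vary by Olive version):

```bash
# Inspect the output written by auto-opt.
ls -R optimized-model

# Locate the generated ONNX model file(s).
find optimized-model -name "*.onnx"
```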
(Two binary image files added: 897 KB and 913 KB.)

0 commit comments
