---
title: 'Enhance team collaboration during AI model optimization with the Olive Shared Cache feature'
date: '30th October, 2024'
description: 'Learn how to use the shared cache feature in Olive to enhance team collaboration when optimizing AI models'
keywords: 'GenAI, LLM, ONNXRuntime, ORT, Phi, DirectML, Windows, phi3, phi-3, llama-3.2, ONNX, SLM, edge, gpu'
authors:
  [
    'Xiaoyu Zhang',
    'Devang Patel',
    'Sam Kemp'
  ]
authorsLink:
  [
    'https://www.linkedin.com/in/xiaoyu-zhang/',
    'https://www.linkedin.com/in/devangpatel/',
    'https://www.linkedin.com/in/samuel-kemp-a9253724/'
  ]
image: 'https://iili.io/2nxtC57.png'
imageSquare: 'https://iili.io/2nxtC57.png'
url: 'https://onnxruntime.ai/blogs/olive-shared-cache'
---

## 👋 Introduction

In the ever-evolving realm of machine learning, optimization stands as a crucial pillar for enhancing model performance, reducing latency, and cutting down costs. Enter Olive, a powerful tool designed to streamline the optimization process through its innovative shared cache feature.

Efficiency in machine learning relies not only on the effectiveness of algorithms but also on the efficiency of the processes involved. Olive's shared cache feature (backed by Azure Storage) embodies this principle by allowing intermediate models to be stored and reused within a team, avoiding redundant computations.

This blog post delves into how Olive's shared cache feature can help you save time and costs, illustrated with practical examples.

### Prerequisites

- An Azure Storage Account. For details on how to create an Azure Storage Account, read [Create an Azure Storage Account](https://learn.microsoft.com/azure/storage/common/storage-account-create?tabs=azure-portal).
- Once you have created your Azure Storage Account, you'll need to create a storage container (a container organizes a set of blobs, similar to a directory in a file system). For more details on how to create a storage container, read [Create a container](https://learn.microsoft.com/azure/storage/blobs/blob-containers-portal#create-a-container). A command-line alternative is sketched below.
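
If you prefer to work from the command line, here is a minimal sketch of the same setup using the Azure CLI. The resource group, storage account, container name, and region below are placeholders (not names used elsewhere in this post); substitute your own values.

```bash
# Sign in, then create a storage account and a container to hold the Olive shared cache.
# All names and the region below are placeholders -- replace them with your own values.
az login
az group create --name olive-rg --location westus2
az storage account create \
  --name olivesharedcache \
  --resource-group olive-rg \
  --location westus2 \
  --sku Standard_LRS
az storage container create \
  --name olive-cache \
  --account-name olivesharedcache \
  --auth-mode login
```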

## 🤝 Team collaboration during the optimization process

User A begins the optimization process by employing Olive's quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model using the AWQ algorithm, with the following command-line execution:

<pre><code>olive quantize \
    --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
    --algorithm awq \
    --account_name {AZURE_STORAGE_ACCOUNT} \
    --container_name {STORAGE_CONTAINER_NAME} \
    --log_level 1
</code></pre>

> **Note:**
> - The `--account_name` should be set to your Azure Storage Account name.
> - The `--container_name` should be set to the container name in the Azure Storage Account.
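
Everyone on the team who points at the same storage account and container shares the same cache. As a concrete illustration, with hypothetical values substituted for the placeholders (say, a storage account named `olivesharedcache` and a container named `olive-cache`), the call would look like this:

```bash
# Same quantize command as above, with hypothetical account and container names filled in.
olive quantize \
    --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
    --algorithm awq \
    --account_name olivesharedcache \
    --container_name olive-cache \
    --log_level 1
```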

The optimization process generates a log that confirms the cache has been saved in a shared location in Azure:

<div class="m-auto w50">
<img src="./upload-quant-model.png" alt="Uploading a quantized model to the cloud">

<i>Olive log output from User A: The quantized model from User A's workflow is uploaded to the shared cache in the cloud.</i>
</div>
<br/>

This shared cache is a pivotal element, as it stores the optimized model, making it accessible for future use by other users or processes.

### Leveraging the shared cache

User B, another active team member in the optimization project, reaps the benefits of User A's efforts. By using the same quantize command to optimize the [Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) model with the AWQ algorithm, User B's process is significantly expedited. The command is identical, and User B leverages the same Azure Storage account and container:

<pre><code>olive quantize \
    --model_name_or_path microsoft/Phi-3-mini-4k-instruct \
    --algorithm awq \
    --account_name {AZURE_STORAGE_ACCOUNT} \
    --container_name {STORAGE_CONTAINER_NAME} \
    --log_level 1
</code></pre>

A critical part of this step is the following log output, which highlights that the quantized model is retrieved from the shared cache rather than re-computed with AWQ quantization.

<div class="m-auto w50">
<img src="./retrieve-quant-model.png" alt="Retrieving a quantized model from the cloud">

<i>Olive log output from User B: The quantized model from User A's workflow is downloaded and consumed in User B's workflow without having to re-compute.</i>
</div>
<br/>

This mechanism not only saves computational resources but also slashes the time required for the optimization. **The shared cache in Azure serves as a repository of pre-optimized models, ready for reuse and thus enhancing efficiency.**
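
If you want to see what has accumulated in that repository, you can inspect the container directly. The following is a standard Azure CLI query rather than an Olive command; `{AZURE_STORAGE_ACCOUNT}` and `{STORAGE_CONTAINER_NAME}` are the same placeholders used above.

```bash
# Inspect the shared cache by listing the blobs that Olive has uploaded to the container.
az storage blob list \
  --account-name {AZURE_STORAGE_ACCOUNT} \
  --container-name {STORAGE_CONTAINER_NAME} \
  --auth-mode login \
  --output table
```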

## 🪄 Shared cache + Automatic optimizer

Optimization is not limited to quantization alone. Olive's automatic optimizer extends its capabilities by running further pre-processing and optimization tasks in a single command to find the best model in terms of quality and performance. Typical tasks run by the automatic optimizer are:

- Downloading the model from Hugging Face
- Capturing the model structure into an ONNX graph and converting the weights into ONNX format
- Optimizing the ONNX graph (for example, fusion and compression)
- Applying specific kernel optimizations for the target hardware
- Quantizing the model weights

User A leverages the automatic optimizer to optimize the [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct/tree/main) model for CPU. The command-line instruction for this task is:

<pre><code>olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4 \
    --account_name {AZURE_STORAGE_ACCOUNT} \
    --container_name {STORAGE_CONTAINER_NAME} \
    --log_level 1
</code></pre>

For each task executed by the automatic optimizer (for example, model download, ONNX conversion, ONNX graph optimization, and quantization), the intermediate model is stored in the shared cache for reuse on different hardware targets. For example, if User B later wants to optimize the same model for a different target (say, the GPU of a Windows device), they would execute the following command:

<pre><code>olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device gpu \
    --provider DmlExecutionProvider \
    --precision int4 \
    --account_name {AZURE_STORAGE_ACCOUNT} \
    --container_name {STORAGE_CONTAINER_NAME} \
    --log_level 1
</code></pre>

The common intermediate steps from User A's CPU optimization (such as ONNX conversion and ONNX graph optimization) will be reused, saving User B both time and cost.

This underscores Olive's versatility, not only in optimizing different models but also in applying a variety of algorithms and exporters. The shared cache again plays a critical role by storing these optimized intermediate models for subsequent use.

## ➕ Benefits of the Olive shared cache feature

The examples above showcase Olive's shared cache as a game-changer in model optimization. Here are the key benefits:

- **Time Efficiency:** By storing optimized models, the shared cache eliminates the need for repetitive optimizations, drastically reducing time consumption.
- **Cost Reduction:** Computational resources are expensive. By minimizing redundant processes, the shared cache cuts down on the associated costs, making machine learning more affordable.
- **Resource Optimization:** Efficient use of computational power leads to better resource management, ensuring that resources are available for other critical tasks.
- **Collaboration:** The shared cache fosters a collaborative environment where different users can benefit from each other's optimization efforts, promoting knowledge sharing and teamwork.

## Conclusion

By saving and reusing optimized models, Olive's shared cache feature paves the way for a more efficient, cost-effective, and collaborative environment. As AI continues to grow and evolve, tools like Olive will be instrumental in driving innovation and efficiency.

Whether you are a seasoned data scientist or a newcomer to the field, embracing Olive can significantly enhance your workflow. By reducing the time and costs associated with model optimization, you can focus on what truly matters: developing groundbreaking AI models that push the boundaries of what is possible.

Embark on your optimization journey with Olive today and experience the future of machine learning efficiency.

## ⏭️ Try Olive

To try the quantization and automatic optimizer commands with the shared cache feature, execute the following pip install:

```bash
pip install olive-ai[auto-opt,shared-cache] autoawq
```
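
One practical note for the auto-opt examples: the Llama 3.2 models on Hugging Face are gated, so you may need to accept the license on the model page and sign in with a Hugging Face access token before the model can be downloaded:

```bash
# Sign in to Hugging Face so gated models such as Llama-3.2-1B-Instruct can be downloaded.
huggingface-cli login
```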

Quantizing a model using the AWQ algorithm requires a CUDA GPU device. If you only have access to a CPU device and do not have an Azure subscription, you can run the automatic optimizer on CPU and use the local disk as the cache:

<pre><code>olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4 \
    --log_level 1
</code></pre>
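
If you do have an Azure Storage account available, the same run can also populate the shared cache for your team; simply add the two shared cache flags introduced earlier:

```bash
olive auto-opt \
    --model_name_or_path meta-llama/Llama-3.2-1B-Instruct \
    --trust_remote_code \
    --output_path optimized-model \
    --device cpu \
    --provider CPUExecutionProvider \
    --precision int4 \
    --account_name {AZURE_STORAGE_ACCOUNT} \
    --container_name {STORAGE_CONTAINER_NAME} \
    --log_level 1
```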