docs/genai/howto/build-model.md
1 addition & 153 deletions
@@ -8,158 +8,6 @@ nav_order: 3
---

# Generate models using Model Builder
-{: .no_toc }
-
-* TOC placeholder
-{:toc}
+Refer to [model builder guide](https://github.com/microsoft/onnxruntime-genai/blob/main/src/python/py/models/README.md) for the latest documentation.

-The model builder greatly accelerates creating optimized and quantized ONNX models that run with the ONNX Runtime generate() API.
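As a rough sketch of what that looks like in practice (the flags follow the model builder README linked above; the model id, paths, precision, and execution provider are placeholder assumptions):

```
# Sketch: run the model builder module to produce an optimized, quantized
# ONNX model. All argument values below are illustrative placeholders.
import subprocess
import sys

subprocess.run([
    sys.executable, "-m", "onnxruntime_genai.models.builder",
    "-m", "model_name",             # Hugging Face model id or local path
    "-o", "path_to_output_folder",  # where the ONNX model and config are written
    "-p", "int4",                   # output precision, e.g. int4, fp16, fp32
    "-e", "cpu",                    # execution provider, e.g. cpu, cuda
], check=True)
```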
-## Current Support
-
-The tool currently supports the following model architectures.
-
-- Gemma
-- LLaMA
-- Mistral
-- Phi
-
-## Installation
-
-Model builder is available as an [Olive](https://github.com/microsoft/olive) pass. It is also shipped as part of the onnxruntime-genai Python package. You can also download and run it standalone.
-
-In any case, you need to have the following packages installed.
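The package list itself is collapsed in this diff view. As a sketch of a typical setup (only `onnxruntime-genai` is named in this section; `torch`, `transformers`, `onnx`, and `onnxruntime` are assumptions based on what the builder loads and saves):

```
# Sketch of an environment setup; the exact prerequisite list is an assumption.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "torch", "transformers", "onnx", "onnxruntime", "onnxruntime-genai"],
    check=True,
)
```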
-This scenario is where your PyTorch model is already downloaded locally (either in the default Hugging Face cache directory or in a local folder on disk).

-This scenario is where your PyTorch model has been customized or finetuned for one of the currently supported model architectures and your model can be loaded in Hugging Face.

-This scenario is for when you want to have control over some specific settings. The below example shows how you can pass key-value arguments to `--extra_options`.

-To see all available options through `--extra_options`, please use the `help` commands in the `Full Usage` section above.
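The example itself is collapsed in this diff view. As an illustration (`int4_block_size` is one of the builder's documented extra options; the remaining values are placeholders):

```
# Sketch: pass one key=value pair per --extra_options entry.
import subprocess
import sys

subprocess.run([
    sys.executable, "-m", "onnxruntime_genai.models.builder",
    "-m", "model_name", "-o", "path_to_output_folder",
    "-p", "int4", "-e", "cpu",
    "--extra_options", "int4_block_size=32",  # key=value argument (illustrative)
], check=True)
```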
-### Config Only

-This scenario is for when you already have your optimized and/or quantized ONNX model and you need to create the config files to run with the ONNX Runtime generate() API.

-Afterwards, please open the `genai_config.json` file in the output folder and modify the fields as needed for your model. You should store your ONNX model in the output folder as well.
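The command for this path is also collapsed; as a sketch (the `config_only=true` extra option comes from the builder README, and the other values are placeholders):

```
# Sketch: emit only the config and tokenizer files for an existing ONNX model.
import subprocess
import sys

subprocess.run([
    sys.executable, "-m", "onnxruntime_genai.models.builder",
    "-m", "model_name", "-o", "path_to_output_folder",
    "-p", "int4", "-e", "cpu",
    "--extra_options", "config_only=true",
], check=True)
```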
-### Unit Testing Models

-This scenario is where your PyTorch model is already downloaded locally (either in the default Hugging Face cache directory or in a local folder on disk). If it is not already downloaded locally, here is an example of how you can download it.
```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your_model_name"
cache_dir = "cache_dir_to_save_hf_files"

# Download the model and tokenizer into the local cache directory.
model = AutoModelForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
tokenizer = AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
```
docs/genai/index.md
3 additions & 1 deletion
@@ -13,9 +13,11 @@ Run generative AI models with ONNX Runtime.
See the source code here: [https://github.com/microsoft/onnxruntime-genai](https://github.com/microsoft/onnxruntime-genai)

-This library provides the generative AI loop for ONNX models, including inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.
+This library provides the generative AI loop for ONNX models, including tokenization and other pre-processing, inference with ONNX Runtime, logits processing, search and sampling, and KV cache management.

Users can call a high level `generate()` method, or run each iteration of the model in a loop, generating one token at a time, and optionally updating generation parameters inside the loop.
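For context, a token-by-token loop with the Python bindings might look roughly like this (a minimal sketch; names follow recent onnxruntime-genai releases, the model folder and prompt are placeholders, and exact method names can vary by version):

```
import onnxruntime_genai as og

model = og.Model("path_to_model_folder")  # folder with the ONNX model and genai_config.json
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=128)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("What is an ONNX model?"))

# Generate one token per iteration; parameters could be adjusted inside the loop.
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```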
It has support for greedy/beam search and TopP, TopK sampling to generate token sequences and built-in logits processing like repetition penalties. You can also easily add custom scoring.
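As a sketch of how those behaviors are selected (option names mirror the search options used by the library; all values are illustrative):

```
import onnxruntime_genai as og

model = og.Model("path_to_model_folder")  # placeholder path
params = og.GeneratorParams(model)

# TopK/TopP sampling with a repetition penalty:
params.set_search_options(
    do_sample=True,
    top_k=50,
    top_p=0.9,
    temperature=0.7,
    repetition_penalty=1.1,
    max_length=128,
)

# Greedy decoding is the default (do_sample=False); beam search is selected
# with num_beams, e.g. params.set_search_options(num_beams=4, max_length=128).
```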
+Other supported features include applying chat templates and structured output (for tool calling).