@@ -143,7 +143,7 @@ session = rt.InferenceSession("<model_path>", sess_options)

OrtSession* session;
const ORTCHAR_T* model_path = ORT_TSTR("model_path");
-g_ort->CreateSession(env, model_path, session_option, &session);
+g_ort->CreateSession(env, model_path, session_options, &session);
```

#### C# API Example
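The hunk above renames the third argument of `CreateSession` so it matches the `session_options` object created earlier in the doc. The hunk header also quotes the Python counterpart from the same page; a minimal sketch of that Python setup, not part of this PR and using "model.onnx" as a placeholder path:

```python
# Minimal sketch: the Python analogue of the corrected C call above.
# "model.onnx" is a placeholder path, not a file from this PR.
import onnxruntime as rt

sess_options = rt.SessionOptions()                         # counterpart of session_options in the C example
session = rt.InferenceSession("model.onnx", sess_options)  # counterpart of CreateSession
print([i.name for i in session.get_inputs()])              # quick sanity check that the model loaded
```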
docs/performance/model-optimizations/ort-format-models.md: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ python -m onnxruntime.tools.convert_onnx_models_to_ort <onnx model file or dir>

where:

-* onnx mode file or dir is a path to .onnx file or directory containing one or more .onnx models
+* onnx model file or dir is a path to .onnx file or directory containing one or more .onnx models

The current optional arguments are available by running the script with the `--help` argument.
Supported arguments and defaults differ slightly across ONNX Runtime versions.
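For readers who want to script the conversion the doc describes, a minimal sketch (not part of the diff) that drives the documented CLI entry point from Python; "models/" is a hypothetical directory of .onnx files:

```python
# Minimal sketch: invoke the documented converter module on a directory of
# .onnx models. "models/" is a hypothetical path, not from this PR.
import subprocess
import sys

subprocess.run(
    [sys.executable, "-m", "onnxruntime.tools.convert_onnx_models_to_ort", "models/"],
    check=True,  # raise if the converter exits with a nonzero status
)
```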
docs/performance/model-optimizations/quantization.md: 3 additions & 3 deletions
@@ -233,7 +233,7 @@ We provide two end-to end examples: [Yolo V3](https://github.com/microsoft/onnxr

## Quantize to Int4/UInt4

-ONNX Runtime can quantize certain operators in a model to 4 bit integer types. Block-wise weight-only quantizaiton is applied to the operators. The supported op types are:
+ONNX Runtime can quantize certain operators in a model to 4 bit integer types. Block-wise weight-only quantization is applied to the operators. The supported op types are:
- [MatMul](https://github.com/onnx/onnx/blob/main/docs/Operators.md#matmul):
  - The node is quantized only if the input `B` is constant
  - support QOperator or QDQ format.
@@ -263,7 +263,7 @@ model_int4_path="path/to/save/quantized/model.onnx"

quant_config = matmul_4bits_quantizer.DefaultWeightOnlyQuantConfig(
block_size=128, # 2's exponential and >= 16
-is_symmetric=True, # if true, quantize to Int4. otherwsie, quantize to uint4.
+is_symmetric=True, # if true, quantize to Int4. otherwise, quantize to uint4.
accuracy_level=4, # used by MatMulNbits, see https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#attributes-35
quant_format=quant_utils.QuantFormat.QOperator,
op_types_to_quantize=("MatMul","Gather"), # specify which op types to quantize
@@ -272,7 +272,7 @@ quant_config = matmul_4bits_quantizer.DefaultWeightOnlyQuantConfig(
model = quant_utils.load_model_with_shape_infer(Path(model_fp32_path))
quant = matmul_4bits_quantizer.MatMul4BitsQuantizer(
model,
-nodes_to_exclude=None, # specify a list of nodes to exclude from quantizaiton
+nodes_to_exclude=None, # specify a list of nodes to exclude from quantization
nodes_to_include=None, # specify a list of nodes to force include from quantization
algo_config=quant_config,)
quant.process()
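The quoted example ends at `quant.process()`, which quantizes the model in place but does not write it out. A hedged sketch of the usual final step, continuing the variables from the hunk above and assuming the `save_model_to_file` helper on the quantizer's model wrapper (verify against your installed ONNX Runtime version):

```python
# Minimal sketch: persist the quantized model after process().
# save_model_to_file and its use_external_data_format flag are assumed from
# onnxruntime.quantization's ONNXModel wrapper; check your installed version.
quant.model.save_model_to_file(
    model_int4_path,
    use_external_data_format=True,  # keep large initializers in a side file
)
```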