docs/execution-providers/EP-Context-Design.md
session1.run(...);
session2.run(...);
```
## Compile API

ORT 1.22 introduced an explicit [model compilation API](https://github.com/microsoft/onnxruntime/blob/a5ba2ba3998820dd8da111c90c420479aac7a11e/onnxruntime/python/onnxruntime_inference_collection.py#L680-L709) that enables additional compilation options:
- Read the input model from a file or a buffer.
- Write the output model to a file, a buffer, or an output stream.
- Provide a callback function to specify the location of each ONNX initializer in the output model.
- Set compilation flags: "error if no nodes compiled", "error if output file already exists", etc.
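To make the per-initializer callback in the third option concrete, the decision it encodes can be sketched in plain Python. Everything below is illustrative: `InitializerLocation`, `choose_initializer_location`, and the size threshold are hypothetical stand-ins for this sketch, not ORT types or ORT defaults.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for the location info the callback returns to ORT.
@dataclass
class InitializerLocation:
    file_path: Optional[str]  # None means "embed the initializer in the output model"
    offset: int = 0
    size: int = 0

LARGE_INITIALIZER_THRESHOLD = 1024  # bytes; an arbitrary illustrative cutoff

def choose_initializer_location(name: str, byte_size: int) -> InitializerLocation:
    """Embed small initializers; place large ones in an external file."""
    if byte_size < LARGE_INITIALIZER_THRESHOLD:
        return InitializerLocation(file_path=None)  # keep inside the output .onnx
    return InitializerLocation(file_path="output_model.bin", size=byte_size)

# Small tensors stay embedded; large ones go to the external file.
print(choose_initializer_location("bias", 64).file_path)           # None
print(choose_initializer_location("weight", 4_000_000).file_path)  # output_model.bin
```

The same shape of decision appears in the full usage example below, where "large" initializers are routed to a separate `.bin` file.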
### Usage example: compiling a model (from file) to an output stream
```python
import onnxruntime as ort

"""
Compile a model (from file) to an output stream using a custom write function.
The custom write function just saves the output model to disk.
A custom initializer handler stores "large" initializers into an external file.
"""
input_model_path = "input_model.onnx"
output_model_path = "output_model.onnx"
output_initializer_file_path = "output_model.bin"

with open(output_model_path, "wb") as output_model_fd, \
     open(output_initializer_file_path, "wb") as output_initializer_fd:

    # Custom function that ORT calls (one or more times) to stream out the model bytes in chunks.
    # This example function simply writes the output model to a file.
    def output_model_write_func(buffer: bytes):
        output_model_fd.write(buffer)

    # Custom function that ORT calls to determine where to store each ONNX initializer in the output model.
    #
    # Note: the `external_info` argument denotes the location of the initializer in the original input model.
    # An implementation may choose to directly return the received `external_info` to use the same external weights.
    # ...
```

The above snippet stores ONNX initializers for the output model in a new external file. To keep initializers in the same external file used by the original model, return the `external_info` argument from the `output_model_onnx_initializer_handler` function:

```python
# The `external_info` argument denotes the location of the initializer in the original input model (if not None).
# Return it directly to use the same external initializer file.
return external_info

# ...
```
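The write-function contract described above (ORT may invoke the callback one or more times, each call carrying the next chunk of the serialized model) can be exercised without ORT at all. The `fake_stream_model` producer below is a toy stand-in for the runtime, used only to show why the callback must append chunks rather than assume a single call:

```python
import io

def fake_stream_model(model_bytes: bytes, write_func, chunk_size: int = 4):
    # Toy stand-in for ORT: deliver the serialized model in chunks,
    # calling write_func once per chunk (so it may run many times).
    for start in range(0, len(model_bytes), chunk_size):
        write_func(model_bytes[start:start + chunk_size])

out = io.BytesIO()
fake_stream_model(b"onnx-model-bytes", out.write, chunk_size=4)

# The callback simply appends each chunk; the concatenation is the full model.
assert out.getvalue() == b"onnx-model-bytes"
```

Any sink with a `write(bytes)` method (an open file, a socket wrapper, a `BytesIO`) satisfies this contract, which is why the example above can pass `output_model_fd.write`-style functions directly.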
#### References
- [Additional Python usage examples in unit tests](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_compile_api.py)
- [C++ API functions](https://github.com/microsoft/onnxruntime/blob/879ec0392ad5128968440a4e5b5a0bb742494ae5/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L1617-L1623)
- [C API functions](https://github.com/microsoft/onnxruntime/blob/879ec0392ad5128968440a4e5b5a0bb742494ae5/include/onnxruntime/core/session/onnxruntime_c_api.h#L7751-L7774)
### Usage example: cross-compilation with a plugin EP

By default, ONNX Runtime only allows the use of [plugin EPs](./plugin-ep-libraries.md) that are compatible with real hardware devices discovered by ONNX Runtime. To support the creation of compiled models targeted for hardware devices not present on the compiling machine (i.e., cross-compiling), a plugin EP may be allowed to create virtual hardware devices that an application can use to compile models.
#### Application code

An application grants a plugin EP library permission to create virtual hardware devices by using a library registration name that ends in the ".virtual" suffix:
```python
import onnxruntime as ort
import onnxruntime_ep_contoso_ai as contoso_ep

# An application uses a registration name that ends in ".virtual" to signal that virtual devices are allowed.
# ...
```
A plugin EP library determines if the creation of virtual devices is allowed by checking whether the "allow_virtual_devices" environment configuration entry is set to "1". The following snippet from a [reference EP implementation](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu/ep_lib_entry.cc) shows how a plugin EP library could check environment configuration entries within the library's entry point.
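The two-sided handshake described above can be modeled in a few lines of plain Python. This is a conceptual toy, not ORT code: the dictionary of configuration entries stands in for what ONNX Runtime exposes to the library (e.g., via `OrtEpApi::GetEnvConfigEntries` in the C API), and the suffix check stands in for ONNX Runtime's handling of the registration name.

```python
def env_config_for_registration(registration_name: str) -> dict:
    # Toy model of ONNX Runtime: a ".virtual" registration name grants
    # permission via the "allow_virtual_devices" configuration entry.
    allow = "1" if registration_name.endswith(".virtual") else "0"
    return {"allow_virtual_devices": allow}

def plugin_may_create_virtual_devices(env_config: dict) -> bool:
    # Toy model of the plugin EP library side: only create virtual
    # hardware devices when the entry is set to "1".
    return env_config.get("allow_virtual_devices") == "1"

print(plugin_may_create_virtual_devices(env_config_for_registration("contoso_ep.virtual")))  # True
print(plugin_may_create_virtual_devices(env_config_for_registration("contoso_ep")))          # False
```

The point of the handshake is that permission flows from the application's explicit choice of registration name; a library cannot unilaterally decide to expose virtual devices.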
#### References
- [Reference example plugin EP with virtual GPU](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu)
- [OrtEpApi::GetEnvConfigEntries C API function](https://github.com/microsoft/onnxruntime/blob/990ba5f0c3e0c8735fec8bf89dd11953224a9c03/include/onnxruntime/core/session/onnxruntime_ep_c_api.h#L1431-L1446)
- [Ort::GetEnvConfigEntries C++ API function](https://github.com/microsoft/onnxruntime/blob/990ba5f0c3e0c8735fec8bf89dd11953224a9c03/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L3531-L3532)
- [Plugin EP library documentation](./plugin-ep-libraries.md)
- [Additional Python usage examples in unit tests](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_compile_api.py)