Commit 33092ab
Add documentation for compile API (#27114)
### Description

Preview: https://adrianlizarraga.github.io/onnxruntime/docs/execution-providers/EP-Context-Design.html#compile-api

Adds documentation for the ORT compile API. Includes the following examples:
- Compiling to an output stream with a custom function that allows an application to specify where each initializer is stored.
- Cross-compiling with plugin EPs
- EPContext weight sharing with plugin EPs
1 parent 308bd2e commit 33092ab

File tree

2 files changed: +333 −1 lines changed

docs/execution-providers/EP-Context-Design.md

Lines changed: 318 additions & 1 deletion
@@ -379,4 +379,321 @@ To use the dumped EPContext models with weight sharing enabled, ONNX Runtime inf
session1.run(...);
session2.run(...);
```

## Compile API
ORT 1.22 introduced an explicit [model compilation API](https://github.com/microsoft/onnxruntime/blob/a5ba2ba3998820dd8da111c90c420479aac7a11e/onnxruntime/python/onnxruntime_inference_collection.py#L680-L709) that enables additional compilation options:

- Read the input model from a file or a buffer.
- Write the output model to a file, a buffer, or an output stream.
- Provide a callback function that specifies the location of each ONNX initializer in the output model.
- Set compilation flags such as "error if no nodes compiled" and "error if the output file already exists".

### Usage example: compiling a model (from file) to an output stream

```python
import os

import onnxruntime as ort

"""
Compile a model (from file) to an output stream using a custom write function.
The custom write function simply saves the output model to disk.
A custom initializer handler stores "large" initializers in an external file.
"""
input_model_path = "input_model.onnx"
output_model_path = "output_model.onnx"
output_initializer_file_path = "output_model.bin"

with open(output_model_path, "wb") as output_model_fd, \
     open(output_initializer_file_path, "wb") as output_initializer_fd:

    # Custom function that ORT calls (one or more times) to stream out the model bytes in chunks.
    # This example function simply writes the output model to a file.
    def output_model_write_func(buffer: bytes):
        output_model_fd.write(buffer)

    # Custom function that ORT calls to determine where to store each ONNX initializer in the output model.
    #
    # Note: the `external_info` argument denotes the location of the initializer in the original input model.
    # An implementation may choose to directly return the received `external_info` to reuse the same external weights.
    def output_model_onnx_initializer_handler(
        initializer_name: str,
        initializer_value: ort.OrtValue,
        external_info: ort.OrtExternalInitializerInfo | None,
    ) -> ort.OrtExternalInitializerInfo | None:
        byte_size = initializer_value.tensor_size_in_bytes()

        if byte_size < 64:
            return None  # Store small initializers within the output model.

        # Otherwise, write the initializer to a new external file and return its location to ORT.
        value_np = initializer_value.numpy()
        file_offset = output_initializer_fd.tell()
        output_initializer_fd.write(value_np.tobytes())
        return ort.OrtExternalInitializerInfo(output_initializer_file_path, file_offset, byte_size)

    session_options = ort.SessionOptions()

    # Set the EP to use in this session.
    #
    # Example for plugin EP:
    # ep_devices = ort.get_ep_devices()
    # selected_ep_device = next((ep_device for ep_device in ep_devices if ep_device.ep_name == "SomeEp"), None)
    #
    # ep_options = {}
    # session_options.add_provider_for_devices([selected_ep_device], ep_options)
    #
    # Example for legacy "provider-bridge" EP:
    # ep_options = {}
    # session_options.add_provider("SomeEp", ep_options)

    # Compile the model
    model_compiler = ort.ModelCompiler(
        session_options,
        input_model_path,
        embed_compiled_data_into_model=True,
        get_initializer_location_func=output_model_onnx_initializer_handler,
    )
    model_compiler.compile_to_stream(output_model_write_func)

assert os.path.exists(output_model_path)
```

The above snippet stores the output model's ONNX initializers in a new external file. To keep initializers in the same external file used by the original model,
return the `external_info` argument from the `output_model_onnx_initializer_handler` function:

```python
def output_model_onnx_initializer_handler(
    initializer_name: str,
    initializer_value: ort.OrtValue,
    external_info: ort.OrtExternalInitializerInfo | None,
) -> ort.OrtExternalInitializerInfo | None:
    # The `external_info` argument denotes the location of the initializer in the original input model (if not None).
    # Return it directly to reuse the same external initializer file.
    return external_info

# ...
```
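The embed-vs-external decision inside the handler is plain offset bookkeeping, and it can be exercised without ONNX Runtime. The following standalone sketch mimics the callback contract, with a hypothetical `(path, offset, size)` tuple standing in for `ort.OrtExternalInitializerInfo`; the 64-byte threshold matches the example above.

```python
import os
import tempfile

# Hypothetical stand-in for ort.OrtExternalInitializerInfo: (file_path, file_offset, byte_size).
def place_initializer(data: bytes, external_fd, external_path: str):
    """Mimics the initializer-location callback: embed small tensors, externalize large ones."""
    if len(data) < 64:
        return None  # embedded within the output model
    offset = external_fd.tell()
    external_fd.write(data)
    return (external_path, offset, len(data))

with tempfile.TemporaryDirectory() as tmp:
    ext_path = os.path.join(tmp, "weights.bin")
    with open(ext_path, "wb") as fd:
        small = place_initializer(b"\x01" * 16, fd, ext_path)   # -> None (embedded)
        big_a = place_initializer(b"\xaa" * 100, fd, ext_path)  # -> (ext_path, 0, 100)
        big_b = place_initializer(b"\xbb" * 200, fd, ext_path)  # -> (ext_path, 100, 200)
```

Note how each externalized tensor's offset is the running size of the external file, which is exactly the invariant the real handler maintains via `output_initializer_fd.tell()`.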

#### References
- [Additional Python usage examples in unit tests](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_compile_api.py)
- [Python ModelCompiler class](https://github.com/microsoft/onnxruntime/blob/a5ba2ba3998820dd8da111c90c420479aac7a11e/onnxruntime/python/onnxruntime_inference_collection.py#L680-L709)
- [C++ API functions](https://github.com/microsoft/onnxruntime/blob/879ec0392ad5128968440a4e5b5a0bb742494ae5/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L1617-L1623)
- [C API functions](https://github.com/microsoft/onnxruntime/blob/879ec0392ad5128968440a4e5b5a0bb742494ae5/include/onnxruntime/core/session/onnxruntime_c_api.h#L7751-L7774)

### Usage example: cross-compilation with a plugin EP
By default, ONNX Runtime only allows the use of [plugin EPs](./plugin-ep-libraries.md) that are compatible with real hardware devices discovered by ONNX Runtime.
To support the creation of compiled models that target hardware not present on the compiling machine (i.e., cross-compiling), a plugin EP may be allowed
to create virtual hardware devices that an application can use to compile models.

#### Application code
An application grants a plugin EP library permission to create virtual hardware devices by using a library registration name
that ends in the ".virtual" suffix. A virtual hardware device created by an EP will have the metadata entry "is_virtual" set to "1".

```python
import onnxruntime as ort
import onnxruntime_ep_contoso_ai as contoso_ep

# An application uses a registration name that ends in ".virtual" to signal that virtual devices are allowed.
ep_lib_registration_name = "contoso_ep_lib.virtual"
ort.register_execution_provider_library(ep_lib_registration_name, contoso_ep.get_library_path())

# Set the EP to use for compilation
ep_name = contoso_ep.get_ep_names()[0]
ep_devices = ort.get_ep_devices()
selected_ep_device = next((ep_device for ep_device in ep_devices
                           if ep_device.ep_name == ep_name and ep_device.device.metadata["is_virtual"] == "1"), None)
assert selected_ep_device is not None, "Did not find ep device for target EP"

ep_options = {}  # EP-specific options
session_options = ort.SessionOptions()
session_options.add_provider_for_devices([selected_ep_device], ep_options)

# Compile the model
model_compiler = ort.ModelCompiler(
    session_options,
    "input_model.onnx",
    # ... other options ...
)
model_compiler.compile_to_file("output_model.onnx")

# Unregister the library using the same registration name specified earlier.
# Only unregister a library after all `ModelCompiler` objects that use it have been released.
del model_compiler
ort.unregister_execution_provider_library(ep_lib_registration_name)
```
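The virtual-device filter in the snippet above can be exercised with plain stand-in objects, without any hardware or EP library present. In this sketch, `SimpleNamespace` instances play the role of `OrtEpDevice` entries; the device list and EP name are made up.

```python
from types import SimpleNamespace

def pick_virtual_device(ep_devices, ep_name):
    """Return the first EP device for `ep_name` whose hardware device is marked virtual."""
    return next((d for d in ep_devices
                 if d.ep_name == ep_name and d.device.metadata.get("is_virtual") == "1"), None)

# Made-up device list: one real device and one virtual device advertised by the same EP.
devices = [
    SimpleNamespace(ep_name="ContosoEp", device=SimpleNamespace(metadata={})),
    SimpleNamespace(ep_name="ContosoEp", device=SimpleNamespace(metadata={"is_virtual": "1"})),
]

chosen = pick_virtual_device(devices, "ContosoEp")  # selects the second (virtual) entry
```

Using `.get("is_virtual")` rather than indexing avoids a `KeyError` on real devices, which typically have no "is_virtual" metadata entry at all.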
523+
#### Plugin EP library code
524+
A plugin EP library determines if the creation of virtual devices is allowed by checking if the "allow_virtual_devices" environment configuration entry
525+
is set to "1". The following snippet from a [reference EP implementation](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu/ep_lib_entry.cc) shows how a plugin EP library could check environment configuration entries within the library's
526+
exported `CreateEpFactories` function.
527+
528+
```c++
529+
#include "core/session/onnxruntime_env_config_keys.h"
530+
#define ORT_API_MANUAL_INIT
531+
#include "onnxruntime_cxx_api.h"
532+
#undef ORT_API_MANUAL_INIT
533+
534+
// other includes ..
535+
536+
extern "C" {
537+
EXPORT_SYMBOL OrtStatus* CreateEpFactories(const char* /*registration_name*/, const OrtApiBase* ort_api_base,
538+
const OrtLogger* default_logger,
539+
OrtEpFactory** factories, size_t max_factories, size_t* num_factories) {
540+
EXCEPTION_TO_RETURNED_STATUS_BEGIN
541+
const OrtApi* ort_api = ort_api_base->GetApi(ORT_API_VERSION);
542+
const OrtEpApi* ep_api = ort_api->GetEpApi();
543+
const OrtModelEditorApi* model_editor_api = ort_api->GetModelEditorApi();
544+
545+
// Manual init for the C++ API
546+
Ort::InitApi(ort_api);
547+
548+
if (max_factories < 1) {
549+
return ort_api->CreateStatus(ORT_INVALID_ARGUMENT,
550+
"Not enough space to return EP factory. Need at least one.");
551+
}
552+
553+
Ort::KeyValuePairs env_configs = Ort::GetEnvConfigEntries(); // Wraps OrtEpApi::GetEnvConfigEntries()
554+
555+
// Extract a config that determines whether creating virtual hardware devices is allowed.
556+
// An application can allow an EP library to create virtual devices in two ways:
557+
// 1. Use an EP library registration name that ends in the suffix ".virtual". If so, ORT will automatically
558+
// set the config key "allow_virtual_devices" to "1" in the environment.
559+
// 2. Directly set the config key "allow_virtual_devices" to "1" when creating the
560+
// OrtEnv via OrtApi::CreateEnvWithOptions().
561+
const char* config_value = env_configs.GetValue(kOrtEnvAllowVirtualDevices);
562+
const bool allow_virtual_devices = config_value != nullptr && strcmp(config_value, "1") == 0;
563+
564+
std::unique_ptr<OrtEpFactory> factory = std::make_unique<EpFactoryVirtualGpu>(*ort_api, *ep_api, *model_editor_api,
565+
allow_virtual_devices, *default_logger);
566+
567+
factories[0] = factory.release();
568+
*num_factories = 1;
569+
570+
return nullptr;
571+
EXCEPTION_TO_RETURNED_STATUS_END
572+
}
573+
574+
// ...
575+
576+
} // extern "C"
577+
```
578+
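The two enablement paths described in the comments above reduce to a small decision, which can be sketched standalone. The helper names below are hypothetical (not ORT APIs); they only model how ORT populates the "allow_virtual_devices" entry and how the library then reads it.

```python
def derive_env_configs(registration_name: str, explicit_configs: dict) -> dict:
    """Mimics how ORT populates environment config entries (hypothetical helper):
    a ".virtual" registration-name suffix sets "allow_virtual_devices" automatically."""
    configs = dict(explicit_configs)
    if registration_name.endswith(".virtual"):
        configs["allow_virtual_devices"] = "1"
    return configs

def library_allows_virtual(env_configs: dict) -> bool:
    # What the EP library itself checks inside CreateEpFactories.
    return env_configs.get("allow_virtual_devices") == "1"

# Suffix path: the application's registration name enables virtual devices.
assert library_allows_virtual(derive_env_configs("contoso_ep_lib.virtual", {}))
# Explicit path: the application set the config key on the OrtEnv directly.
assert library_allows_virtual(derive_env_configs("contoso_ep_lib", {"allow_virtual_devices": "1"}))
# Neither path: virtual devices stay disallowed.
assert not library_allows_virtual(derive_env_configs("contoso_ep_lib", {}))
```

The key point is that the library never inspects the registration name itself; it only sees the resulting config entry.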

An EP factory's `OrtEpFactory::GetSupportedDevices()` function may then use `OrtEpApi::CreateHardwareDevice()` to create a virtual hardware device.

```c++
#include "core/session/onnxruntime_ep_device_ep_metadata_keys.h"
// Other includes ...

/*static*/
OrtStatus* ORT_API_CALL EpFactoryVirtualGpu::GetSupportedDevicesImpl(OrtEpFactory* this_ptr,
                                                                     const OrtHardwareDevice* const* /*devices*/,
                                                                     size_t /*num_devices*/,
                                                                     OrtEpDevice** ep_devices,
                                                                     size_t max_ep_devices,
                                                                     size_t* p_num_ep_devices) noexcept {
  size_t& num_ep_devices = *p_num_ep_devices;
  auto* factory = static_cast<EpFactoryVirtualGpu*>(this_ptr);

  num_ep_devices = 0;

  // Create a virtual OrtHardwareDevice if the application indicated it is allowed (e.g., for cross-compiling).
  // This example EP creates a virtual GPU OrtHardwareDevice and adds a new OrtEpDevice that uses the virtual GPU.
  if (factory->allow_virtual_devices_ && num_ep_devices < max_ep_devices) {
    // A virtual hardware device should have a metadata entry "is_virtual" set to "1".
    OrtKeyValuePairs* hw_metadata = nullptr;
    factory->ort_api_.CreateKeyValuePairs(&hw_metadata);
    factory->ort_api_.AddKeyValuePair(hw_metadata, kOrtHardwareDevice_MetadataKey_IsVirtual, "1");

    auto* status = factory->ep_api_.CreateHardwareDevice(OrtHardwareDeviceType::OrtHardwareDeviceType_GPU,
                                                         factory->vendor_id_,
                                                         /*device_id*/ 0,
                                                         factory->vendor_.c_str(),
                                                         hw_metadata,
                                                         &factory->virtual_hw_device_);
    factory->ort_api_.ReleaseKeyValuePairs(hw_metadata);  // Release since ORT makes a copy.

    if (status != nullptr) {
      return status;
    }

    OrtKeyValuePairs* ep_metadata = nullptr;
    OrtKeyValuePairs* ep_options = nullptr;
    factory->ort_api_.CreateKeyValuePairs(&ep_metadata);
    factory->ort_api_.CreateKeyValuePairs(&ep_options);

    // Made-up example metadata values.
    factory->ort_api_.AddKeyValuePair(ep_metadata, "some_metadata", "1");
    factory->ort_api_.AddKeyValuePair(ep_options, "compile_optimization", "O3");

    OrtEpDevice* virtual_ep_device = nullptr;
    status = factory->ort_api_.GetEpApi()->CreateEpDevice(factory, factory->virtual_hw_device_, ep_metadata,
                                                          ep_options, &virtual_ep_device);

    factory->ort_api_.ReleaseKeyValuePairs(ep_metadata);
    factory->ort_api_.ReleaseKeyValuePairs(ep_options);

    if (status != nullptr) {
      return status;
    }

    ep_devices[num_ep_devices++] = virtual_ep_device;
  }

  return nullptr;
}
```

#### References
- [Reference example plugin EP with virtual GPU](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu)
- [OrtEpApi::GetEnvConfigEntries C API function](https://github.com/microsoft/onnxruntime/blob/990ba5f0c3e0c8735fec8bf89dd11953224a9c03/include/onnxruntime/core/session/onnxruntime_ep_c_api.h#L1431-L1446)
- [Ort::GetEnvConfigEntries C++ API function](https://github.com/microsoft/onnxruntime/blob/990ba5f0c3e0c8735fec8bf89dd11953224a9c03/include/onnxruntime/core/session/onnxruntime_cxx_api.h#L3531-L3532)
- [Plugin EP library documentation](./plugin-ep-libraries.md)
- [Additional Python usage examples in unit tests](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_compile_api.py)
- [Python ModelCompiler class](https://github.com/microsoft/onnxruntime/blob/a5ba2ba3998820dd8da111c90c420479aac7a11e/onnxruntime/python/onnxruntime_inference_collection.py#L680-L709)

### Usage example: EPContext weight sharing with plugin EPs
The compile API also supports [EPContext resource/weight sharing](./EP-Context-Design.md#epcontext-with-weight-sharing) with plugin EPs.

```python
import onnxruntime as ort
import onnxruntime_ep_contoso_ai as contoso_ep

ep_lib_registration_name = "contoso_ep_lib"
ort.register_execution_provider_library(ep_lib_registration_name, contoso_ep.get_library_path())

# The models that share resources
input_models = ["input_model_0.onnx", "input_model_1.onnx"]
output_models = ["output_model_0.onnx", "output_model_1.onnx"]

# Set the EP to use for compilation
ep_devices = ort.get_ep_devices()
selected_ep_device = next((ep_device for ep_device in ep_devices if ep_device.ep_name == contoso_ep.get_ep_names()[0]), None)
assert selected_ep_device is not None, "Did not find ep device for target EP"

ep_options = {}  # EP-specific options
session_options = ort.SessionOptions()
session_options.add_provider_for_devices([selected_ep_device], ep_options)

# Set option that tells the EP to share resources (e.g., weights) across sessions.
session_options.add_session_config_entry("ep.share_ep_contexts", "1")

# Compile individual models
for i in range(len(input_models)):
    if i == len(input_models) - 1:
        # Tell the EP that this is the last compiling session that will be sharing resources.
        session_options.add_session_config_entry("ep.stop_share_ep_contexts", "1")

    model_compiler = ort.ModelCompiler(
        session_options,
        input_models[i],
        # ... other options ...
    )
    model_compiler.compile_to_file(output_models[i])

# Unregister the library using the same registration name specified earlier.
# Only unregister a library after all `ModelCompiler` objects that use it have been released.
ort.unregister_execution_provider_library(ep_lib_registration_name)
```
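The flag sequencing in the loop above (share on every compiling session, stop-share only on the last) can be checked in isolation with plain dicts; no ONNX Runtime is needed, and the helper below is illustrative only.

```python
def config_entries_per_model(num_models: int) -> list:
    """Return the session config entries in effect for each compiling session:
    every session shares EP contexts; only the last one also sets the stop flag."""
    configs = []
    entries = {"ep.share_ep_contexts": "1"}
    for i in range(num_models):
        if i == num_models - 1:
            entries["ep.stop_share_ep_contexts"] = "1"
        configs.append(dict(entries))
    return configs

for cfg in config_entries_per_model(3):
    print(cfg)
# {'ep.share_ep_contexts': '1'}
# {'ep.share_ep_contexts': '1'}
# {'ep.share_ep_contexts': '1', 'ep.stop_share_ep_contexts': '1'}
```

Note that the real example mutates one shared `SessionOptions` object in the same way: the stop flag is added once, right before the final model is compiled.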

#### References
- [Plugin EP library documentation](./plugin-ep-libraries.md)
- [Additional Python usage examples in unit tests](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_compile_api.py)
- [Python ModelCompiler class](https://github.com/microsoft/onnxruntime/blob/a5ba2ba3998820dd8da111c90c420479aac7a11e/onnxruntime/python/onnxruntime_inference_collection.py#L680-L709)

docs/execution-providers/plugin-ep-packaging.md

Lines changed: 15 additions & 0 deletions
@@ -1,4 +1,19 @@
---
title: Plugin Execution Provider Packaging Guidance
description: Packaging guidance for plugin EP packages
parent: Execution Providers
nav_order: 18
redirect_from: /docs/reference/execution-providers/Plugin-EP-Packaging
---

# ONNX Runtime Plugin Execution Provider Packaging Guidance
{: .no_toc }

## Contents
{: .no_toc }

* TOC placeholder
{:toc}

## Overview
