Describe the feature request
Motivation & Real-World Failure
The WebNN API implementation in Chromium uses ONNX Runtime as a backend. In this architecture, model compilation happens in a sandboxed compiler process using embed mode (`embed_mode=1`). The compiled EP context is serialized and sent back to the GPU process via IPC, with the compiled blob embedded directly in the ONNX model's `ep_cache_context` string attribute. This is the preferred approach because the sandboxed process has restricted file system access, which makes external file-based EP Context impractical.
When compiling large models (e.g., Stable Diffusion Turbo UNet with ~1.6GB of weights), the EP-generated compiled blob can reach ~3.3GB. This exceeds the ~2GB limit imposed by the `int` length parameter of `OrtApi::CreateOpAttr` and causes the export to fail.
Failure Scenario
When an EP creates an EPContext node with a large `ep_cache_context` string attribute, it must convert the `size_t` length to `int` to pass it to `Ort::OpAttr(name, data, len, ORT_OP_ATTR_STRING)`. For data exceeding ~2GB, this conversion either:
- Throws at runtime if using a narrowing check (e.g., `gsl::narrow<int>`), or
- Silently overflows to a wrong length and corrupts the data if using `static_cast<int>()`.
This is a framework-level limitation in ORT, not specific to any single EP. Any EP using embed mode with large compiled artifacts will hit the same wall.
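Both failure modes can be reproduced without ONNX Runtime. The sketch below is a minimal illustration under stated assumptions: `narrow_len_checked` is a hypothetical stand-in that mimics the throw-on-loss behavior of `gsl::narrow<int>` (it is not the GSL implementation), and `narrow_len_unchecked` is the plain `static_cast<int>` path. The 3.3GB size is the SD Turbo UNet figure above, assumed to be in decimal bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical stand-in for gsl::narrow<int>: refuse any length that does not fit in int.
int narrow_len_checked(std::size_t len) {
  if (len > static_cast<std::size_t>(std::numeric_limits<int>::max())) {
    throw std::range_error("attribute length exceeds INT_MAX");
  }
  return static_cast<int>(len);
}

// The static_cast<int> path: out-of-range values wrap (modular since C++20,
// implementation-defined before that, but wrapping on mainstream compilers).
int narrow_len_unchecked(std::size_t len) {
  return static_cast<int>(len);
}

// Approximate size of the compiled SD Turbo UNet blob from the report (~3.3 GB, decimal).
constexpr std::size_t kLargeBlobBytes = 3300000000u;
```

With `kLargeBlobBytes`, the checked version throws and the unchecked version yields a negative `int`, so neither path can hand the full blob length to an `int` parameter.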
Problem
`OrtApi::CreateOpAttr` uses `int` for the `len` parameter:

```c
ORT_API2_STATUS(CreateOpAttr,
                _In_ const char* name,
                _In_ const void* data,
                _In_ int len,  // max ~2GB for ORT_OP_ATTR_STRING
                _In_ OrtOpAttrType type,
                _Outptr_ OrtOpAttr** op_attr);
```
For `ORT_OP_ATTR_STRING`, `len` is the byte count of the string data. A 32-bit signed `int` caps this at 2^31 - 1 bytes (≈ 2.1GB), which is insufficient for the compiled artifacts of modern large models.
All callers are therefore forced to use `gsl::narrow<int>()` or `static_cast<int>()`, which respectively throw or silently overflow for data larger than 2GB.
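A quick worked check of the limit, assuming 32-bit `int` and that the ~3.3GB figure above is in decimal gigabytes:

```math
\begin{aligned}
\text{INT\_MAX} &= 2^{31} - 1 = 2\,147\,483\,647 \text{ bytes} \approx 2.15\ \text{GB} \\
\frac{3.3 \times 10^{9}}{2^{31} - 1} &\approx 1.54
\end{aligned}
```

The compiled blob is therefore roughly 1.5 times larger than the largest length an `int` can carry, while a 64-bit `size_t` raises the ceiling to $2^{64} - 1$ bytes.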
Proposed Solution
Change the `len` parameter from `int` to `size_t` across the full API chain:

| Layer | File | Change |
|---|---|---|
| C API declaration | `include/onnxruntime/core/session/onnxruntime_c_api.h` | `int len` → `size_t len` |
| Internal declaration | `onnxruntime/core/session/ort_apis.h` | `int len` → `size_t len` |
| Implementation | `onnxruntime/core/session/standalone_op_invoker.cc` | 2 function signatures + loop variables |
| C++ wrapper declaration | `include/onnxruntime/core/session/onnxruntime_cxx_api.h` | `Ort::OpAttr` constructor |
| C++ wrapper implementation | `include/onnxruntime/core/session/onnxruntime_cxx_inline.h` | `Ort::OpAttr` constructor |
| Minimal build stub | `standalone_op_invoker.cc` | Stub signature |
| Callers | Test code, EP implementations | Remove `static_cast<int>` / `gsl::narrow<int>` |
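To illustrate what the implementation row amounts to, here is a minimal, self-contained sketch of a `CreateOpAttr`-style function taking `size_t len`. It does not use ONNX Runtime: `SketchAttr`, `SketchAttrKind`, and `create_attr_sketch` are hypothetical stand-ins, and the `int64_t` element type for the integer-list case is an assumption made for illustration. The point is that the string payload is copied with its full `size_t` byte count and that loop indices are `size_t` too, so no narrowing happens anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the subset of OrtOpAttrType used here.
enum class SketchAttrKind { String, Ints };

// Hypothetical stand-in for the attribute object that CreateOpAttr produces.
struct SketchAttr {
  std::string name;
  std::string s;                   // string payload; len is a byte count
  std::vector<std::int64_t> ints;  // list payload; len is an element count (type assumed)
};

// Sketch of a size_t-based CreateOpAttr body: returns false on invalid arguments.
bool create_attr_sketch(const char* name, const void* data, std::size_t len,
                        SketchAttrKind kind, SketchAttr& out) {
  if (name == nullptr || (data == nullptr && len != 0)) {
    return false;
  }
  out.name = name;
  switch (kind) {
    case SketchAttrKind::String:
      // Full size_t byte count: a >2 GB blob is copied without narrowing.
      out.s.assign(len == 0 ? "" : static_cast<const char*>(data), len);
      return true;
    case SketchAttrKind::Ints: {
      const auto* values = static_cast<const std::int64_t*>(data);
      out.ints.reserve(len);
      // Loop index is size_t to match len.
      for (std::size_t i = 0; i < len; ++i) {
        out.ints.push_back(values[i]);
      }
      return true;
    }
  }
  return false;
}
```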
Breaking Change Considerations
This is an ABI-breaking change to the `OrtApi` struct. The function pointer signature changes from `int` to `size_t`, which differ in width on typical 64-bit platforms (4 bytes vs. 8 bytes). This requires:
- Bumping `ORT_API_VERSION`
- Documenting the change in the release notes
- Recompiling existing plugins built against the old API
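The break is visible at the type level. In the simplified sketch below, `OldCreateOpAttrFn` and `NewCreateOpAttrFn` are hypothetical aliases that keep only the name, data, and length parameters; they are not the actual `OrtApi` member types. A plugin compiled against the old header calls through a pointer that passes a 4-byte `int`, while a runtime built with the new header reads a `size_t`, which is 8 bytes on typical 64-bit targets.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical, trimmed-down views of the CreateOpAttr slot before and after the change.
using OldCreateOpAttrFn = void* (*)(const char* name, const void* data, int len);
using NewCreateOpAttrFn = void* (*)(const char* name, const void* data, std::size_t len);

// The function pointer types are distinct on every platform...
static_assert(!std::is_same<OldCreateOpAttrFn, NewCreateOpAttrFn>::value,
              "changing len changes the type of the OrtApi slot");

// ...and the length parameter also changes width wherever size_t is wider than int.
constexpr bool len_width_changes() { return sizeof(std::size_t) != sizeof(int); }
```

Even on 32-bit builds, where both are 4 bytes, the signedness differs, so mixing old and new binaries through the same slot is not safe.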
Alternatives Considered
- Add `CreateOpAttr2` with `size_t`: avoids the ABI break by adding a new API alongside the existing one. The old `CreateOpAttr` would delegate to the new one internally. Downside: adds API surface clutter.
- Use `int64_t` instead of `size_t`: a fixed-width type, consistent across platforms (avoids a 32-bit `size_t` on 32-bit builds). Some ORT APIs already use `int64_t` for sizes (e.g., tensor dimensions).
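Both alternatives can be sketched without ONNX Runtime. Below, `CreateOpAttr2Sketch` stands in for the proposed `CreateOpAttr2` (only the name comes from the proposal; the body, the `SketchStatus` type, and the string-only handling are assumptions), and `CreateOpAttrSketch` shows the existing `int` entry point delegating to it after rejecting negative lengths. The hypothetical `int64_len_to_size` covers the `int64_t` option: even with a fixed-width parameter, a 32-bit build still has to check that the value fits in `size_t`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical stand-in for OrtStatus*: ok == true means success.
struct SketchStatus {
  bool ok;
  std::string message;
};

// Stand-in for the proposed CreateOpAttr2: string attributes only, size_t byte count.
SketchStatus CreateOpAttr2Sketch(const char* name, const void* data, std::size_t len,
                                 std::string& out_value) {
  if (name == nullptr || (data == nullptr && len != 0)) {
    return {false, "invalid argument"};
  }
  out_value.assign(len == 0 ? "" : static_cast<const char*>(data), len);
  return {true, ""};
}

// The existing int-based entry point keeps its ABI and forwards to the new one.
SketchStatus CreateOpAttrSketch(const char* name, const void* data, int len,
                                std::string& out_value) {
  if (len < 0) {
    return {false, "len must be non-negative"};
  }
  return CreateOpAttr2Sketch(name, data, static_cast<std::size_t>(len), out_value);
}

// For the int64_t alternative: convert to size_t only if the value is representable.
bool int64_len_to_size(std::int64_t len, std::size_t& out) {
  if (len < 0) {
    return false;
  }
  if (static_cast<std::uint64_t>(len) > std::numeric_limits<std::size_t>::max()) {
    return false;  // only reachable where size_t is narrower than 64 bits
  }
  out = static_cast<std::size_t>(len);
  return true;
}
```

The delegation keeps old binaries working, at the cost of a second entry point that has to be documented and maintained, which is the API-surface downside noted for `CreateOpAttr2`.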
Affected Callers (known)
- Any plugin EP using embed mode (`embed_mode=1`) with `ORT_OP_ATTR_STRING` for large compiled data (e.g., `ep_cache_context`)
- `onnxruntime/test/autoep/library/example_plugin_ep/ep.cc`: `static_cast<int>`
- `onnxruntime/test/autoep/library/example_plugin_ep_virt_gpu/ep.cc`: `static_cast<int>`
- `onnxruntime/test/shared_lib/custom_op_utils.cc`: `static_cast<int>`
@adrastogi @huningxin @fdwr @ibelem