Support export_model and import_model for MULTI device mode #5
andersendsa wants to merge 37 commits into master
Conversation
Motivation: Exporting a compiled model in MULTI mode was previously not supported, returning OPENVINO_NOT_IMPLEMENTED. This prevented users from caching or serializing MULTI-device models.

Changes:
- Implemented `AutoCumuCompiledModel::export_model` in `src/plugins/auto/src/cumulative_compiled_model.cpp` to serialize the MULTI configuration (XML) and delegate model export to sub-devices.
- Implemented `Plugin::import_model` in `src/plugins/auto/src/plugin.cpp` to deserialize the MULTI configuration and reconstruct the distributed model state by importing sub-models.
- Updated `CumuSchedule::init` in `src/plugins/auto/src/cumulative_schedule.cpp` to handle initialization during import (skipping compilation tasks when `ov::Model` is null).
- Added `openvino/pass/serialize.hpp` and `openvino/util/xml_parse_utils.hpp` includes where necessary.
- Added a functional test `CanExportImportMultiModel` in `src/plugins/auto/tests/functional/behavior/multi_export_import_test.cpp`.

Verification:
- Compiled `openvino_auto_plugin` and `ov_auto_func_tests`.
- Ran the `AutoFuncTests.CanExportImportMultiModel` test, which passed.
- Verified that the exported model can be imported and used for inference with correct results and device utilization.

Co-authored-by: andersendsa <199610634+andersendsa@users.noreply.github.com>
…able Affine Parameters (openvinotoolkit#33861) ### Details: - This PR enhances RMS normalization fusion to support the pattern without a learnable affine parameter (gamma), enabling optimization of transformer architectures like LTX-Video - The existing RMS fusion pass only supported the pattern with a constant gamma parameter. However, some transformer models (e.g., LTX-Video's attention layers) use RMS normalization followed by a dynamic scaling operation where the scale factor is non-constant. These patterns were previously left unfused, missing an optimization opportunity - When `elementwise_affine=False` (equivalent to [Pytorch RMS's attribute](https://docs.pytorch.org/docs/stable/generated/torch.nn.modules.normalization.RMSNorm.html)), RMS normalization does not include learnable gamma parameters. The gamma is implicitly fixed to ones, reducing the decomposed graph pattern from: `x → Power(2) → ReduceMean → Add(eps) → Sqrt → Divide(1/√) → Multiply(x, 1/√) → Multiply(gamma) ` to: `x → Power(2) → ReduceMean → Add(eps) → Sqrt → Divide(1/√) → Multiply(x, 1/√) [NO gamma multiplication] ` <img width="648" height="924" alt="image-2026-01-26-22-53-05-973" src="https://github.com/user-attachments/assets/02b4580f-bbce-43ea-affd-438f0a5f4ea7" /> ### Tickets: - [CVS-179953](https://jira.devtools.intel.com/browse/CVS-179953) --------- Signed-off-by: Andrew Park <andrew.park@intel.com>
### Details: Remove 'using namespace *' in the snippets related part of the code base (tests excluded) ### Tickets: - N/A
…otoolkit#33940) ### Description of the issue(symptom, root-cause, how it was resolved) - Fixed the use of float accumulators for intermediate calculations. - Corrected the use of float inputs for MAD operations. #### The code and line that caused this issue (if it is not changed directly) - src\plugins\intel_gpu\src\kernel_selector\cl_kernels\gemm_tiled_opt.cl #### Reproduction step and snapshot (if applicable. Do not attach for customer model) - reproducer is attached in the ticket. #### Checklist - [x] Is it a proper fix? (not a workaround) - [x] Did you include test case for this fix, if necessary? - [ ] Did you review existing test that can be extended to cover this scenario? Which test did you review? ### Tickets: - 179229
…change in pattern (openvinotoolkit#33984) ### Details: - *Due to recent change, Reshape special_zeros attribute flipped bool val, causing mismatch* - *Fix e2e test skip* ### Tickets: - *CVS-180693* - *CVS-180696* - *CVS-180665* --------- Co-authored-by: Mikhail Ryzhov <mikhail.ryzhov@intel.com>
### Tickets: - N/A
…notoolkit#34058) ### Details: Rebase the oneDNN change to the latest v3.8 HEAD ### Tickets: - N/A
### Details: - *item1* - *...* ### Tickets: - *ticket-id*
### Details:
- ACL is upgraded to 52.8.0
- Android ACL scons command has been changed to match ACL team setup:
- Switched to target-triple compiler prefix `<triple><api>-` and empty
`toolchain_prefix`
- Added Android ABI -> target triple mapping
- Added API level resolution/validation (`ANDROID_PLATFORM_LEVEL` with
fallback from `ANDROID_PLATFORM`).
### Tickets:
- CVS-180218
### Details: ITT traces were initially enabled on the target platform only in openvinotoolkit#31499. This patch extends support to all available platforms. ### Tickets: - N/A
…openvinotoolkit#33979) ### Tickets: - CVS-179708 --------- Signed-off-by: Tomasz Jankowski <tomasz1.jankowski@intel.com>
[About] This PR enables u8 kv cache precision for the SDPA operator and optimizes it with NEON and SVE. - Improves the performance of the OSS master version [ where a reference implementation is available ] by 27%. - But we are slower by 2.7% when compared with the non-quantized f16 cache precision, due to the additional overhead of quantization and dequantization, for smaller models like TinyLlama-1.1B for single inference. - Such a performance benefit [from u8 quantization] can be seen only when the inference is more memory bound. We see speedups of around 3-5% when inferencing a LLama-70B int8-quantized model in the single-inference case. - Therefore, even though we achieve a speedup of 27% compared to the reference implementation, we assume the general case to be compute bound and currently keep the default as F16 only. - As models get larger and in multiple-batch scenarios, setting kv_cache to "u8" gives a significant boost at the inference level. | OSS ref impl - u8 | This PR | |----------|:----------:| | 10.8 tokens/sec | 13.7 tokens/sec | Single-inference performance on a LLAMA2-7B model on a 32c Graviton machine. The values are in TPS [ tokens per second ]. This work is contributed by @ashwins990 & @abhijain1204fujitsu
### Details: The ReshapePRelu transformation crashes when the PRelu input has rank=0 (scalar). The existing check `prelu_rank.get_length() == 1` skips rank-1 inputs but allows rank-0 scalars to pass through. The code then attempts to access dimension index 1 (channel_dim_idx), which causes an "Accessing out-of-range dimension" exception. Changed the condition from `== 1` to `< 2` to skip both scalar (rank=0) and 1D (rank=1) inputs, since the transformation requires at least rank-2 tensors to access the channel dimension. ### Tickets: - 179013
…tures (openvinotoolkit#34062) ### Details: - Move shared CPU snippets shape infer registrations to `transformations/snippets/common/shape_inference.cpp` - Make x64/aarch64 shape inference files extend the common registry with arch-specific ops only - Exclude `transformations/snippets/x64/*` for all non-x64 builds (not only aarch64) in CPU plugin CMake, so RISC-V build **does not** depend on x64 snippets transformations directory ### Tickets: - N/A
Basics unit tests are skipped until CVS-180810 is fixed. --------- Signed-off-by: Kirill Suvorov <kirill.suvorov@intel.com>
### Details: - *updated [PR](openvinotoolkit#34022
### Details: - Guard InteractionNode-dependent logic in constructor and `isSupportedOperation` with `OPENVINO_ARCH_X86_64` Interaction node source is compiled on non-x64 targets, while `transformations/cpu_opset/x64/op/interaction.cpp` (which defines `ov::intel_cpu::InteractionNode`) is excluded by CMake when x64 is OFF. Issue that is being resolved here: non-x64 links could fail with: "undefined reference to typeinfo for ov::intel_cpu::InteractionNode" (e.g. in case of `ov_cpu_unit_tests` on RISCV64). ### Tickets: - N/A
…#33244) ### Details: - *Changed partial ov::parallel_for to CpuParallel::parallel_for, which can set the TBB partitioner to AUTO or STATIC* ### Tickets: - *CVS-177452*
…penvinotoolkit#33963) ### Details: - Make `jit_fill_emitter` not responsible for inplace operations - Introduce `InsertTailFill` pass as a part of the `ReduceDecomposition` pass to support conditional tail insertion, removing the workaround on the emitter side ### Tickets: - 126270
### Details: - Adding missing L0 extension headers ``` [2026-02-10T13:14:32.621Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/level-zero-ext/ze_graph_ext.h [2026-02-10T13:14:32.622Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/level-zero-ext/ze_graph_profiling_ext.h [2026-02-10T13:14:32.622Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/level-zero-ext/ze_command_queue_npu_ext.h [2026-02-10T13:14:32.622Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/level-zero-ext/ze_intel_npu_uuid.h [2026-02-10T13:14:32.622Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/level-zero-ext/ze_context_npu_ext.h [2026-02-10T13:14:32.622Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/level-zero-ext/ze_driver_npu_ext.h ``` - Adding ittnotify headers ``` [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/disable_warnings.h [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittnotify_config.h [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittnotify_static.c [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittnotify_static.h [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittnotify_types.h [2026-02-10T13:14:17.185Z] -- Installing: 
C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittptmark32.asm [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittptmark32.S [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittptmark64.asm [2026-02-10T13:14:17.185Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/ittptmark64.S [2026-02-10T13:14:17.186Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify/jitprofiling.c [2026-02-10T13:14:17.186Z] -- Up-to-date: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/advisor-annotate.h [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/AdvisorAnnotate.cs [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/fortran [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/fortran/advisor_annotate.f90 [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/fortran/posix [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/fortran/posix/ittnotify.f90 [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/fortran/win32 [2026-02-10T13:14:17.546Z] -- Installing: 
C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/fortran/win32/ittnotify.f90 [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify-zca.h [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/ittnotify.h [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/jitprofiling.h [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/legacy [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/legacy/ittnotify.h [2026-02-10T13:14:17.546Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/include/ittnotify/libittnotify.h ... [2026-02-10T13:14:44.069Z] -- Installing: C:\jenkins\workspace\openVINO-builder\dev_package/developer_package/lib/libittnotify.lib ``` ### Tickets: - E#164884 ### Tests: - OpenVINO-Windows10/78895/
- Updated Python API stub (.pyi) files from the latest available nightly. Auto-generated by GitHub Actions.
Currently, copyright header consistency is checked manually during code review (e.g. [1](openvinotoolkit#32393 (comment)), [2](openvinotoolkit#33072 (comment)), [3](openvinotoolkit#32983 (comment)), [4](openvinotoolkit#31191 (comment))). The idea of this PR is to automate this process and save reviewers' time. ### Details: - *Introduced a GHA workflow that validates copyright headers in C++ and Python files on PRs. If issues are found, it generates a patch file and fails the check with clear instructions.* - *Added some changes to files which have copyright inconsistencies in order to verify the added scripts* ### Tickets: - *N/A*
openvinotoolkit#33992) ### Details: - *Remove `std::vector<unit_8> compiledNetwork`* - *Add support for `make_tensor_from_aligned_addr`, which creates `ov::Tensor` from aligned allocated memory* ### Tickets: - *CVS-180882* --------- Signed-off-by: Kang, Wenjing <wenjing.kang@intel.com>
### Details: The PR enables Convolution non-i32 bias support. Before this PR only i32 Convolution bias was supported (due to ACL limitations). - `ConvertConvolutionBias` has been introduced. This transformation detects specific quantized Convolution patterns followed by Multiply and Add and inserts a Convert to i32 between the constant bias and the Add node. - `AddTransformation` is called for non-convolution bias only on ARM. Convolution bias is handled by `ConvertConvolutionBias transformation` on ARM. - The order of applying scales and shifts has been changed on ARM: `bias, scale, fq` on ARM vs `scale, bias, fq` on x86. It's needed to get specific postops order. - `ConvertConvolutionBias` transformation tests have been added to test the transformation. ### Tickets: - CVS-180491 --------- Co-authored-by: Vladislav Golubev <vladislav.golubev@intel.com>
### Details:
- ITT allocates memory in the call stack below.
```
000001d4`82618d30 00007ffd`12845ff9 vfbasics!AVrfpInitializeCriticalSectionCommon+0x13d
000001d4`82618d38 00007ffc`d277ac4c openvino!__itt_get_collection_state+0x2c [C:\Jenkins\workspace\private-ci\ie\build-windows-vs2022@2\b\repos\openvino\thirdparty\ittapi\ittapi\src\ittnotify\ittnotify_static.c @ 1665]
000001d4`82618d40 00007ffc`d1d68949 openvino!openvino::itt::internal::`dynamic initializer for 'state''+0x9 [src\common\itt\src\itt.cpp @ 22]
000001d4`82618d48 00007ffd`2f8de716 ucrtbase!initterm+0x36
000001d4`82618d50 00007ffc`d2782eea openvino!dllmain_crt_process_attach+0x9a [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 66]
000001d4`82618d58 00007ffc`d2783057 openvino!dllmain_dispatch+0x6f [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\dll_dllmain.cpp @ 276]
000001d4`82618d60 00007ffd`13d20ec4 verifier!AVrfpStandardDllEntryPointRoutine+0xf4
000001d4`82618d68 00007ffd`20dfb704 vrfcore!VfCoreStandardDllEntryPointRoutine+0x184
000001d4`82618d70 00007ffd`12848694 vfbasics!AVrfpStandardDllEntryPointRoutine+0xf4
000001d4`82618d78 00007ffd`322df86e ntdll!LdrpCallInitRoutineInternal+0x22
000001d4`82618d80 00007ffd`3218bcae ntdll!LdrpCallInitRoutine+0x10e
000001d4`82618d88 00007ffd`321897ac ntdll!LdrpInitializeNode+0x19c
000001d4`82618d90 00007ffd`322176ea ntdll!LdrpInitializeGraphRecurse+0x6a
000001d4`82618d98 00007ffd`32217716 ntdll!LdrpInitializeGraphRecurse+0x96
```
- `__itt_release_resources` needs to be called when `openvino.dll` is unloaded.
- The solution: create a class that stores all resource deallocation methods, then create a static object of it. Each release method registers itself with this static object; the object is destroyed when the DLL unloads, and all registered release functions are called in its destructor. This way, no code in DllMain/unload_library needs to change - just use a macro to register the function pointer, like the code below.
```cpp
static void shutdown_frontend_resources() {
google::protobuf::ShutdownProtobufLibrary();
}
OV_REGISTER_SHUTDOWN_CALLBACK(shutdown_frontend_resources)
```
### Tickets:
- [CVS-179009](https://jira.devtools.intel.com/browse/CVS-179009)
- [CVS-180657](https://jira.devtools.intel.com/browse/CVS-180657)
---------
Co-authored-by: Michal Lukaszewski <michal.lukaszewski@intel.com>
…kit#34047) ### Details: Switching Debian 10 ARM64 CPU Functional Tests to a newer generation of ARM64 runner - it's faster and cheaper. Leaving Linux ARM64 and cross-compilation on the old one - tests are failing, and it's not really worth fixing them unless it's quick. Switching Python API tests in the Linux ARM64 workflow to a less powerful runner - they run just as fine on it. ### Tickets: - *CVS-158878*
…rs (openvinotoolkit#34030) ### Details: - *Implement test RNG via `SeededRandom`, defaulting to FP32 outputs and consistent seed usage to stabilize test inputs.* - *Update layer tests to use RNG helpers and dtype parameters directly instead of ad‑hoc casts.* - *Remove unused imports and minor cleanup across PyTorch tests* - *Adjust string constant handling in `fx_decoder.py`.* ### Tickets: - *ticket-id*
…4117) ### Details: Remove NOLINT and template instantiations ### Tickets: - N/A
### Details: - Fix some Coverity issues ### Tickets: - *ticket-id*
### Details: - *Fix set_value attr size validation and add related testcase* ### Tickets: - *CVS-181028*
…otoolkit#34109) out_high can be read out of bounds because of an incorrect stride CVS-181020
…ding (openvinotoolkit#33961) ### Details: - Changed the for-each loop over `model->get_ordered_ops()` to an index-based for loop, using `std::move` to transfer ownership of each node from the `nodes` vector. That would allow us to destroy unused constants immediately and minimize the peak memory allocation. ### Tickets: - CVS-176571 --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Katarzyna Mitrus <katarzyna.mitrus@intel.com>
Motivation: Enable `export_model` and `import_model` functionality for the MULTI device plugin to allow saving and loading compiled models for Cumulative Throughput mode. Also fix a critical CI failure in the `Smart_CI` action. Changes: - Implemented `AutoCumuCompiledModel::export_model` to serialize device configurations to XML and sub-device binaries to the stream. - Implemented `Plugin::import_model` to parse the XML and reconstruct the distributed compiled model. - Updated `CumuSchedule` initialization to support loading without an initial `ov::Model`. - Added functional test `CanExportImportMultiModel` covering the export/import flow. - Fixed a race condition in `export_model` by capturing device state under lock. - Removed the faulty `StreamSerialize` fallback. - Updated `.github/actions/smart-ci/action.yml` to use isolated paths for internal checkouts, preventing workspace cleanup issues. Verification: - The added `AutoFuncTests.CanExportImportMultiModel` test passes. - Verified the fix for the race condition and XML formatting. - The CI fix addresses the `requirements.txt` not found error. Co-authored-by: andersendsa <199610634+andersendsa@users.noreply.github.com>
PR created automatically by Jules for task 6287962269027407743 started by @andersendsa