
[CPU] Enable Weightless models cache #29304


Open
nshchego wants to merge 5 commits into master from cpu/weightless_cache

Conversation

nshchego
Contributor

@nshchego nshchego commented Mar 6, 2025

Details:

  • CPU plugin: minimize the size of the cached blob by reusing weights from the original .bin file.
  • Some APIs were extended to pass the original weights.
  • The IR serializer and deserializer were modified to handle both weights sources, because the CPU plugin uses them to write and read the cache file (see the sketch after this list).
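A minimal usage sketch of the flow these details describe, assuming a standard OpenVINO caching setup (the file names and cache directory are illustrative, not from this PR):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Enable model caching. With the weightless cache, the CPU plugin stores a blob
    // that references the constants in model.bin instead of embedding a copy of them.
    core.set_property(ov::cache_dir("model_cache"));

    // First run: compiles the model and writes the (weight-less) cached blob.
    // Later runs: load the blob and re-read the weights from the original .bin file.
    auto compiled = core.compile_model("model.xml", "CPU");
    return 0;
}
```

Depending on the final API, an explicit cache-mode property (e.g. `ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE)`) may also be needed to opt into the weightless blob format; that is an assumption, not something stated in this PR description.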

Tickets:

  • 161826

@github-actions github-actions bot added labels: category: inference (OpenVINO Runtime library - Inference), category: Core (OpenVINO Core, aka ngraph), category: CPU (OpenVINO CPU plugin), category: build (OpenVINO cmake script / infra), category: transformations (OpenVINO Runtime library - Transformations), category: samples (OpenVINO Runtime Samples), category: IR FE (OpenVINO IR v10 / v11 FrontEnd) on Mar 6, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 8582cc3 to ea30e62 Compare March 6, 2025 04:18
@github-actions github-actions bot removed the category: samples OpenVINO Runtime Samples label Mar 6, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch 3 times, most recently from ab254a4 to ea0e3f7 Compare March 6, 2025 04:40
@@ -41,6 +41,10 @@ class ModelDeserializer {

void operator>>(std::shared_ptr<ov::Model>& model);

void set_weights_path(std::string& weights_path) {
    m_weights_path = weights_path;
}
Contributor

Please note that the case when a model is compiled via compile_model(ov::Model) should also be supported (see PR #29107; NPU & GPU work is in progress) via hint::model_ptr.
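For context, a hedged sketch of the case mentioned above: compiling an in-memory ov::Model, where no weights file path is attached to the compile call, so the original weights would have to travel through a hint such as the hint::model_ptr referenced in the comment (defined in PR #29107, not here):

```cpp
#include <memory>

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    core.set_property(ov::cache_dir("model_cache"));

    // compile_model(ov::Model) carries no .bin path, so a weightless cache import
    // cannot rediscover the weights file on its own; the original model (and thus
    // its weights) would need to be supplied separately, e.g. via hint::model_ptr.
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");
    auto compiled = core.compile_model(model, "CPU");
    return 0;
}
```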

std::shared_ptr<char[]> new_buf(new char[actual_size]);
data = new_buf.get();
weights_buf = std::make_shared<ov::SharedBuffer<std::shared_ptr<char[]>>>(data, actual_size, new_buf);
convert_dt(el_type, original_dt, data, m_weights->get_ptr<char>() + offset, el_num);
Contributor

Do we perform constants conversion directly in the IR FE in a suboptimal way?

Contributor Author

Yes, we need to get the converted values during node creation, otherwise some nodes cannot pass 'validate_and_infer_types' and graph compilation fails.

Contributor
@ilya-lavrenov ilya-lavrenov Mar 13, 2025

I don't think that constants conversion from one type to another is the responsibility of the IR reader.
Should the original saving logic instead express such conversion steps as constant subgraphs which are read as-is?

Later, the plugin can fold such subgraphs to get constants in the desired precision.

Or, at the very least, original_precision should be applied at the plugin level with faster functions than manual conversions.

Contributor

Agree with @ilya-lavrenov, the deserializer should just read the XML, and the additional convert should not be there. The plugin should apply any conversion if required.

Contributor Author

I do understand your concern, but forcing the precision may lead to precision propagation. That would modify the graph the plugin saved earlier and would require running the transformations pipeline again, which makes model caching pointless.

Contributor

I don't mean modifying the graph, but using the correct weights and applying conversion only to the original weights if required, not in the (de)serialization part.
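A minimal sketch of the alternative discussed in this thread (an interpretation of the reviewers' suggestion, not this PR's implementation): keep the constant in its stored precision and express the conversion as a foldable Convert subgraph, which the plugin can constant-fold later when it actually needs the target precision.

```cpp
#include <memory>

#include <openvino/op/constant.hpp>
#include <openvino/op/convert.hpp>
#include <openvino/pass/constant_folding.hpp>

// Wrap a constant that was read as-is from the .bin file into a Convert node
// instead of converting its data inside the IR deserializer.
std::shared_ptr<ov::Node> make_converted_constant(const std::shared_ptr<ov::op::v0::Constant>& original,
                                                  const ov::element::Type& target_type) {
    return std::make_shared<ov::op::v0::Convert>(original, target_type);
}

// Later, the plugin can materialize the converted constants on demand, e.g.:
//     ov::pass::ConstantFolding().run_on_model(model);
```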

@nshchego nshchego force-pushed the cpu/weightless_cache branch 3 times, most recently from 1569b18 to e5800b0 Compare March 12, 2025 13:39
@nshchego nshchego marked this pull request as ready for review March 13, 2025 09:44
@nshchego nshchego requested review from a team as code owners March 13, 2025 09:44
@nshchego nshchego requested review from itikhono and removed request for a team March 13, 2025 09:44
@praasz praasz self-assigned this Mar 13, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from e5800b0 to 96d3c54 Compare March 17, 2025 08:21
@nshchego nshchego requested a review from a team as a code owner March 17, 2025 08:21
@@ -313,7 +313,8 @@ class CoreImpl : public ov::ICore, public std::enable_shared_from_this<ov::ICore
bool frontend_mode = false) const override;

std::shared_ptr<ov::Model> read_model(const std::shared_ptr<AlignedBuffer>& model,
const std::shared_ptr<AlignedBuffer>& weights) const override;
const std::shared_ptr<AlignedBuffer>& weights,
const std::shared_ptr<AlignedBuffer>& origin_weights = nullptr) const override;
Contributor

Why this additional parameter?
It looks CPU-specific; the CPU plugin could pass it as a property if required.
The weights can be restored via hints (a path or a pointer to the original model with the original weights) and then passed as the weights argument, depending on which are available.

Other plugins like GPU and NPU can handle a weightless model without an additional parameter here. I think it should not be added; if required, these weights should be restored via some property in the plugin.

Contributor Author

In particular, the GPU plugin has its own serializer/deserializer and loads the original weights internally. Loading the weights via hints in the Core or Frontend parts would affect other plugins.

@@ -779,6 +780,13 @@ ov::SoPtr<ov::ICompiledModel> ov::CoreImpl::compile_model(const std::shared_ptr<
cacheContent.blobId = ov::ModelCache::compute_hash(model, create_compile_config(plugin, parsed._config));
cacheContent.model = std::const_pointer_cast<ov::Model>(model);
std::unique_ptr<CacheGuardEntry> lock = cacheGuard.get_hash_lock(cacheContent.blobId);

const auto& rt_info = model->get_rt_info();
Contributor

See #29354; the logic of this part has changed.

Contributor Author

This logic still does not work if the user didn't set the weights_path hint. cacheContent.modelPath is empty in that case, so the weights path cannot be derived.

@nshchego nshchego force-pushed the cpu/weightless_cache branch 2 times, most recently from 51128bb to e4f1322 Compare March 19, 2025 18:24
@nshchego nshchego requested review from a team as code owners March 19, 2025 18:24
@nshchego nshchego requested review from ilya-lavrenov and removed request for a team March 19, 2025 18:24
@github-actions github-actions bot added the category: samples OpenVINO Runtime Samples label Mar 19, 2025
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2025
### Details:
- Add `ov::hint::compiled_blob`, a property with a tensor hint that contains the compiled model blob.
- The compiled blob hint can hold a regular or a weightless model.
- For a weightless model, the `WEIGHTS_PATH` property is a hint for where to find the model's weights.
- If the model is found in the cache, the weights path is read from the compile options or from the `WEIGHTS_PATH` hint.
- If compiling the model from the blob hint fails, the fallback path (the original model) is used (see the sketch after this commit message).

### Related PRs:
- #29175
- #29304 
- #29530 

### Tickets:
 - CVS-153070

---------

Signed-off-by: Raasz, Pawel <[email protected]>
Signed-off-by: Pawel Raasz <[email protected]>
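A hedged sketch of the import flow the commit above describes, using the names it mentions (`ov::hint::compiled_blob` and the `WEIGHTS_PATH` key); the blob-reading helper and the file names are illustrative only:

```cpp
#include <fstream>
#include <string>

#include <openvino/openvino.hpp>

// Illustrative helper: read a previously exported blob from disk into a tensor.
static ov::Tensor read_blob(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    const auto size = static_cast<size_t>(file.tellg());
    file.seekg(0);
    ov::Tensor blob(ov::element::u8, ov::Shape{size});
    file.read(static_cast<char*>(blob.data()), static_cast<std::streamsize>(size));
    return blob;
}

int main() {
    ov::Core core;
    ov::AnyMap config{
        ov::hint::compiled_blob(read_blob("model.blob")),  // blob to import from
        {"WEIGHTS_PATH", std::string("model.bin")}         // where a weightless blob finds its weights
    };
    // If importing from the blob hint fails, the original model is compiled as a fallback.
    auto compiled = core.compile_model("model.xml", "CPU", config);
    return 0;
}
```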
@nshchego nshchego force-pushed the cpu/weightless_cache branch from e4f1322 to 30f321f Compare March 21, 2025 09:12
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 30f321f to d88fc8f Compare April 3, 2025 10:02
timxu826 pushed a commit to timxu826/openvino that referenced this pull request Apr 7, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch 8 times, most recently from 835b731 to 67fd46c Compare April 21, 2025 15:20
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 67fd46c to 8b9e939 Compare April 22, 2025 20:48
@github-actions github-actions bot added the category: CPP API OpenVINO CPP API bindings label Apr 22, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 8b9e939 to 4108c65 Compare April 23, 2025 07:41
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 4108c65 to 02a5a95 Compare April 23, 2025 08:40