
[CPU] Enable Weightless models cache #29304


Open
nshchego wants to merge 5 commits into master from cpu/weightless_cache

Conversation

nshchego
Contributor

@nshchego nshchego commented Mar 6, 2025

Details:

  • CPU plugin: minimize the size of the cached blob by reusing weights from the original .bin file.
  • Some APIs were extended to pass the original weights.
  • The IR serializer and deserializer were modified to handle both weights sources, because the CPU plugin uses them to write and read the cache file (see the sketch after this list).
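A minimal usage sketch of the flow these details describe, assuming a standard OpenVINO caching setup (the file names and cache directory are illustrative, not from this PR):

```cpp
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Enable model caching. With the weightless cache, the CPU plugin stores a blob
    // that references the constants in model.bin instead of embedding a copy of them.
    core.set_property(ov::cache_dir("model_cache"));

    // First run: compiles the model and writes the (weight-less) cached blob.
    // Later runs: load the blob and re-read the weights from the original .bin file.
    auto compiled = core.compile_model("model.xml", "CPU");
    return 0;
}
```

Depending on the final API, an explicit cache-mode property (e.g. `ov::cache_mode(ov::CacheMode::OPTIMIZE_SIZE)`) may also be needed to opt into the weightless blob format; that is an assumption, not something stated in this PR description.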

Tickets:

  • 161826

@github-actions github-actions bot added labels: category: inference (OpenVINO Runtime library - Inference), category: Core (OpenVINO Core, aka ngraph), category: CPU (OpenVINO CPU plugin), category: build (OpenVINO cmake script / infra), category: transformations (OpenVINO Runtime library - Transformations), category: samples (OpenVINO Runtime Samples), category: IR FE (OpenVINO IR v10 / v11 FrontEnd) on Mar 6, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 8582cc3 to ea30e62 Compare March 6, 2025 04:18
@github-actions github-actions bot removed the category: samples OpenVINO Runtime Samples label Mar 6, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch 3 times, most recently from ab254a4 to ea0e3f7 Compare March 6, 2025 04:40
@@ -41,6 +41,10 @@ class ModelDeserializer {

void operator>>(std::shared_ptr<ov::Model>& model);

void set_weights_path(std::string& weights_path) {
    m_weights_path = weights_path;
}
Contributor

Please note that the case when a model is compiled via compile_model(ov::Model) should also be supported (see PR #29107; NPU & GPU work is in progress) via hint::model_ptr.
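For context, a hedged sketch of the case mentioned above: compiling an in-memory ov::Model, where no weights file path is attached to the compile call, so the original weights would have to travel through a hint such as the hint::model_ptr referenced in the comment (defined in PR #29107, not here):

```cpp
#include <memory>

#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    core.set_property(ov::cache_dir("model_cache"));

    // compile_model(ov::Model) carries no .bin path, so a weightless cache import
    // cannot rediscover the weights file on its own; the original model (and thus
    // its weights) would need to be supplied separately, e.g. via hint::model_ptr.
    std::shared_ptr<ov::Model> model = core.read_model("model.xml");
    auto compiled = core.compile_model(model, "CPU");
    return 0;
}
```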

std::shared_ptr<char[]> new_buf(new char[actual_size]);
data = new_buf.get();
weights_buf = std::make_shared<ov::SharedBuffer<std::shared_ptr<char[]>>>(data, actual_size, new_buf);
convert_dt(el_type, original_dt, data, m_weights->get_ptr<char>() + offset, el_num);
Contributor

Do we perform constants conversion directly in the IR FE in a suboptimal way?

Contributor Author

Yes, we need to get the converted values during node creation, otherwise some nodes cannot pass 'validate_and_infer_types' and graph compilation fails.

Contributor
@ilya-lavrenov ilya-lavrenov Mar 13, 2025

I don't think that constants conversion from one type to another is the responsibility of the IR reader.
Should the original saving logic instead express such conversion steps as constant subgraphs which are read as-is?

Later, the plugin can fold such subgraphs to get constants in the desired precision.

Or, at the very least, original_precision should be applied at the plugin level with faster functions than manual conversions.

Contributor

Agree with @ilya-lavrenov, the deserializer should just read the XML, and the additional convert should not be there. The plugin should apply any conversion if required.

Contributor Author

I do understand your concern, but forcing the precision may lead to precision propagation. That would modify the graph the plugin saved earlier and would require running the transformations pipeline again, which makes model caching pointless.

Contributor

I don't mean modifying the graph, but using the correct weights and applying conversion only to the original weights if required, not in the (de)serialization part.
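A minimal sketch of the alternative discussed in this thread (an interpretation of the reviewers' suggestion, not this PR's implementation): keep the constant in its stored precision and express the conversion as a foldable Convert subgraph, which the plugin can constant-fold later when it actually needs the target precision.

```cpp
#include <memory>

#include <openvino/op/constant.hpp>
#include <openvino/op/convert.hpp>
#include <openvino/pass/constant_folding.hpp>

// Wrap a constant that was read as-is from the .bin file into a Convert node
// instead of converting its data inside the IR deserializer.
std::shared_ptr<ov::Node> make_converted_constant(const std::shared_ptr<ov::op::v0::Constant>& original,
                                                  const ov::element::Type& target_type) {
    return std::make_shared<ov::op::v0::Convert>(original, target_type);
}

// Later, the plugin can materialize the converted constants on demand, e.g.:
//     ov::pass::ConstantFolding().run_on_model(model);
```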

@nshchego nshchego force-pushed the cpu/weightless_cache branch 3 times, most recently from 1569b18 to e5800b0 Compare March 12, 2025 13:39
@nshchego nshchego marked this pull request as ready for review March 13, 2025 09:44
@nshchego nshchego requested review from a team as code owners March 13, 2025 09:44
@nshchego nshchego requested review from itikhono and removed request for a team March 13, 2025 09:44
@praasz praasz self-assigned this Mar 13, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from e5800b0 to 96d3c54 Compare March 17, 2025 08:21
@nshchego nshchego requested a review from a team as a code owner March 17, 2025 08:21
@@ -313,7 +313,8 @@ class CoreImpl : public ov::ICore, public std::enable_shared_from_this<ov::ICore
bool frontend_mode = false) const override;

std::shared_ptr<ov::Model> read_model(const std::shared_ptr<AlignedBuffer>& model,
const std::shared_ptr<AlignedBuffer>& weights) const override;
const std::shared_ptr<AlignedBuffer>& weights,
const std::shared_ptr<AlignedBuffer>& origin_weights = nullptr) const override;
Contributor

Why this additional parameter?
It looks CPU-specific; the CPU plugin could pass it as a property if required.
The weights can be restored via hints (a path or a pointer to the original model with the original weights) and then passed as the weights argument, depending on which are available.

Other plugins like GPU and NPU can handle a weightless model without an additional parameter here. I think it should not be added; if required, these weights should be restored via some property in the plugin.

Contributor Author

In particular, the GPU plugin has its own serializer/deserializer and loads the original weights internally. Loading the weights via hints in the Core or Frontend parts would affect other plugins.

@@ -779,6 +780,13 @@ ov::SoPtr<ov::ICompiledModel> ov::CoreImpl::compile_model(const std::shared_ptr<
cacheContent.blobId = ov::ModelCache::compute_hash(model, create_compile_config(plugin, parsed._config));
cacheContent.model = std::const_pointer_cast<ov::Model>(model);
std::unique_ptr<CacheGuardEntry> lock = cacheGuard.get_hash_lock(cacheContent.blobId);

const auto& rt_info = model->get_rt_info();
Contributor

See #29354; the logic of this part has changed.

Contributor Author

This logic still does not work if the user didn't set the weights_path hint. cacheContent.modelPath is empty in that case, so the weights path cannot be derived.

@nshchego nshchego force-pushed the cpu/weightless_cache branch 2 times, most recently from 51128bb to e4f1322 Compare March 19, 2025 18:24
@nshchego nshchego requested review from a team as code owners March 19, 2025 18:24
@nshchego nshchego requested review from ilya-lavrenov and removed request for a team March 19, 2025 18:24
@github-actions github-actions bot added the category: samples OpenVINO Runtime Samples label Mar 19, 2025
github-merge-queue bot pushed a commit that referenced this pull request Mar 21, 2025
### Details:
- Add `ov::hint::compiled_blob`, a property with a tensor hint that contains the compiled model blob.
- The compiled blob hint can hold a regular or a weightless model.
- For a weightless model, the `WEIGHTS_PATH` property is a hint for where to find the model's weights.
- If the model is found in the cache, the weights path is read from the compile options or from the `WEIGHTS_PATH` hint.
- If compiling the model from the blob hint fails, the fallback path (the original model) is used (see the sketch after this commit message).

### Related PRs:
- #29175
- #29304 
- #29530 

### Tickets:
 - CVS-153070

---------

Signed-off-by: Raasz, Pawel <[email protected]>
Signed-off-by: Pawel Raasz <[email protected]>
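A hedged sketch of the import flow the commit above describes, using the names it mentions (`ov::hint::compiled_blob` and the `WEIGHTS_PATH` key); the blob-reading helper and the file names are illustrative only:

```cpp
#include <fstream>
#include <string>

#include <openvino/openvino.hpp>

// Illustrative helper: read a previously exported blob from disk into a tensor.
static ov::Tensor read_blob(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    const auto size = static_cast<size_t>(file.tellg());
    file.seekg(0);
    ov::Tensor blob(ov::element::u8, ov::Shape{size});
    file.read(static_cast<char*>(blob.data()), static_cast<std::streamsize>(size));
    return blob;
}

int main() {
    ov::Core core;
    ov::AnyMap config{
        ov::hint::compiled_blob(read_blob("model.blob")),  // blob to import from
        {"WEIGHTS_PATH", std::string("model.bin")}         // where a weightless blob finds its weights
    };
    // If importing from the blob hint fails, the original model is compiled as a fallback.
    auto compiled = core.compile_model("model.xml", "CPU", config);
    return 0;
}
```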
@nshchego nshchego force-pushed the cpu/weightless_cache branch from e4f1322 to 30f321f Compare March 21, 2025 09:12
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 30f321f to d88fc8f Compare April 3, 2025 10:02
timxu826 pushed a commit to timxu826/openvino that referenced this pull request Apr 7, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch 8 times, most recently from 835b731 to 67fd46c Compare April 21, 2025 15:20
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 67fd46c to 8b9e939 Compare April 22, 2025 20:48
@github-actions github-actions bot added the category: CPP API OpenVINO CPP API bindings label Apr 22, 2025
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 8b9e939 to 4108c65 Compare April 23, 2025 07:41
@nshchego nshchego force-pushed the cpu/weightless_cache branch from 4108c65 to 02a5a95 Compare April 23, 2025 08:40