-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[CPU] Enable Weightless models cache #29304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
8582cc3
to
ea30e62
Compare
ab254a4
to
ea0e3f7
Compare
@@ -41,6 +41,10 @@ class ModelDeserializer { | |||
|
|||
void operator>>(std::shared_ptr<ov::Model>& model); | |||
|
|||
void set_weights_path(std::string& weights_path) { | |||
m_weights_path = weights_path; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::shared_ptr<char[]> new_buf(new char[actual_size]); | ||
data = new_buf.get(); | ||
weights_buf = std::make_shared<ov::SharedBuffer<std::shared_ptr<char[]>>>(data, actual_size, new_buf); | ||
convert_dt(el_type, original_dt, data, m_weights->get_ptr<char>() + offset, el_num); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we perform constants conversion directly in IR FE via suboptimal way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we need to get converted values during nodes creation, otherwise some nodes could not pass 'validate_and_infer_types' and graph compilation fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that constants conversion from one type to another is responsibility of IR reader.
Should original saving logic implement such conversion steps as constant subgraphs which are read as is?
Later, plugin can fold such subgraphs to get constants in desired precision.
Or at least original_precision
should be applied on plugin level with faster functions than manual conversions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @ilya-lavrenov , the de-serializer just should read xml and additional convert should not be there. The plugin should apply any conversion if required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do understand your concern, but precision forcing may lead to precision propagation. That will modify the graph that the plugin saved before and will require transformations pipeline. That makes model caching senseless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean not modify graph but use correct weights and apply only conversion only on original weight if required but not in (de)serialization part
1569b18
to
e5800b0
Compare
e5800b0
to
96d3c54
Compare
@@ -313,7 +313,8 @@ class CoreImpl : public ov::ICore, public std::enable_shared_from_this<ov::ICore | |||
bool frontend_mode = false) const override; | |||
|
|||
std::shared_ptr<ov::Model> read_model(const std::shared_ptr<AlignedBuffer>& model, | |||
const std::shared_ptr<AlignedBuffer>& weights) const override; | |||
const std::shared_ptr<AlignedBuffer>& weights, | |||
const std::shared_ptr<AlignedBuffer>& origin_weights = nullptr) const override; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this additional parameter?
It looks like CPU specific pass it as property if required.
The weights can be restored by hints (paths or pointer to original model with original weights) and then pass as weights depends which are available.
The other plugin like GPU, NPU can handle weightless model without additional parameter here. I think is should not be added and if required this weights should be restored by some property in plugin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, the GPU plugin has its own serializer/deserializer and loads the original weights internally. Loading of weights by hints in Core or Frontend parts will affect other plugins.
std::shared_ptr<char[]> new_buf(new char[actual_size]); | ||
data = new_buf.get(); | ||
weights_buf = std::make_shared<ov::SharedBuffer<std::shared_ptr<char[]>>>(data, actual_size, new_buf); | ||
convert_dt(el_type, original_dt, data, m_weights->get_ptr<char>() + offset, el_num); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @ilya-lavrenov , the de-serializer just should read xml and additional convert should not be there. The plugin should apply any conversion if required.
@@ -779,6 +780,13 @@ ov::SoPtr<ov::ICompiledModel> ov::CoreImpl::compile_model(const std::shared_ptr< | |||
cacheContent.blobId = ov::ModelCache::compute_hash(model, create_compile_config(plugin, parsed._config)); | |||
cacheContent.model = std::const_pointer_cast<ov::Model>(model); | |||
std::unique_ptr<CacheGuardEntry> lock = cacheGuard.get_hash_lock(cacheContent.blobId); | |||
|
|||
const auto& rt_info = model->get_rt_info(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #29354, the logic of this part is changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic still does not work if user didn't set weights_path
hint. cacheContent.modelPath is empty on this way and weights path couldn't be compiled.
51128bb
to
e4f1322
Compare
### Details: - Add `ov::hint::compiled_blob`, property with tensor hint which contains compiled model blob - Compiled blob hint can be regular or weightless model. - For weightless model the property `WEIGHTS_PATH` is hint where find the model's weights - If model found in cache then weight path will read from compiled options or from property `WEIGHTS_PATH` hint. - If model compile fail from blob hint the fallback path will be used (original model). ### Related PRs: - #29175 - #29304 - #29530 ### Tickets: - CVS-153070 --------- Signed-off-by: Raasz, Pawel <[email protected]> Signed-off-by: Pawel Raasz <[email protected]>
e4f1322
to
30f321f
Compare
30f321f
to
d88fc8f
Compare
### Details: - Add `ov::hint::compiled_blob`, property with tensor hint which contains compiled model blob - Compiled blob hint can be regular or weightless model. - For weightless model the property `WEIGHTS_PATH` is hint where find the model's weights - If model found in cache then weight path will read from compiled options or from property `WEIGHTS_PATH` hint. - If model compile fail from blob hint the fallback path will be used (original model). ### Related PRs: - openvinotoolkit#29175 - openvinotoolkit#29304 - openvinotoolkit#29530 ### Tickets: - CVS-153070 --------- Signed-off-by: Raasz, Pawel <[email protected]> Signed-off-by: Pawel Raasz <[email protected]>
835b731
to
67fd46c
Compare
67fd46c
to
8b9e939
Compare
8b9e939
to
4108c65
Compare
4108c65
to
02a5a95
Compare
Details:
Tickets: