WIP: [CPU][ARM] Weights compression f32->f16 is moved to CPU Plug-in side #21080
antonvor wants to merge 2 commits into openvinotoolkit:master from
Conversation
@itikhono may I ask you to review transformation changes?
src/plugins/intel_cpu/tests/functional/subgraph_tests/src/arm/matmul_compress_convert.cpp
LGTM from tests side
    }
};

class TRANSFORMATIONS_API Compression : public RuntimeAttribute {
Could you please add a comment explaining why we need this rt_info? It should make clear why we cannot use one of the existing rt_info attributes and need a new one.
    Compression() = default;

    bool visit_attributes(AttributeVisitor& visitor) override {
        return true;
Is it really necessary to store this rt_info in the IR?
    copy_runtime_info(incoming_node, convert);
    input.replace_source_output(convert);
    disable_fp16_compression(convert);
    mark_as_compression(convert);
These converts are decompression converts: they upcast to fp32 for precision-sensitive subgraphs. Is it possible to rename mark_as_compression to avoid confusion?
Since mark_as_compression is used only for converts that are inserted to align types between the f16 and f32 parts, can we name it e.g. mark_type_aligning_convert to avoid confusion?
This PR will be closed in a week because of 2 weeks of no activity.

This PR was closed because it has been stalled for 2 weeks with no activity.
Details:
This PR disables the fp32->fp16 weights compression on the ngraph side and moves it to the CPU plug-in, so weights reach the plug-in in fp32 precision. This allows us to reduce memory consumption on ARM64 platforms. The change only affects MatMul nodes.
PR to oneDNN fork: openvinotoolkit/oneDNN#220
Tickets: