[CPU] Add ACL concat executor path#34043
aobolensk wants to merge 11 commits into openvinotoolkit:master
CPU_NODE_ASSERT(selectedPd, "Preferable primitive descriptor is not set.");

auto fallbackToRefImplType = [&]() {
    selectedPd->setImplementationType(impl_desc_type::ref);
Why do we need this?
The ref executor is supposed to be created by the factory, and it already has the 'impl_desc_type::ref' implementation type.
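To illustrate the point, here is a minimal sketch, assuming a simplified executor interface. `ImplType`, `Executor`, `AclExecutor`, `RefExecutor`, `makeConcatExecutor` and `selectedImplType` are hypothetical stand-ins, not the plugin's actual classes. Each executor reports its own implementation type, so the caller reads the type from whatever the factory returned instead of overwriting it by hand.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for impl_desc_type; only the values needed here.
enum class ImplType { acl, ref };

// Hypothetical executor interface: every executor knows its own type.
struct Executor {
    virtual ~Executor() = default;
    virtual ImplType implType() const = 0;
};

struct AclExecutor : Executor {
    ImplType implType() const override { return ImplType::acl; }
};

struct RefExecutor : Executor {
    ImplType implType() const override { return ImplType::ref; }
};

// Hypothetical factory: prefers ACL, otherwise returns the reference executor.
inline std::unique_ptr<Executor> makeConcatExecutor(bool aclSupported) {
    if (aclSupported) {
        return std::make_unique<AclExecutor>();
    }
    return std::make_unique<RefExecutor>();
}

// The caller queries the type from the created executor; no manual override.
inline ImplType selectedImplType(bool aclSupported) {
    const auto executor = makeConcatExecutor(aclSupported);
    assert(executor && "the factory always returns an executor in this sketch");
    return executor->implType();
}
```

With this shape, a lambda like 'fallbackToRefImplType' would not be needed, since the fallback executor already carries the ref type.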
useExecutor = selectedPd->getImplementationType() == impl_desc_type::acl && !canOptimizeNspc;
m_executor.reset();
This should also be handled as a generic path when selecting an executor implementation from the list.
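One way this could look, as a rough sketch only: every entry in the list carries a `supports` predicate, and selection walks the list in priority order. `ConcatConfig`, `Implementation`, `implementations` and `selectImplementation` are hypothetical names, and placing the nspc-optimized path ahead of ACL is an assumption read off the `!canOptimizeNspc` condition above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, reduced configuration; canOptimizeNspc mirrors the flag in the diff.
struct ConcatConfig {
    bool aclAvailable;
    bool canOptimizeNspc;
};

// Hypothetical list entry: a name plus a predicate deciding applicability.
struct Implementation {
    std::string name;
    std::function<bool(const ConcatConfig&)> supports;
};

// Entries are ordered by priority; the reference entry accepts everything.
inline const std::vector<Implementation>& implementations() {
    static const std::vector<Implementation> list = {
        {"nspc_optimized", [](const ConcatConfig& c) { return c.canOptimizeNspc; }},
        {"acl", [](const ConcatConfig& c) { return c.aclAvailable; }},
        {"ref", [](const ConcatConfig&) { return true; }},
    };
    return list;
}

// Generic selection: the first entry whose predicate accepts the config wins.
inline std::string selectImplementation(const ConcatConfig& config) {
    for (const auto& impl : implementations()) {
        if (impl.supports(config)) {
            return impl.name;
        }
    }
    assert(false && "unreachable: the reference entry accepts every config");
    return {};
}
```

The node then has no special case of its own; the condition lives next to the implementation it guards.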
template <>
const std::vector<ExecutorImplementation<ConcatAttrs>>& getImplementations() {
#if defined(OV_CPU_WITH_ACL)
I guess we need to add the other implementations here as well, including the reference one, not only the ACL ones.
Added _COMMON impls
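For readers following the thread, a minimal sketch of such a list, under assumptions: `ImplEntry`, `concatImplementations`, `referenceEntry` and the entry names are hypothetical and far simpler than the real `ExecutorImplementation<ConcatAttrs>`. Only the `OV_CPU_WITH_ACL` guard is taken from the diff. The ACL entry is compiled in conditionally, while the reference entry is always present and comes last.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified entry; the real type carries more fields.
struct ImplEntry {
    std::string name;
    bool isReference;
};

// Priority order: optimized back ends first, the reference entry last.
inline const std::vector<ImplEntry>& concatImplementations() {
    static const std::vector<ImplEntry> list = {
#if defined(OV_CPU_WITH_ACL)
        {"concat_acl", false},  // present only in ACL-enabled builds
#endif
        {"concat_ref", true},   // always compiled in
    };
    return list;
}

// The last entry is the fallback that every build can rely on.
inline const ImplEntry& referenceEntry() {
    const auto& list = concatImplementations();
    assert(!list.empty() && "the reference entry must always be registered");
    return list.back();
}
```

Because the reference entry is unconditional, the list is never empty, whatever the build flags.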
- add generic concat executor/factory and ACL implementation for f16/f32 ncsp/nspc tensors up to 4D
- register ACL concat descriptors, prefer ACL during PD selection, and run executor when available
- keep impl_desc_type flexible for ref path to allow factory-provided implementations
namespace ov::intel_cpu {

template <>
Disabling clang-format for 'getImplementations' helps to avoid wrong alignment.
Thanks, added clang-format off/on
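As an illustration of the markers, a small sketch with a hand-aligned table. `LayoutRule`, `kConcatLayoutRules` and `ruleAt` are hypothetical; the values only restate the "f16/f32 ncsp/nspc tensors up to 4D" scope from the commit message. The `// clang-format off` and `// clang-format on` comments are the standard clang-format way to leave a region untouched.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical row type, used only to show a hand-aligned table.
struct LayoutRule {
    const char* layout;
    int maxRank;
    bool f16;
    bool f32;
};

// clang-format off
inline constexpr std::array<LayoutRule, 2> kConcatLayoutRules = {{
    // layout   maxRank  f16    f32
    {  "ncsp",  4,       true,  true  },
    {  "nspc",  4,       true,  true  },
}};
// clang-format on

// Bounds-checked accessor for the table above.
inline const LayoutRule& ruleAt(std::size_t index) {
    assert(index < kConcatLayoutRules.size());
    return kConcatLayoutRules[index];
}
```

Outside the marked region the formatter still applies, so only the table keeps its manual column layout.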
    supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::unknown);
}

const auto& concatImplementations = getImplementations<ConcatAttrs>();
Could you please clarify why we access the implementation list here?
In general, we only need to create and use a factory rather than access the implementation list directly.
Also, after refactoring, 'getSupportedDescriptors()' becomes empty, and most of the logic moves to 'initSupportedPrimitiveDescriptors' and 'createPrimitive'.
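A rough sketch of that split, assuming heavily simplified types: `Executor`, `ConcatExecutorFactory` and `ConcatNode` here are hypothetical stand-ins, and the real methods take no `rank` argument. The node only creates a factory and later asks it for an executor; the candidate list stays private to the factory.

```cpp
#include <cassert>
#include <limits>
#include <memory>
#include <string>
#include <vector>

// Hypothetical executor, reduced to a name for the sake of the sketch.
struct Executor {
    std::string name;
};

// Hypothetical factory: the candidate list is private and never exposed.
class ConcatExecutorFactory {
public:
    explicit ConcatExecutorFactory(int rank) : m_rank(rank) {}

    // Returns the first candidate that supports the stored rank.
    std::unique_ptr<Executor> make() const {
        for (const auto& candidate : candidates()) {
            if (m_rank <= candidate.maxRank) {
                return std::make_unique<Executor>(Executor{candidate.name});
            }
        }
        return nullptr;
    }

private:
    struct Candidate {
        std::string name;
        int maxRank;
    };

    static const std::vector<Candidate>& candidates() {
        static const std::vector<Candidate> list = {
            {"fast", 4},
            {"ref", std::numeric_limits<int>::max()},
        };
        return list;
    }

    int m_rank;
};

// Hypothetical node showing where the logic lives after the refactoring.
class ConcatNode {
public:
    void getSupportedDescriptors() {}  // intentionally empty

    void initSupportedPrimitiveDescriptors(int rank) {
        m_factory = std::make_shared<ConcatExecutorFactory>(rank);
    }

    void createPrimitive() {
        assert(m_factory && "initSupportedPrimitiveDescriptors must run first");
        m_executor = m_factory->make();
    }

    const Executor* executor() const { return m_executor.get(); }

private:
    std::shared_ptr<ConcatExecutorFactory> m_factory;
    std::unique_ptr<Executor> m_executor;
};
```

The node never names a concrete implementation, so adding or reordering candidates touches only the factory.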