[CPU] Add ACL concat executor path #34043

Open
aobolensk wants to merge 11 commits into openvinotoolkit:master from aobolensk:cpu-acl-concat

Conversation

@aobolensk
Contributor

Details:

  • add generic concat executor/factory and ACL implementation for f16/f32 ncsp/nspc tensors up to 4D
  • register ACL concat descriptors, prefer ACL during PD selection, and run executor when available
  • keep impl_desc_type flexible for ref path to allow factory-provided implementations

Tickets:

  • N/A

@aobolensk aobolensk requested review from a team as code owners February 10, 2026 10:32
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Feb 10, 2026
@aobolensk aobolensk added the platform: arm OpenVINO on ARM / ARM64 label Feb 10, 2026
@aobolensk aobolensk force-pushed the cpu-acl-concat branch 10 times, most recently from 50b88bd to dada700 Compare February 11, 2026 12:47
CPU_NODE_ASSERT(selectedPd, "Preferable primitive descriptor is not set.");

auto fallbackToRefImplType = [&]() {
    selectedPd->setImplementationType(impl_desc_type::ref);
Contributor

Why do we need this?
The ref executor is supposed to be created by the factory, and it already has the 'impl_desc_type::ref' implementation type.

Contributor Author

Removed

Comment on lines +599 to +600
useExecutor = selectedPd->getImplementationType() == impl_desc_type::acl && !canOptimizeNspc;
m_executor.reset();
Contributor

This should also be handled by the generic path, when selecting an executor implementation from the list.

Contributor Author

Removed


template <>
const std::vector<ExecutorImplementation<ConcatAttrs>>& getImplementations() {
#if defined(OV_CPU_WITH_ACL)
Contributor

I guess we need to add the other implementations here as well, including the reference one, not only the ACL ones.

Contributor Author

Added _COMMON impls
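
The idea discussed in this thread, a single prioritized list that holds the ACL entries and also a reference fallback, can be sketched as below. This is only an illustration under stated assumptions: `ConcatAttrsSketch`, `ImplEntry`, `getImplementationsSketch`, `selectImplementation`, and the `SKETCH_WITH_ACL` guard are hypothetical names, not the CPU plugin's API, and the ACL predicate merely mirrors the "f16/f32, up to 4D" constraint from the PR description.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins; none of these names belong to the OpenVINO CPU plugin.
enum class Precision { f16, f32, i8 };

struct ConcatAttrsSketch {
    Precision precision;
    std::size_t rank;
};

struct ImplEntry {
    std::string name;                                        // e.g. "acl" or "ref"
    std::function<bool(const ConcatAttrsSketch&)> supports;  // applicability check
};

// Ordered by priority: specialized entries first, the reference entry last.
inline const std::vector<ImplEntry>& getImplementationsSketch() {
    static const std::vector<ImplEntry> list = {
#if defined(SKETCH_WITH_ACL)  // hypothetical build guard
        {"acl",
         [](const ConcatAttrsSketch& a) {
             const bool precisionOk = a.precision == Precision::f16 || a.precision == Precision::f32;
             return precisionOk && a.rank <= 4;
         }},
#endif
        // The reference entry accepts everything, so selection never comes back empty.
        {"ref", [](const ConcatAttrsSketch&) { return true; }},
    };
    return list;
}

// Returns the name of the first entry whose predicate accepts the attributes.
inline std::string selectImplementation(const ConcatAttrsSketch& attrs) {
    const auto& list = getImplementationsSketch();
    assert(!list.empty());
    for (const auto& impl : list) {
        if (impl.supports(attrs)) {
            return impl.name;
        }
    }
    return "none";
}
```

With this shape, a condition such as "use ACL only when the nspc optimization does not apply" would live in an entry's predicate rather than in the node.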

@aobolensk aobolensk force-pushed the cpu-acl-concat branch 3 times, most recently from 474c814 to a851d7d Compare February 18, 2026 14:57
aobolensk and others added 9 commits February 19, 2026 11:08
- add generic concat executor/factory and ACL implementation for f16/f32 ncsp/nspc tensors up to 4D
- register ACL concat descriptors, prefer ACL during PD selection, and run executor when available
- keep impl_desc_type flexible for ref path to allow factory-provided implementations

namespace ov::intel_cpu {

template <>
Contributor

Disabling clang-format for 'getImplementations' helps avoid incorrect alignment.

Contributor Author

Thanks, added clang-format off/on
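
For readers unfamiliar with the suggestion, a minimal sketch of the `// clang-format off` / `// clang-format on` markers around a hand-aligned table follows. The `Row` type, `kRows` table, `findRow` helper, and all row values are hypothetical and only illustrate the formatting technique; they are not the entries registered in this PR.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// Hypothetical table row; the fields and values are illustrative only.
struct Row {
    std::string_view name;
    std::string_view layout;
    int maxRank;
};

// clang-format leaves everything between the two markers untouched,
// so the manual column alignment below survives reformatting.
// clang-format off
inline constexpr std::array<Row, 3> kRows = {{
    {"acl_ncsp", "ncsp", 4},
    {"acl_nspc", "nspc", 4},
    {"ref",      "any",  6},
}};
// clang-format on

// Linear lookup by name; returns nullptr when no row matches.
inline const Row* findRow(std::string_view name) {
    for (const auto& row : kRows) {
        if (row.name == name) {
            return &row;
        }
    }
    return nullptr;
}
```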

    supportedPrimitiveDescriptors.emplace_back(config, impl_desc_type::unknown);
}

const auto& concatImplementations = getImplementations<ConcatAttrs>();
Contributor

Could you please clarify why we access the implementation list here?
In general, we just need to create and use a factory, not access the implementation list directly.
Also, after the refactoring, 'getSupportedDescriptors()' becomes empty, and most of the logic moves to 'initSupportedPrimitiveDescriptors' and 'createPrimitive'.

Contributor Author

Done
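
The separation requested in this thread, where the node only creates and uses a factory and never walks the implementation list itself, might look roughly like the sketch below. All types here (`Executor`, `RefConcatExecutor`, `ConcatFactorySketch`, `ConcatNodeSketch`) are hypothetical, the concatenation is simplified to flat 1-D float buffers, and the factory returns only a reference executor; the real plugin classes are not reproduced.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical sketch; these types only mimic the shape of an executor factory.
struct Executor {
    virtual ~Executor() = default;
    virtual std::string implType() const = 0;
    virtual std::vector<float> exec(const std::vector<std::vector<float>>& inputs) const = 0;
};

// Reference executor: appends the inputs one after another (1-D simplification).
struct RefConcatExecutor final : Executor {
    std::string implType() const override { return "ref"; }
    std::vector<float> exec(const std::vector<std::vector<float>>& inputs) const override {
        std::vector<float> out;
        for (const auto& in : inputs) {
            out.insert(out.end(), in.begin(), in.end());
        }
        return out;
    }
};

// The factory owns the knowledge of which implementations exist.
class ConcatFactorySketch {
public:
    std::unique_ptr<Executor> make() const { return std::make_unique<RefConcatExecutor>(); }
};

// The node talks to the factory only; the implementation type is reported
// by whichever executor the factory returned, not set by the node.
class ConcatNodeSketch {
public:
    void createPrimitive() { m_executor = ConcatFactorySketch{}.make(); }

    std::vector<float> execute(const std::vector<std::vector<float>>& inputs) const {
        assert(m_executor && "createPrimitive() must run first");
        return m_executor->exec(inputs);
    }

    std::string implType() const { return m_executor ? m_executor->implType() : "unknown"; }

private:
    std::unique_ptr<Executor> m_executor;
};
```

In this arrangement the node has no reason to override the implementation type for the reference path, which matches the earlier remark that the ref executor comes from the factory already tagged as such.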

Labels

category: CPU OpenVINO CPU plugin
platform: arm OpenVINO on ARM / ARM64

2 participants