Disable CPU helper in AUTO when the model is LLM #29233

Open

Wants to merge 22 commits into master from guozhong/disable_cpu_helper

Changes from 13 commits (22 commits total)

Commits
03810de  Disable CPU helper in AUTO when the model is LLM (wgzintel, Mar 3, 2025)
a6906e3  Merge branch 'master' of https://github.com/openvinotoolkit/openvino … (wgzintel, Mar 3, 2025)
0618583  move get_optimum_intel_version to a common API (wgzintel, Mar 3, 2025)
0b321b0  Merge branch 'master' of https://github.com/openvinotoolkit/openvino … (wgzintel, Mar 4, 2025)
f0380c4  use is_large_language_model to match LLM (wgzintel, Mar 4, 2025)
04212be  resolve conflict (wgzintel, Mar 5, 2025)
6420d0b  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 10, 2025)
eac23ab  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 12, 2025)
46131fd  Fix the error Not Implemented (wgzintel, Mar 12, 2025)
9f74701  Add comments (wgzintel, Mar 12, 2025)
d740d0f  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 12, 2025)
6323789  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 12, 2025)
ea3a8c3  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 13, 2025)
7c89221  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 14, 2025)
ddf2cca  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 17, 2025)
62f0427  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 18, 2025)
c584b08  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 19, 2025)
74959c2  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 20, 2025)
5441b0f  LLM model handled in filter_device_by_model when model path is empty (wgzintel, Mar 20, 2025)
af5e285  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Mar 21, 2025)
6e928a3  optimize the code (xipingyan, Mar 21, 2025)
51aa318  Merge branch 'master' into guozhong/disable_cpu_helper (wgzintel, Apr 19, 2025)
12 changes: 9 additions & 3 deletions src/plugins/auto/src/plugin.cpp
@@ -431,11 +431,15 @@ std::shared_ptr<ov::ICompiledModel> Plugin::compile_model_impl(const std::string
bool is_cumulative =
(auto_s_context->m_performance_hint == ov::hint::PerformanceMode::CUMULATIVE_THROUGHPUT) ? true : false;
std::list<DeviceInformation> devices_with_priority(support_devices.begin(), support_devices.end());
bool is_LLM_model;
if (model_path.empty()) {
support_devices = filter_device_by_model(support_devices_by_property, model, load_config);
is_LLM_model = ov::op::util::is_large_language_model(*model);
Contributor
Is this an LLM-specific issue, or does it apply to any model that has states?

Contributor
@ilya-lavrenov The stateful-model case is already handled here:

std::vector<std::string> stateful_node_names;
for (auto& op : model->get_ops()) {
    if (ov::as_type_ptr<ov::op::util::AssignBase>(op) ||
        ov::as_type_ptr<ov::op::util::ReadValueBase>(op)) {
        stateful_node_names.push_back(op->get_friendly_name());
    }
}
if (stateful_node_names.empty()) {
    // not stateful model
    return meta_devices;
}
// disable CPU_HELP and runtime fallback if model is stateful
disable_startup_runtime_fallback();
and that handling is being updated in https://github.com/openvinotoolkit/openvino/pull/27019/files#diff-85029a9232410831627d7b7b225e30d2cd4879e55b4abda5c475d04efb12daddL837-R859

Here, only the LLM case is handled.

@wgzintel Why is this not handled in filter_device_by_model?

Contributor Author
Updated and handled in filter_device_by_model.
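
For illustration only, a minimal standalone sketch (hypothetical code, not part of this PR) of the two checks discussed above: the generic scan for stateful (ReadValue/Assign) nodes versus the narrower LLM check that this PR routes through filter_device_by_model. The include paths, the helper names has_states and should_disable_cpu_helper, and the assumption about what is_large_language_model inspects are mine, not the PR's.

#include <memory>

#include <openvino/openvino.hpp>
#include <openvino/op/util/assign_base.hpp>      // assumed include path
#include <openvino/op/util/read_value_base.hpp>  // assumed include path
// NOTE: the header declaring ov::op::util::is_large_language_model is part of
// OpenVINO's internal/dev API; its exact location is an assumption here.

// Generic check: does the model carry internal state (ReadValue/Assign pairs)?
bool has_states(const std::shared_ptr<ov::Model>& model) {
    for (const auto& op : model->get_ops()) {
        if (ov::as_type_ptr<ov::op::util::AssignBase>(op) ||
            ov::as_type_ptr<ov::op::util::ReadValueBase>(op)) {
            return true;
        }
    }
    return false;
}

// Narrower check used by this PR: only models identified as LLMs (assumed to be
// detected from the graph structure, e.g. KV-cache state plus attention pattern)
// cause the CPU helper to be dropped.
bool should_disable_cpu_helper(const std::shared_ptr<ov::Model>& model) {
    return ov::op::util::is_large_language_model(*model);
}

The distinction matters because, as noted above, generic stateful models are already covered by the existing disable_startup_runtime_fallback() path, while this PR forces the fallbacks off only for models identified as LLMs.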

} else {
// AUTO / MULTI don't support caching explicitly, but can redirect this functionality to actual HW plugin
LOG_INFO_TAG("compile model with model path");
LOG_INFO_TAG("compile model with model path: %s", model_path.c_str());
auto m_model = get_core()->read_model(model_path, std::string{}, {});
is_LLM_model = ov::op::util::is_large_language_model(*m_model);
}
if (!is_cumulative) {
devices_with_priority = get_valid_device(support_devices, model_precision);
@@ -455,8 +459,10 @@ std::shared_ptr<ov::ICompiledModel> Plugin::compile_model_impl(const std::string
}
LOG_INFO_TAG("device:%s, priority:%ld", iter->device_name.c_str(), iter->device_priority);
}
auto_s_context->m_startup_fallback = load_config.get_property(ov::intel_auto::enable_startup_fallback);
auto_s_context->m_runtime_fallback = load_config.get_property(ov::intel_auto::enable_runtime_fallback);
// Disable the CPU helper (startup fallback) when the model is an LLM
auto_s_context->m_startup_fallback = is_LLM_model ? false : load_config.get_property(ov::intel_auto::enable_startup_fallback);
// Also disable runtime fallback, so that only one device needs to compile the model when it is an LLM
auto_s_context->m_runtime_fallback = is_LLM_model ? false : load_config.get_property(ov::intel_auto::enable_runtime_fallback);
// in case of mismatching shape conflict when AUTO creates the infer requests for actual device with reshaped model
auto_s_context->m_model = model_path.empty() ? std::const_pointer_cast<ov::Model>(model) : nullptr;
auto_s_context->m_model_path = model_path;
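
As a usage note, here is a small sketch (not from this PR; it assumes the public OpenVINO 2.0 C++ API, a placeholder model path, and that openvino/runtime/auto/properties.hpp declares the two ov::intel_auto properties used in the diff) showing what the changed defaults mean for an application compiling on AUTO:

#include <openvino/openvino.hpp>
#include <openvino/runtime/auto/properties.hpp>  // assumed header for ov::intel_auto properties

int main() {
    ov::Core core;
    auto model = core.read_model("model.xml");  // placeholder path

    // After this PR: if the model is detected as an LLM, AUTO skips the CPU
    // helper (startup fallback) and runtime fallback, so the model is compiled
    // once, on the selected device only.
    auto compiled = core.compile_model(model, "AUTO");

    // The properties read in the diff remain available; for example, both
    // fallbacks can still be switched off explicitly for any model.
    auto compiled_no_helper =
        core.compile_model(model,
                           "AUTO",
                           ov::intel_auto::enable_startup_fallback(false),
                           ov::intel_auto::enable_runtime_fallback(false));
    return 0;
}

The point of the change above is that the LLM case no longer depends on these properties: when is_LLM_model is true, both fallbacks are forced off regardless of the configured values.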