Add function to create usm host tensor #1900

ahnyoung-paul · 2025-03-12T09:40:53Z

Since the Whisper model's commit (07dfba94a36f88f0614b8bb61ebb2d9b8ef6324c) was merged, beam_idx and encoder_hidden_states have been created as ov::Tensor objects in the Whisper pipeline. However, creating ov::Tensor directly in GenAI may hurt performance. In the GPU plugin, the tensor obtained from infer_request is a host tensor backed by usm_host memory, while the newly created tensor is an ov::allocated_tensor.

When an ov::allocated_tensor is used as an input in prepare_input, it is neither a remote tensor nor a usm_host tensor. Consequently, the GPU plugin internally creates a usm_device tensor and copies the values into it, an unnecessary step that degrades performance.

To fix this issue, use the create_host_tensor method instead of creating ov::Tensor directly.
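The difference between the two allocation paths can be illustrated with a small self-contained model. This is only a sketch of the behavior described above, not GPU plugin or GenAI code: MemoryKind, Tensor, Context, make_plain_tensor, and this prepare_input are hypothetical stand-ins, and the real create_host_tensor takes an element type and a shape rather than a size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not OpenVINO types.
enum class MemoryKind { PlainHost, UsmHost, Remote };

struct Tensor {
    MemoryKind kind;
    std::vector<float> data;
};

// Stand-in for constructing a tensor directly in the pipeline.
Tensor make_plain_tensor(std::size_t size) {
    return Tensor{MemoryKind::PlainHost, std::vector<float>(size)};
}

// Stand-in for a device context that hands out usm_host-backed tensors.
struct Context {
    Tensor create_host_tensor(std::size_t size) const {
        return Tensor{MemoryKind::UsmHost, std::vector<float>(size)};
    }
};

// Simplified model of input preparation.
// Returns the number of elements copied into a staging device buffer.
std::size_t prepare_input(const Tensor& input) {
    if (input.kind == MemoryKind::Remote || input.kind == MemoryKind::UsmHost) {
        return 0;  // the tensor can be used in place, no extra copy
    }
    // Any other tensor needs a staging buffer plus a copy of its values.
    std::vector<float> staging(input.data);
    return staging.size();
}
```

In this model, prepare_input(make_plain_tensor(8)) copies all 8 elements, while prepare_input(Context{}.create_host_tensor(8)) copies none, which is the saving this PR is after.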

This is my issue ticket: CVS-162818

Wovchena · 2025-03-12T11:39:24Z

src/cpp/src/whisper/models/statefull_decoder.cpp

When I did a similar thing for VLM, m_request.get_compiled_model().get_context() didn't bring any performance improvement, but reusing the context did. My guess was that m_request.get_compiled_model().get_context() takes too much time.

Did you verify that the issue is gone with your patch?

BTW, get_context() is not available for CPU.

@ahnyoung-paul Did you verify that the issue is gone with your patch?

Exception handling is known to be costly in C++. With that try/catch, you also need to verify that CPU performance didn't degrade.
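One way to address both review concerns (the cost of calling get_context() repeatedly and the cost of exceptions on devices without a context) is to resolve the context once and keep the try/catch off the per-inference path. The sketch below assumes that design. It is not the code from this PR, and CompiledModel, Context, Tensor, and HostTensorFactory are hypothetical stand-ins rather than OpenVINO types.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration only; these are not OpenVINO types.
struct Tensor {
    bool usm_host;     // true when the tensor came from a device context
    std::size_t size;
};

struct Context {
    Tensor create_host_tensor(std::size_t size) const {
        return Tensor{true, size};
    }
};

struct CompiledModel {
    std::string device;

    // Models the review remark that get_context() is not available for CPU:
    // any device other than "GPU" throws here.
    Context get_context() const {
        if (device != "GPU") {
            throw std::runtime_error("no context for device " + device);
        }
        return Context{};
    }
};

class HostTensorFactory {
public:
    // The try/catch runs once at construction, not on every tensor creation.
    explicit HostTensorFactory(const CompiledModel& model) {
        try {
            m_context = model.get_context();
            m_has_context = true;
        } catch (const std::exception&) {
            m_has_context = false;  // fall back to plain tensors
        }
    }

    bool has_context() const { return m_has_context; }

    // Hot path: a branch on a cached flag, no exceptions and no context lookup.
    Tensor create(std::size_t size) const {
        return m_has_context ? m_context.create_host_tensor(size)
                             : Tensor{false, size};
    }

private:
    Context m_context;
    bool m_has_context = false;
};
```

With this shape, a factory built for a "CPU" model pays for one caught exception at construction and then always returns plain tensors, so any CPU overhead would be confined to setup rather than to each decoding step.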


Verified: no regression found for CPU. Please find the details in the ticket. Thanks.

@ahnyoung-paul Did you verify that the issue is gone with your patch?

The performance degradation has multiple causes rather than a single one. Applying this PR on top of the Whisper model's commit (07dfba9) resolved most of them, but another performance degradation was found in the latest commit. This PR reduces unnecessary enqueued memory creation and copying, including enqueue memcpy, which reduces device time and somewhat improves latency. However, further improvements require additional analysis from the GenAI team. Detailed information has been left in the ticket (CVS-162818). Thanks.

ahnyoung-paul added the category: LLM LLM pipeline (stateful, static) label Mar 12, 2025

ahnyoung-paul requested review from as-suvorov, geunhwan and yeonbok March 12, 2025 09:40

github-actions bot added category: whisper Whisper pipeline and removed category: LLM LLM pipeline (stateful, static) labels Mar 12, 2025

Wovchena reviewed Mar 12, 2025


ahnyoung-paul added 3 commits March 13, 2025 13:01

Add function to create usm host tensor

a4cfe8c

create host tensor for cache_position and add try catch

ccbb302

reuse beam_idx

4b8f74b

p-durandin added the Code Freeze label Mar 13, 2025

ilya-lavrenov assigned Wovchena Mar 13, 2025

ilya-lavrenov added this to the 2025.1 milestone Mar 13, 2025

ahnyoung-paul force-pushed the trial_whisper_tiny_perf_drop branch from 59d488e to 4b8f74b March 13, 2025 09:11

Wovchena approved these changes Mar 13, 2025


ilya-lavrenov enabled auto-merge March 13, 2025 10:02

ilya-lavrenov added this pull request to the merge queue Mar 13, 2025

github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 13, 2025

ilya-lavrenov merged commit 2b94f73 into openvinotoolkit:master Mar 13, 2025
54 checks passed

Wovchena Mar 12, 2025

ilya-lavrenov Mar 12, 2025

andrei-kochin Mar 13, 2025

Wovchena Mar 13, 2025

ahnyoung-paul Mar 13, 2025

ahnyoung-paul Mar 13, 2025

5 participants
