
[NPUW] gemma-2 patterns added to preserve tail constants matcher #32465

Merged
dmatveev merged 23 commits into openvinotoolkit:master from esmirno:es/gemma-2-tail-constants
Mar 12, 2026


@esmirno
Contributor

@esmirno esmirno commented Oct 17, 2025

Details:

  • gemma2-sym works correctly once the weights are preserved as constants. Performance matches expectations.
image
  • gemma2-asym works, but accuracy issues were found. Performance matches expectations here too.
image
  • As of compiler version 7.28, accuracy issues are no longer observed, so by default the behavior of this patch is restricted to this compiler version only.

Tickets:

  • E-189635

@github-actions github-actions bot added category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin labels Oct 17, 2025
@dmatveev dmatveev added this to the 2026.0 milestone Oct 31, 2025
@esmirno esmirno marked this pull request as ready for review December 16, 2025 15:06
@esmirno esmirno requested review from a team as code owners December 16, 2025 15:06
@github-actions
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Jan 12, 2026
@github-actions
Contributor

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions github-actions bot closed this Jan 19, 2026
@dmatveev dmatveev reopened this Jan 20, 2026
@github-actions github-actions bot removed the Stale label Jan 23, 2026
@dmatveev dmatveev modified the milestones: 2026.0, 2026.1 Jan 27, 2026
@github-actions
Contributor

This PR will be closed in a week because of 2 weeks of no activity.

@github-actions github-actions bot added the Stale label Feb 20, 2026
@github-actions
Contributor

This PR was closed because it has been stalled for 2 weeks with no activity.

@github-actions github-actions bot closed this Feb 27, 2026
@dmatveev dmatveev reopened this Mar 4, 2026
@github-actions github-actions bot removed the Stale label Mar 5, 2026
@esmirno esmirno requested review from a team as code owners March 9, 2026 22:01
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Mar 9, 2026
@github-actions github-actions bot removed the category: CPU OpenVINO CPU plugin label Mar 9, 2026
@esmirno esmirno requested review from AsyaPronina and dmatveev March 10, 2026 10:36
    if (!m_use_chunk_prefill) {
        // TODO: sometimes it is ok if we cannot find any empty inputs or not?
        NPUW_ASSERT(remove_empty_kv_inputs(prefill_model));
        remove_empty_kv_inputs(prefill_model);
Contributor

Is it still needed?

Contributor Author

Yes, it is still needed; it is a leftover from the fp8-cb4 work.

Contributor Author

@AsyaPronina Indeed, I found the source of the problem: the fp8 patterns we use for redirect also need to be checked when empty KV inputs are removed together with concats etc. in prefill. Please have a look at the actual implementation.
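To make the question in this thread concrete, here is a minimal sketch of why asserting on the removal result can be too strict. It is a toy model in plain C++, not the NPUW implementation: `ToyInput`, `is_empty`, and `remove_empty_inputs` are hypothetical names, and it assumes that an "empty" input is one with a zero-sized dimension and that the removal helper reports whether it removed anything.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a model input: a name plus a static shape (hypothetical).
struct ToyInput {
    std::string name;
    std::vector<std::size_t> shape;
};

// Assumption: an input counts as "empty" when any of its dimensions is zero.
inline bool is_empty(const ToyInput& in) {
    return std::any_of(in.shape.begin(), in.shape.end(),
                       [](std::size_t d) { return d == 0; });
}

// Hypothetical analogue of remove_empty_kv_inputs(): drops empty inputs and
// returns true only if at least one input was actually removed.
inline bool remove_empty_inputs(std::vector<ToyInput>& inputs) {
    const std::size_t old_size = inputs.size();
    inputs.erase(std::remove_if(inputs.begin(), inputs.end(), is_empty),
                 inputs.end());
    return inputs.size() != old_size;
}
```

With a helper shaped like this, wrapping the call in an assertion fails for a model that has no empty inputs to begin with, while calling it without the assertion simply leaves such a model unchanged. Which of the two is right depends on whether "nothing to remove" is a legitimate state, which is what the TODO in the snippet asks.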

@esmirno esmirno requested a review from AsyaPronina March 11, 2026 23:00
        users.insert(users.end(), param_users.begin(), param_users.end());
    }

    // Remove duplicates (concat itself will appear in users)
Contributor

Is users a set, so that duplicates are removed because of that?

Contributor Author

@AsyaPronina I think we are not removing duplicates anyway; we are just searching for the ShapeOf user, and the concat of course appears among the users. The comment does not look very relevant.
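The point made in this reply can be illustrated with a small sketch: when user lists are appended into a vector, duplicates stay in the container, and a search for one particular user type works regardless. This is plain C++ over a toy node type, not OpenVINO code; `ToyNode`, `collect_users`, and `find_shape_of_user` are hypothetical names introduced only for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Toy stand-in for a graph node, identified only by its operation type.
struct ToyNode {
    std::string type;
};

// Appends both user lists into one vector. Nothing is deduplicated here:
// a node that uses both sources appears twice.
inline std::vector<const ToyNode*> collect_users(
        const std::vector<const ToyNode*>& first,
        const std::vector<const ToyNode*>& second) {
    std::vector<const ToyNode*> users(first);
    users.insert(users.end(), second.begin(), second.end());
    return users;
}

// Returns the first user whose type is "ShapeOf", or nullptr if none exists.
// Duplicate entries do not affect the result.
inline const ToyNode* find_shape_of_user(const std::vector<const ToyNode*>& users) {
    auto it = std::find_if(users.begin(), users.end(),
                           [](const ToyNode* n) { return n->type == "ShapeOf"; });
    return it == users.end() ? nullptr : *it;
}
```

If deduplication were actually required, the vector would need an explicit sort-and-unique pass or a set; a linear search like the one above does not need it.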

    auto param = opp::wrap_type<ov::op::v0::Parameter>();
    auto concat = opp::wrap_type<ov::op::v0::Concat>({param, opp::any_input()});
    auto param_or =
        std::make_shared<opp::op::Or>(ov::OutputVector{param, match_down_up_convert_subgraph_after_lpt(param)});
Contributor

Great catch!!
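As a rough illustration of what an Or pattern like `param_or` expresses, the sketch below accepts either a bare Parameter or a Parameter seen through a small convert subgraph. It is a toy in plain C++ and does not use the OpenVINO pattern API. The exact subgraph matched by `match_down_up_convert_subgraph_after_lpt` is not shown in the snippet, so the `Parameter -> Convert -> Convert` chain used here is only an assumption, and `ToyOp` and the predicates are hypothetical names.

```cpp
#include <string>

// Toy stand-in for an operation with at most one input (hypothetical).
struct ToyOp {
    std::string type;
    const ToyOp* input;
};

// First alternative: the node is a Parameter.
inline bool is_parameter(const ToyOp* op) {
    return op != nullptr && op->type == "Parameter";
}

// Second alternative (assumed shape): Parameter -> Convert -> Convert,
// i.e. a down-convert followed by an up-convert on top of a Parameter.
inline bool is_down_up_convert_of_parameter(const ToyOp* op) {
    return op != nullptr && op->type == "Convert" &&
           op->input != nullptr && op->input->type == "Convert" &&
           is_parameter(op->input->input);
}

// The "Or": the node matches if either alternative matches.
inline bool matches_param_or(const ToyOp* op) {
    return is_parameter(op) || is_down_up_convert_of_parameter(op);
}
```

The practical effect of the Or is that a matcher written against the Parameter alone would miss inputs that arrive through such a convert chain, whereas the combined pattern accepts both forms.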

@esmirno esmirno added this pull request to the merge queue Mar 12, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Mar 12, 2026
@dmatveev dmatveev added this pull request to the merge queue Mar 12, 2026
Merged via the queue into openvinotoolkit:master with commit 726fdaf Mar 12, 2026
232 of 238 checks passed
@dmatveev dmatveev deleted the es/gemma-2-tail-constants branch March 12, 2026 20:39

Labels

category: NPU OpenVINO NPU plugin category: NPUW NPUW plugin Code Freeze

3 participants