NPUW: Switch to PYRAMID attention by default for prefill#33736

Open
dmatveev wants to merge 5 commits into openvinotoolkit:master from dmatveev:dm/pyramid_default

@dmatveev
Contributor

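The new default only takes effect when chunking is enabled, that is, when both of the following hold:

  • PREFILL_HINT is DYNAMIC (the default)
  • MAX_PROMPT_LEN > CHUNK_SIZE (both default to 1024, so at least one of them must be changed for the inequality to hold)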
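
The two conditions above can be sketched as a single predicate. This is a minimal, self-contained illustration rather than the NPUW implementation: `PrefillHint`, `PrefillOptions`, and `chunking_enabled` are hypothetical names introduced here, and only the option names and default values come from the description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the PREFILL_HINT option; not the NPUW type.
enum class PrefillHint { DYNAMIC, STATIC };

// Hypothetical bundle of the options named in the description,
// initialized with the defaults stated there.
struct PrefillOptions {
    PrefillHint prefill_hint = PrefillHint::DYNAMIC;  // PREFILL_HINT
    std::uint32_t max_prompt_len = 1024;              // MAX_PROMPT_LEN
    std::uint32_t chunk_size = 1024;                  // CHUNK_SIZE
};

// Chunking is considered enabled only when both listed conditions hold.
inline bool chunking_enabled(const PrefillOptions& o) {
    return o.prefill_hint == PrefillHint::DYNAMIC &&
           o.max_prompt_len > o.chunk_size;
}
```

Under this sketch, a default-constructed `PrefillOptions` does not satisfy the predicate because 1024 is not greater than 1024. Raising `max_prompt_len` above `chunk_size` while keeping the DYNAMIC hint does.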

This also fixes a bug. Currently, attention preferences are routed to npuw::CompiledModel only when they are passed explicitly in the properties. That misses the case where a hint such as PYRAMID is enabled by default: it is absent from the explicit properties, so the `cfg` infrastructure should be used instead.
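
To illustrate why reading from the explicit properties misses a default-enabled hint, here is a small standalone sketch. It does not use the OpenVINO API: `Properties`, `Config`, `route_from_props`, `route_from_config`, and the key `"ATTENTION_HINT"` are all hypothetical names chosen for the example, and the real `cfg` infrastructure may look quite different.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

using Properties = std::map<std::string, std::string>;

// Hypothetical config holder: starts from defaults, then applies
// whatever the caller passed explicitly on top of them.
class Config {
public:
    explicit Config(Properties defaults) : values_(std::move(defaults)) {}

    void update(const Properties& explicit_props) {
        for (const auto& kv : explicit_props) {
            values_[kv.first] = kv.second;
        }
    }

    // Throws std::out_of_range if the key has neither a default nor an override.
    const std::string& get(const std::string& key) const { return values_.at(key); }

private:
    Properties values_;
};

// Pre-fix style routing: forward the option only if the caller passed it.
// Returns false (and forwards nothing) when the key is absent.
inline bool route_from_props(const Properties& explicit_props,
                             const std::string& key,
                             Properties& target) {
    const auto it = explicit_props.find(key);
    if (it == explicit_props.end()) {
        return false;
    }
    target[key] = it->second;
    return true;
}

// Post-fix style routing: forward the resolved value, defaults included.
inline void route_from_config(const Config& cfg,
                              const std::string& key,
                              Properties& target) {
    target[key] = cfg.get(key);
}
```

With a default of `"PYRAMID"` for the hypothetical key and an empty set of explicit properties, `route_from_props` forwards nothing, while `route_from_config` forwards `"PYRAMID"`. That is the gap described above.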

@dmatveev dmatveev requested review from a team as code owners January 21, 2026 13:00
@github-actions github-actions bot added the labels category: NPU (OpenVINO NPU plugin) and category: NPUW (NPUW plugin) Jan 21, 2026
@dmatveev dmatveev added this to the 2026.0 milestone Jan 21, 2026