[WebNN] Support more features for GQA #27234

Merged
fdwr merged 3 commits into microsoft:main from Honry:support-gqa-with-roe
Feb 7, 2026

Honry commented Feb 4, 2026

Add support for GroupQueryAttention with:

  • do_rotary=true (cos_cache/sin_cache inputs)
  • Packed QKV (optional key/value inputs)
  • Optional past_key/past_value for prefill mode

Remove the fp16->fp32 casting workaround.

Add ApplyRotaryEmbedding helper function.

Fix the decode stage by using qkv_sequence_length instead of has_past_key to distinguish prefill vs decode, and use runtime seqlens_k instead of static past_sequence_length for rotary position calculation.

Honry commented Feb 4, 2026

@fdwr, @guschmue, PTAL, thanks!

fdwr previously approved these changes Feb 4, 2026

Minor comment, else LGTM.

guschmue added the ep:WebNN (WebNN execution provider) label Feb 5, 2026

guschmue previously approved these changes Feb 5, 2026

guschmue commented Feb 5, 2026

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

guschmue enabled auto-merge (squash) February 5, 2026 18:15

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

guschmue commented Feb 5, 2026

run 'lintrunner -a' to make the CI happy

fdwr previously approved these changes Feb 6, 2026

👍

fdwr commented Feb 6, 2026

Hmm, linter issues. I can't tell what it's complaining about though (why can't this linter be clearer? 🤨):

-    emscripten::val input,// Shape: [batch_size, sequence_length, num_heads, head_size]
-    emscripten::val cos_cache,// Shape: [max_sequence_length, head_size / 2]
-    emscripten::val sin_cache,// Shape: [max_sequence_length, head_size / 2]
-    emscripten::val position_ids,// Shape: [batch_size, sequence_length] or [1]
+    emscripten::val input,// Shape: [batch_size, sequence_length, num_heads, head_size]
+    emscripten::val cos_cache,// Shape: [max_sequence_length, head_size / 2]
+    emscripten::val sin_cache,// Shape: [max_sequence_length, head_size / 2]
+    emscripten::val position_ids,// Shape: [batch_size, sequence_length] or [1]
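
For readers unfamiliar with the shapes in that signature, here is a plain CPU reference of what a rotary embedding over those tensors computes. It is a sketch under stated assumptions, not the PR's helper, which builds WebNN graph operations through emscripten::val. `ApplyRotaryEmbeddingReference` is a hypothetical name, the non-interleaved ("rotate half") pairing is assumed, and a position_ids of shape [1] is assumed to be a starting offset for the sequence.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not the PR's WebNN implementation).
// Rotates `input` in place, pairing element i with element i + head_size / 2.
void ApplyRotaryEmbeddingReference(
    std::vector<float>& input,                 // [batch_size, sequence_length, num_heads, head_size]
    const std::vector<float>& cos_cache,       // [max_sequence_length, head_size / 2]
    const std::vector<float>& sin_cache,       // [max_sequence_length, head_size / 2]
    const std::vector<int64_t>& position_ids,  // [batch_size, sequence_length] or [1]
    size_t batch_size, size_t sequence_length, size_t num_heads, size_t head_size) {
  const size_t half = head_size / 2;
  for (size_t b = 0; b < batch_size; ++b) {
    for (size_t s = 0; s < sequence_length; ++s) {
      // Assumption: a single-element position_ids is a start offset.
      const size_t pos =
          position_ids.size() == 1
              ? static_cast<size_t>(position_ids[0]) + s
              : static_cast<size_t>(position_ids[b * sequence_length + s]);
      for (size_t n = 0; n < num_heads; ++n) {
        float* x = input.data() + ((b * sequence_length + s) * num_heads + n) * head_size;
        for (size_t i = 0; i < half; ++i) {
          const float c = cos_cache[pos * half + i];
          const float sn = sin_cache[pos * half + i];
          const float x0 = x[i];
          const float x1 = x[i + half];
          x[i] = x0 * c - x1 * sn;         // Real part of the 2-D rotation.
          x[i + half] = x0 * sn + x1 * c;  // Imaginary part of the 2-D rotation.
        }
      }
    }
  }
}
```

A graph-based version would express the same arithmetic with gather, slice, multiply, and concat style operations; the exact operator choices in the PR are not shown in this conversation.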

auto-merge was automatically disabled February 6, 2026 01:29

Head branch was pushed to by a user without write access

Honry dismissed stale reviews from fdwr and guschmue via 55562e5 February 6, 2026 01:29

Honry commented Feb 6, 2026

Thanks much @fdwr, @guschmue, lint error fixed, please help retrigger the CI. Thanks!

fdwr left a comment

👍

fdwr commented Feb 7, 2026

/azp run ONNX Runtime Web CI Pipeline,Windows GPU CI Pipeline,Linux Android Emulator QNN CI Pipeline,Windows GPU WebGPU CI Pipeline,Windows OpenVINO CI Pipeline

fdwr commented Feb 7, 2026

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,MacOS CI Pipeline,Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline

fdwr commented Feb 7, 2026

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI

fdwr commented Feb 7, 2026

/azp run Windows GPU TensorRT CI Pipeline,onnxruntime-binary-size-checks-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,Windows x64 QNN CI Pipeline,Big Models

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

fdwr commented Feb 7, 2026

/azp run Test Linux CUDA x64 Release,Test Linux TensorRT x64 Release,web_Debug / build_onnxruntime_web,web_Release / build_onnxruntime_web

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

fdwr commented Feb 7, 2026

/azp run Linux QNN CI Pipeline

@azure-pipelines

No pipelines are associated with this pull request.

@azure-pipelines

Azure Pipelines successfully started running 2 pipeline(s).

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

fdwr merged commit 83d11b5 into microsoft:main Feb 7, 2026
94 of 165 checks passed
Labels

ep:WebNN WebNN execution provider
