Access Violation in onnxruntime_perf_test.exe due to inconsistent seqlens_k tensor random values #27170

### Describe the issue

`onnxruntime_perf_test.exe` crashes with an access violation (exception code 0xC0000005) during inference, caused by an out-of-bounds memory access:
```
.\onnxruntime_perf_test.exe -e qnn -i "backend_path|.\QnnHtp.dll soc_model|60 htp_arch|73 htp_graph_finalization_optimization_mode|3" -C "ep.share_ep_contexts|1" -m times -r 1 -I phi_context_qdq_ctx.onnx -s
onnxruntime cpuid_info warning: Unknown CPU vendor. cpuinfo_vendor value: 0
2026-01-02 10:59:05.7458171 [W:onnxruntime:, session_state.cc:1316 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2026-01-02 10:59:05.7550095 [W:onnxruntime:, session_state.cc:1318 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

Exit code: -1073741819
```

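The crash is caused by negative positions (offsets) in the `position_ids` array passed to the [MLAS library](https://github.com/microsoft/onnxruntime/blob/a83fc4d58cb48eb68890dd689f94f28288cf2278/onnxruntime/contrib_ops/cpu/bert/rotary_embedding.cc#L111). `GroupQueryAttention` derives the default position ids from `seqlens_k`, which `onnxruntime_perf_test` fills with random values in this setup. Whenever a random `seqlens_k[b] + 1` is smaller than `sequence_length`, `past_seqlen` becomes negative, and so do the generated position ids.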

Call stack screenshots from the debugger:
1. <img width="1907" height="1025" alt="Image" src="https://github.com/user-attachments/assets/e0e154f1-0ff4-4e7a-b9a7-53778340c68c" />
2. <img width="1907" height="1025" alt="Image" src="https://github.com/user-attachments/assets/37b9a5e7-cdf5-40e2-ba6b-fe9921d5d473" />
3. <img width="1908" height="1026" alt="Image" src="https://github.com/user-attachments/assets/3d842b08-7266-4d08-b1c1-34bbe63d8bb0" />
4. <img width="1908" height="1026" alt="Image" src="https://github.com/user-attachments/assets/0763475d-5364-4b81-9e12-4b885061940f" />

The following patch works around the crash:

```diff
---
 .../cpu/bert/group_query_attention.cc         | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc b/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc
index eb1560ac8e..2c68641291 100644
--- a/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc
+++ b/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc
@@ -154,11 +154,20 @@ Status GroupQueryAttention<T>::Compute(OpKernelContext* context) const {
       for (int b = 0; b < batch_size; b++) {
         const int total_seqlen = seqlens_k->Data<int32_t>()[b] + 1;
         const int past_seqlen = total_seqlen - sequence_length;
-        for (int s = 0; s < sequence_length; s++) {
-          if (past_seqlen + s < total_seqlen) {
-            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(past_seqlen) + s;
-          } else {
-            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1);
+
+        // Handle inconsistent random data in seqlens_k, when past_seqlen becomes negative
+        if (past_seqlen < 0) {
+          // Fallback: generate consecutive position IDs starting from 0
+          for (int s = 0; s < sequence_length; s++) {
+            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(s);
+          }
+        } else {
+          for (int s = 0; s < sequence_length; s++) {
+            if (past_seqlen + s < total_seqlen) {
+              default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(past_seqlen) + s;
+            } else {
+              default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1);
+            }
           }
         }
       }
```

### To reproduce

1. Run `ep_weight_sharing_ctx_gen.exe` to compile the `phi_3_6_context_qdq.onnx` model and generate the shared weights bin file:
```
ep_weight_sharing_ctx_gen.exe -e qnn -i "backend_path|./QnnHtp.dll soc_model|60 vtcm_mb|8 htp_arch|73 htp_graph_finalization_optimization_mode|3" ./phi_context_qdq.onnx
```

This generates:

```
phi_context_qdq_ctx.onnx
phi_context_qdq_qnn.bin
```

2. Run `onnxruntime_perf_test.exe` on the resulting cache:
```
.\onnxruntime_perf_test.exe -e qnn -i "backend_path|.\QnnHtp.dll soc_model|60 htp_arch|73 htp_graph_finalization_optimization_mode|3" -C "ep.share_ep_contexts|1" -m times -r 1 -I .\phi_context_qdq_ctx.onnx -s
```

NB: the problem does not look QNN- or Phi-specific and could potentially affect other models and EPs.

### Urgency

A temporary workaround exists in the form of the patch above. However, the access violation is a severe bug and should be fixed as soon as possible.

### Platform

Windows

### OS Version

Microsoft Windows 11 Enterprise, 10.0.26220, ARM64

### ONNX Runtime Installation

Built from Source

### ONNX Runtime Version or Commit ID

1.23.2, a83fc4d

### ONNX Runtime API

C++

### Architecture

ARM64

### Execution Provider

Other / Unknown

### Execution Provider Library Version

QNN 2.39
