Access Violation in onnxruntime_perf_test.exe due to inconsistent seqlens_k tensor random values #27170

@a-akoval

Describe the issue

onnxruntime_perf_test.exe crashes with an access violation (AV exception, code 0xC0000005) during inference because of an out-of-bounds memory access:

.\onnxruntime_perf_test.exe -e qnn -i "backend_path|.\QnnHtp.dll soc_model|60 htp_arch|73 htp_graph_finalization_optimization_mode|3" -C "ep.share_ep_contexts|1" -m times -r 1 -I phi_context_qdq_ctx.onnx -s
onnxruntime cpuid_info warning: Unknown CPU vendor. cpuinfo_vendor value: 0
2026-01-02 10:59:05.7458171 [W:onnxruntime:, session_state.cc:1316 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2026-01-02 10:59:05.7550095 [W:onnxruntime:, session_state.cc:1318 onnxruntime::VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.

Exit code: -1073741819

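The crash is caused by negative positions/offsets in the position_ids input array passed to the MLAS library. They appear when the randomly generated seqlens_k values are inconsistent with sequence_length, so that past_seqlen becomes negative.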

Call stack screenshots from the debugger: [4 images]

The patch that helps:

---
 .../cpu/bert/group_query_attention.cc         | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc b/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc
index eb1560ac8e..2c68641291 100644
--- a/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc
+++ b/onnxruntime/contrib_ops/cpu/bert/group_query_attention.cc
@@ -154,11 +154,20 @@ Status GroupQueryAttention<T>::Compute(OpKernelContext* context) const {
       for (int b = 0; b < batch_size; b++) {
         const int total_seqlen = seqlens_k->Data<int32_t>()[b] + 1;
         const int past_seqlen = total_seqlen - sequence_length;
-        for (int s = 0; s < sequence_length; s++) {
-          if (past_seqlen + s < total_seqlen) {
-            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(past_seqlen) + s;
-          } else {
-            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1);
+
+        // Handle inconsistent random data in seqlens_k, when past_seqlen becomes negative
+        if (past_seqlen < 0) {
+          // Fallback: generate consecutive position IDs starting from 0
+          for (int s = 0; s < sequence_length; s++) {
+            default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(s);
+          }
+        } else {
+          for (int s = 0; s < sequence_length; s++) {
+            if (past_seqlen + s < total_seqlen) {
+              default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(past_seqlen) + s;
+            } else {
+              default_pos_ids[b * sequence_length + s] = static_cast<int64_t>(1);
+            }
           }
         }
       }
-- 
2.52.0.windows.1

To reproduce

  1. Run ep_weight_sharing_ctx_gen.exe to compile the phi_3_6_context_qdq.onnx model and generate the shared weights bin file:
ep_weight_sharing_ctx_gen.exe -e qnn -i "backend_path|./QnnHtp.dll soc_model|60 vtcm_mb|8 htp_arch|73 htp_graph_finalization_optimization_mode|3" ./phi_context_qdq.onnx

     This produces the following files:
     phi_context_qdq_ctx.onnx
     phi_context_qdq_qnn.bin
  2. Run onnxruntime_perf_test.exe on the resulting cache:
.\onnxruntime_perf_test.exe -e qnn -i "backend_path|.\QnnHtp.dll soc_model|60 htp_arch|73 htp_graph_finalization_optimization_mode|3" -C "ep.share_ep_contexts|1" -m times -r 1 -I .\phi_context_qdq_ctx.onnx -s

NB: the problem does not appear to be QNN- or Phi-specific and could potentially occur with other models and EPs.

Urgency

A temporary workaround exists in the form of the patch listed above. However, the AV exception is a severe bug that should be fixed as soon as possible.

Platform

Windows

OS Version

Microsoft Windows 11 Enterprise, 10.0.26220, ARM64

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.23.2, a83fc4d

ONNX Runtime API

C++

Architecture

ARM64

Execution Provider

Other / Unknown

Execution Provider Library Version

QNN 2.39
