Skip to content

Fix NchwcTransformer null deref on node with unresolved schema#29194

Open
tairenpiao wants to merge 2 commits into
microsoft:mainfrom
tairenpiao:fix-nchwc-null-op
Open

Fix NchwcTransformer null deref on node with unresolved schema#29194
tairenpiao wants to merge 2 commits into
microsoft:mainfrom
tairenpiao:fix-nchwc-null-op

Conversation

@tairenpiao

@tairenpiao tairenpiao commented Jun 20, 2026

Copy link
Copy Markdown
Contributor

Problem

On x86_64 with ORT_ENABLE_ALL, InferenceSession initialization crashes (SIGSEGV) inside NchwcTransformer for a model containing a Transpose → Cast(fp16→fp32) → MatMul chain on a 3D dynamic shape with a singleton dim. Stack bottoms out in graph_utils::IsSupportedOptypeVersionAndDomain.

Root cause

graph_utils::IsSupportedOptypeVersionAndDomain dereferences node.Op()->Deprecated() unconditionally. node.Op() can be nullptr when a node's schema is not resolved (e.g. a node added by an earlier transformer before the graph is re-resolved), so this is a null dereference.

Fix

Guard with node.Op() != nullptr before the deref — matching the existing pattern at graph.cc:3547. A node whose schema is unresolved cannot match a specific op, so returning false is correct. Adds a regression test that loads the reporter's minimal model at ORT_ENABLE_ALL.

Scope

The crash is x86_64-only (NCHWc enabled, MlasNchwcGetBlockSize() > 1). On other platforms the guard is a no-op and the test is a harmless smoke test.

Fixes #28392

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a crash during InferenceSession::Initialize() on x86_64 when NchwcTransformer processes a graph containing nodes whose op schema has not been resolved yet (node.Op() == nullptr). The change hardens a shared graph utility predicate used by transformers to avoid a null dereference, and adds a regression test that loads a minimal repro model.

Changes:

  • Add a null-check for node.Op() in graph_utils::IsSupportedOptypeVersionAndDomain before consulting OpSchema::Deprecated().
  • Add a regression test that loads and initializes a minimal Transpose -> Cast(fp16->fp32) -> MatMul model at Level3 optimizations.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 1 comment.

File Description
onnxruntime/test/optimizer/nchwc_optimizer_test.cc Adds a regression smoke test that loads the repro model and asserts session initialization succeeds.
onnxruntime/core/graph/graph_utils.cc Guards node.Op()->Deprecated() with node.Op() != nullptr to prevent a null dereference when schemas are unresolved.

Comment on lines 399 to 405
return (node.OpType() == op_type &&
// we don't have op schemas in the minimal build so there's no way to check the deprecated flag
#if !defined(ORT_MINIMAL_BUILD)
!node.Op()->Deprecated() &&
// node.Op() can be null for a node whose schema is not yet resolved.
node.Op() != nullptr && !node.Op()->Deprecated() &&
#endif
MatchesOpSinceVersion(node, versions) && MatchesOpSetDomain(node, domain));

@tairenpiao tairenpiao Jun 20, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled node.Op() into a local op_schema and reused it. Thanks!

@tairenpiao tairenpiao changed the title Fix NchwcTransformer null deref on node with unresolved schema (#28392) Fix NchwcTransformer null deref on node with unresolved schema Jun 21, 2026
@tairenpiao

Copy link
Copy Markdown
Contributor Author

Hi @tianleiwu. Could you please take a look?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 3 changed files in this pull request and generated no new comments.

@tianleiwu tianleiwu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The fix is correct and well-targeted. Guarding node.Op() against null in IsSupportedOptypeVersionAndDomain and treating an unresolved-schema node as "not supported" (return false) is the safe, conservative behavior, and it matches existing patterns in the codebase (e.g. insert_cast_transformer.cc checks if (!schema)). The focused regression test plus model data is a nice touch. LGTM.

Suggestion (optional, out of this PR's diff)

There is an identical unguarded pattern in a sibling optimizer at onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc:185:

#if !defined(ORT_MINIMAL_BUILD)
    !op_node.Op()->Deprecated() &&
#endif

This runs over arbitrary graph nodes (QGemm/QAttention/etc. in s8_overflow_ops) and would crash the same way on a node whose schema is unresolved (op_node.Op() == nullptr). Since this PR is specifically hardening the Op()->Deprecated() pattern against null schemas, it may be worth applying the same guard there too (here or in a quick follow-up) so the two paths stay consistent. Not blocking.

@tianleiwu tianleiwu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review

The fix is correct and complete. Guarding node.Op() against null in IsSupportedOptypeVersionAndDomain and treating an unresolved-schema node as "not supported" (returning false) is the safe, conservative behavior and matches existing patterns in the codebase. The same hardening has now been applied to the sibling avx2_weight_s8_to_u8.cc path, so both Op()->Deprecated() dereferences are protected and the two paths stay consistent — that resolves the follow-up I raised on the previous commit. Caching node.Op() into a local op_schema also addresses the earlier readability note. The focused regression test plus model data is a nice touch.

LGTM.

One optional nitpick inline regarding the test's build-config placement; non-blocking.

#ifndef DISABLE_CONTRIB_OPS

// Regression test for #28392: NchwcTransformer must not crash when a node's schema is unresolved (node.Op() == nullptr).
TEST(NchwcOptimizerTests, MatMulCastDoesNotCrashOnUnresolvedSchema) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick (non-blocking): this regression model is a plain Transpose -> Cast -> MatMul chain of standard ONNX ops, so the test does not actually depend on contrib ops. Placing it inside the #ifndef DISABLE_CONTRIB_OPS block means the regression is not guarded in contrib-ops-disabled builds. Since the crash path is in core graph utilities, consider hoisting this test outside the contrib-ops guard so it runs in more configurations. Not blocking.

@tairenpiao

Copy link
Copy Markdown
Contributor Author

@tianleiwu Seems the CI has download issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

3 participants