
render-test: capture shader-abort message to stdout + abort message tests#11799

Open
nv-slang-bot[bot] wants to merge 2 commits into fix/issue-11790 from fix/issue-11790-abort-capture

nv-slang-bot commented Jun 27, 2026

Motivation

Follow-up to #11790 / #11792. PR #11792 bumps external/slang-rhi to the head of slang-rhi#782,
which adds the Vulkan Feature::ShaderAbort ("shader-abort") path: after the device fault, it
recovers the message passed to a shader's abort() and forwards it to the host debug callback. To
actually exercise that message round-trip from a .slang test, render-test needs to surface the
captured message where a slang-test directive can assert it.

Per maintainer guidance on #11792: "capture the handleMessage output automatically when
-render-features shader-abort is set, and let a filecheck= line assert it."
This PR implements exactly that, plus the requested %f / multiple-argument / mixed-type abort cases.

Proposed solution

When (and only when) -render-features shader-abort is requested, render-test enters a
device-loss-tolerant capture mode: it scopes a local debug callback over the dispatch, then prints
any "Shader abort: ..." message it received to stdout. A COMPARE_COMPUTE(filecheck=...)
line FileChecks render-test's stdout, so the formatted abort text can be asserted directly. The
mode is strictly gated, so every other render test is byte-for-byte unchanged.
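A minimal, self-contained sketch of the gated capture flow follows. Every name in it is hypothetical: a global std::function slot stands in for render-test's ScopedCoreDebugCallback/CoreDebugCallback bridge, and fakeUpdate() stands in for app.update(), reporting one abort message and then returning a failure as if the device had been lost. It illustrates the three properties the mode relies on: the callback is installed only around the dispatch, the dispatch result is ignored in abort mode, and only messages prefixed "Shader abort: " are surfaced.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the host debug-callback slot that slang-rhi
// calls; render-test itself uses ScopedCoreDebugCallback/CoreDebugCallback.
using DebugCallback = std::function<void(const std::string&)>;
DebugCallback g_debugCallback;

// RAII guard: installs a capturing callback for its lifetime only and
// restores the previous callback when the scope ends.
class ScopedMessageCapture
{
public:
    explicit ScopedMessageCapture(std::vector<std::string>& sink)
        : m_previous(g_debugCallback)
    {
        g_debugCallback = [&sink](const std::string& msg) { sink.push_back(msg); };
    }
    ~ScopedMessageCapture() { g_debugCallback = m_previous; }

private:
    DebugCallback m_previous;
};

// Keeps only messages that start with "Shader abort: ", one per line.
std::string collectShaderAbortText(const std::vector<std::string>& messages)
{
    const std::string prefix = "Shader abort: ";
    std::string text;
    for (const auto& msg : messages)
        if (msg.compare(0, prefix.size(), prefix) == 0)
            text += msg + "\n";
    return text;
}

// Hypothetical dispatch: reports one abort message, then fails as if
// the device had been lost.
int fakeUpdate()
{
    if (g_debugCallback)
        g_debugCallback("Shader abort: value=42");
    return -1;
}

// Returns the text that the gated mode would print to stdout.
std::string runTest(bool shaderAbortMode)
{
    if (!shaderAbortMode)
    {
        fakeUpdate(); // normal path: unchanged, nothing captured
        return "";
    }
    std::vector<std::string> captured;
    {
        ScopedMessageCapture capture(captured);
        (void)fakeUpdate(); // result ignored: device loss is expected here
    }
    return collectShaderAbortText(captured);
}
```

In render-test itself, the returned text would be written to stdout (for example `std::cout << runTest(true)`), which is where a filecheck= line looks for it. Restoring the previous callback in the destructor keeps the capture from leaking into later work, even if the dispatch path exits early.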

Change summary

  • tools/render-test/render-test-main.cpp: adds the _isShaderAbortRequested and
    _printCapturedShaderAbortMessages helpers; in _innerMain, a gated branch scopes a capture
    callback over app.update() and prints the abort text to stdout. The non-abort path is unchanged.
  • tests/spirv/abort-message.slang (new): four aborting COMPARE_COMPUTE(filecheck=...) -vk cases:
    plain string, %f, multi-arg, and mixed %d %f. Each case has its own entry point and //TEST:
    line, because a fired abort loses the device, which is terminal for that invocation.
  • tests/spirv/abort-message.slang{,.1,.2,.3}.expected.txt (new, empty): satisfy
    runComputeComparisonImpl's mandatory buffer comparison. A lost device writes no buffer, so the
    empty, pre-cleared .actual.txt matches an empty .expected.txt.
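When writing the filecheck= lines for the %f, multi-arg, and mixed %d %f cases, it helps to know what text to expect. The sketch below assumes that the device-side abort formatter follows C printf conventions (for example, %f defaulting to six decimal places) and that slang-rhi prepends exactly "Shader abort: ". The PR does not confirm the first assumption, so check any patterns against a real GPU run. formatAbortMessage is a hypothetical helper, not part of render-test.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: builds the host-side text for an abort message,
// ASSUMING C printf semantics on the device side (unconfirmed).
// Intended for calls with at least one argument.
template <typename... Args>
std::string formatAbortMessage(const char* fmt, Args... args)
{
    char body[256];
    std::snprintf(body, sizeof body, fmt, args...);
    return std::string("Shader abort: ") + body;
}
```

Under that assumption, a FileCheck line for the %f case would have to match the six-digit form (for example `1.500000`), or use a regex such as `{{[0-9.]+}}` to stay robust to formatter differences.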

The emit-side coverage (-target spirv -capability abortOpAbortKHR) and the device-keep-alive
buffer-compare line continue to live in tests/spirv/abort-runtime.slang (PR #11792, unchanged).

Concepts and vocabulary

  • filecheck= vs filecheck-buffer= — in a render-test COMPARE_COMPUTE, filecheck= runs
    FileCheck against render-test's stdout/stderr (_validateOutput), while filecheck-buffer=
    runs against the readback buffer file. The abort message arrives on stdout, so these tests use
    filecheck=.
  • device-fault path — on a shader abort the Vulkan device is lost; slang-rhi reads the abort
    message back via vkGetDeviceFaultDebugInfoKHR and delivers it as a host debug message prefixed
    "Shader abort: ".
  • ScopedCoreDebugCallback / CoreDebugCallback — the existing render-test debug-callback
    bridge/buffer (tools/render-test/slang-support.h); the capture mode reuses them rather than
    adding any new interface or ABI.

Process report

  • Strictly additive and gated. The capture branch is guarded by shaderAbortMode =
    _isShaderAbortRequested(options); the else branch is the unchanged app.update() call, so tests
    that do not request shader-abort see no behavior change. _isShaderAbortRequested reuses the
    canonical _getFeatureFromName name→feature map (one source of truth), which returns _Count
    for unknown names without side effects, so failure timing for other tests is unchanged.
  • Device-loss tolerance. A fired abort loses the device, so in this mode app.update()'s result is
    deliberately ignored (it was already unchecked on the normal path) to guarantee the print is
    reached; filecheck= ignores the process result code. The capture callback is scoped to just the
    dispatch.
  • Buffer-comparison branch. Rather than fabricate a readback after device loss, the empty
    .expected.txt siblings match the empty (never-written) output buffer, which is the cleanest way
    to neutralize the always-on buffer comparison.
  • Verification / honesty. Verified here: the C++ compiles; the existing abort-runtime.slang
    SIMPLE emit line still PASSES; all four new -vk -render-features shader-abort lines SKIP cleanly
    with no GPU (SLANG_E_NOT_AVAILABLE → ignored, not failed). Not verifiable here (no GPU): the
    abort actually firing → device fault → "Shader abort: ..." round-trip → FileCheck match. That
    end-to-end check requires a maintainer run on capable hardware.
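The gating described in the first bullet, a single name→feature table whose unknown-name result is a side-effect-free sentinel, can be sketched as follows. The enum values other than ShaderAbort and the feature-name strings other than "shader-abort" are made up for illustration, and featureFromName / isShaderAbortRequested are hypothetical analogues of _getFeatureFromName and _isShaderAbortRequested, not their actual code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, trimmed-down feature enum; the real list lives in slang-rhi.
enum class Feature { Half, RayTracing, ShaderAbort, _Count };

// Single name->feature table. Unknown names map to the _Count sentinel
// without side effects (no diagnostics, no exceptions).
Feature featureFromName(const std::string& name)
{
    struct Entry { const char* name; Feature feature; };
    static const Entry kTable[] = {
        {"half", Feature::Half},                 // illustrative name
        {"ray-tracing", Feature::RayTracing},    // illustrative name
        {"shader-abort", Feature::ShaderAbort},
    };
    for (const auto& e : kTable)
        if (name == e.name)
            return e.feature;
    return Feature::_Count;
}

// Capture mode is enabled only if "shader-abort" is among the requested
// render features; every other request leaves the normal path in place.
bool isShaderAbortRequested(const std::vector<std::string>& renderFeatures)
{
    for (const auto& name : renderFeatures)
        if (featureFromName(name) == Feature::ShaderAbort)
            return true;
    return false;
}
```

Keeping one table means a new feature name is added in exactly one place, and because an unknown name simply yields the sentinel, probing for shader-abort cannot emit an error or change when other tests fail.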

Dependency / merge status

Draft, and cannot merge before #11792. This branch is based on #11792's fix/issue-11790 branch
because rhi::Feature::ShaderAbort only exists in the bumped submodule; the capture code does not
compile against master. #11792 itself is gated on slang-rhi#782 merging. This PR is kept separate
(rather than folded into #11792) so #11792 stays a minimal submodule bump plus gated test that can
come out of draft the moment slang-rhi#782 lands; it can be folded into #11792 instead if preferred.

Related to #11790. Does not auto-close it (the GPU round-trip is verified separately).

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

When `-render-features shader-abort` is requested, render-test now captures
the host-delivered "Shader abort: ..." debug message (forwarded by slang-rhi
via the Vulkan device-fault path) and prints it to stdout, so a slang-test
`filecheck=` line can assert the formatted abort message.

The capture is strictly gated on the shader-abort feature being requested,
leaving every other render test byte-for-byte unchanged. A fired abort loses
the device, so update()'s result is swallowed in this mode (it was already
unchecked on the normal path) to reach the print.

tests/spirv/abort-message.slang adds plain / %f / multi-arg / mixed-type
abort cases, each asserting the captured message; empty .expected.txt
siblings satisfy the harness buffer comparison against the empty
(device-loss) output buffer. These -vk lines skip cleanly where the feature
is unavailable; the GPU message round-trip is verified on capable hardware.

Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>

Remove all of the "*.expected.txt" files.
We prefer to use the inlined //CHK: syntax.

Done — pushed fa1a35d976. Removed all four *.expected.txt files and moved the assertions inline.

Each case now combines two inline FileCheck prefixes in one directive:

  • filecheck=<PREFIX> asserts the captured Shader abort: … message on render-test's stdout (what the capture mode prints), and
  • filecheck-buffer=EMPTY asserts the readback buffer stays empty. A fired abort loses the device, so nothing is written back; the shared // EMPTY-NOT: {{.}} line satisfies COMPARE_COMPUTE's mandatory buffer comparison without a sidecar file.

I kept filecheck= for the message per your earlier note. One choice worth your call: since the abort loses the device there's no readback, so the buffer side is an emptiness assert rather than a positive value check. If you'd rather have render-test write the captured Shader abort: … text into the output buffer so a single positive filecheck-buffer= checks it (dropping the stdout path and the empty-assert entirely), I'm happy to switch — just say the word.

Verified locally: the combined directives parse and the four -vk lines skip cleanly on a GPU-less runner, and the existing -target spirv emit case still passes. As before, the abort-fires → message round-trip → FileCheck match is the on-hardware run.

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

Per review on #11799: remove the four empty .expected.txt sidecar files
and assert inline instead. Each case now combines `filecheck=<MSG>`
(asserts the captured "Shader abort: ..." text on render-test stdout)
with `filecheck-buffer=EMPTY` (a CHECK-NOT asserting the device-loss
readback buffer stays empty), satisfying COMPARE_COMPUTE's stdout and
buffer comparisons inline without sidecar expected files.

Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
@jkwak-work jkwak-work marked this pull request as ready for review June 28, 2026 00:07
@jkwak-work jkwak-work requested a review from a team as a code owner June 28, 2026 00:07
@jkwak-work jkwak-work requested review from bmillsNV and removed request for a team June 28, 2026 00:07
@jkwak-work jkwak-work self-assigned this Jun 28, 2026
@jkwak-work

Looks good to me.

But this is blocked until the following slang-rhi PR is merged:

nv-slang-bot commented Jun 28, 2026

Thanks! Agreed on the gating.

I'll hold this draft — and the underlying submodule bump it stacks on (#11792) — until slang-rhi#782 merges. At that point I'll move the external/slang-rhi pin from the PR-branch commit to the merged slang-rhi main SHA and take both out of draft. No further changes needed here in the meantime.

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

Labels

pr: non-breaking PRs without breaking changes
