Skip to content

Integration: bump slang-rhi #782 (VK_KHR_shader_abort) + abort test#11792

Open
nv-slang-bot[bot] wants to merge 1 commit into
masterfrom
fix/issue-11790
Open

Integration: bump slang-rhi #782 (VK_KHR_shader_abort) + abort test#11792
nv-slang-bot[bot] wants to merge 1 commit into
masterfrom
fix/issue-11790

Conversation

@nv-slang-bot

@nv-slang-bot nv-slang-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Summary

Integration PR for the abort() runtime path on Vulkan (VK_KHR_shader_abort), so the slang-rhi support in shader-slang/slang-rhi#782 can be exercised through slang-test. It does two things:

  1. Bumps the external/slang-rhi submodule from 687dc18 to dfd2e66 — the head of slang-rhi PR Cryptic error message with missing function parameter #782 (fix/issue-781, "Vulkan: support VK_KHR_shader_abort + VK_KHR_device_fault", Fixes shader-slang/slang-rhi#781). That commit adds Feature::ShaderAbort / "shader-abort" and enables VK_KHR_shader_abort + VK_KHR_device_fault, and bumps slang-rhi's Vulkan-Headers dependency to v1.4.347 (the first release with the VK_KHR_shader_abort symbols).
  2. Adds a gated runtime test tests/spirv/abort-runtime.slang that compiles and runs a compute shader containing abort(), gated on the new render feature so it skips cleanly where unsupported.

Important

This PR is intentionally a draft and cannot merge yet. slang-rhi #782 is still open, so the submodule pin (dfd2e66) points at a PR-branch commit, not a merged SHA. This is deliberate — it lets #782 be integration-tested against slang-test before it lands. Once #782 merges, the pin must be moved to the merged slang-rhi main SHA before this can come out of draft.

How the Vulkan-Headers v1.4.347 bump reaches slang's build

slang-rhi sources its own Vulkan headers: its CMakeLists.txt does FetchPackage(vulkan_headers URL "${SLANG_RHI_VULKAN_HEADERS_URL}") (guarded only by SLANG_RHI_ENABLE_VULKAN) and links the slang-rhi library against the resulting slang-rhi-vulkan-headers INTERFACE target. slang does not define SLANG_RHI_VULKAN_HEADERS_URL, so slang-rhi's own value wins, and #782 changed that value from v1.4.318 to v1.4.347. Therefore bumping the submodule pin alone brings the new headers in — slang's own external/vulkan submodule (currently v1.4.307) is not used by slang-rhi's Vulkan backend, so it is intentionally left untouched.

The test

tests/spirv/abort-runtime.slang has two lines:

  • A SIMPLE SPIR-V emit line (-target spirv -capability abort) that confirms abort() lowers to OpAbortKHR with the AbortKHR capability + SPV_KHR_abort extension. This has no device dependency, so it runs and is verified everywhere, including GPU-less CI.
  • A COMPARE_COMPUTE -vk line gated on -render-features shader-abort. render-test maps the "shader-abort" name to rhi::Feature::ShaderAbort via the SLANG_RHI_FEATURES X-macro (no render-test change needed — both options.cpp's kValidFeatureNames and render-test-main.cpp's kFeatureNameMap are generated from that macro). On a device/driver without the feature — or no Vulkan device at all — render-test returns SLANG_E_NOT_AVAILABLE and the line is skipped cleanly.

The abort() call sits on a branch the 4-thread dispatch never reaches (if (tid.x > 0x1000)). tid.x is a runtime value, so the compiler still emits OpAbortKHR (and the abort-capable pipeline is created), but the abort never fires — so the device is not lost and the dispatch -> readback -> compare model still produces a clean PASS on capable hardware. This is deliberate: OpAbortKHR causes device loss, which slang-test's compare model cannot tolerate, and there is no slang-test directive for device-fault validation.

What is verified, and what is not

Verified locally (GPU-less Linux):

  • slang builds against the bumped slang-rhi submodule; slang-rhi's new src/vulkan/vk-device.cpp / vk-api.h / vk-utils.cpp compile against the fetched v1.4.347 headers (render-test, which links slang-rhi, builds).
  • The new test is discovered by slang-test.
  • The SIMPLE emit line passes (OpAbortKHR emitted).
  • The -vk COMPARE_COMPUTE line skips cleanly (no Vulkan device here).
  • Existing tests/spirv/abort*.slang emit tests still pass (no regression).

What a PASS on capable hardware demonstrates (for whoever runs it there): slang-rhi selecting and advertising Feature::ShaderAbort, creating a Vulkan device with VK_KHR_shader_abort enabled, and successfully dispatching a non-aborting path through a pipeline that contains OpAbortKHR. It does not demonstrate the abort firing.

NOT verified here or in CI (no capable hardware): the abort actually firing, the device-fault path, and the abort-message round-trip via vkGetDeviceFaultDebugInfoKHR. Those need a device with both VK_KHR_shader_abort and VK_KHR_device_fault and a device-loss-tolerant harness mode; they must be validated on capable hardware by a maintainer. This PR exists precisely so that validation can happen.

Files changed

Related

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

…d abort test

Bump external/slang-rhi 687dc18 -> dfd2e66 (shader-slang/slang-rhi#782 head,
branch fix/issue-781), which adds Feature::ShaderAbort / "shader-abort", enables
VK_KHR_shader_abort + VK_KHR_device_fault, and bumps slang-rhi's Vulkan-Headers
dependency to v1.4.347. slang-rhi self-fetches those headers via FetchPackage, so
bumping the pin is sufficient; slang's own external/vulkan is not used by the
slang-rhi Vulkan backend.

Add tests/spirv/abort-runtime.slang: a SIMPLE SPIR-V emit line (verified locally:
OpAbortKHR) plus a -vk COMPARE_COMPUTE line gated on -render-features shader-abort
that skips cleanly where the feature/device is unavailable. The abort() call sits
on a branch the dispatch never reaches, so the abort-capable pipeline is exercised
without device loss.

slang-rhi#782 is unmerged, so the pin points at a PR-branch commit; this PR stays
draft and cannot merge until #782 lands and the pin moves to a merged main SHA.

Related to #11790, shader-slang/slang-rhi#782 and #781.

Co-authored-by: Harsh Aggarwal <haaggarwal@nvidia.com>
@nv-slang-bot nv-slang-bot Bot added the pr: non-breaking PRs without breaking changes label Jun 27, 2026
@jkwak-work jkwak-work self-assigned this Jun 27, 2026
@jkwak-work jkwak-work marked this pull request as ready for review June 27, 2026 17:45
@jkwak-work jkwak-work requested a review from a team as a code owner June 27, 2026 17:45
@jkwak-work jkwak-work requested review from bmillsNV and removed request for a team June 27, 2026 17:45
jkwak-work
jkwak-work previously approved these changes Jun 27, 2026
Comment thread tests/spirv/abort-runtime.slang
@jkwak-work jkwak-work dismissed their stale review June 27, 2026 17:49

Test is not good enough

Comment thread tests/spirv/abort-runtime.slang
@nv-slang-bot

nv-slang-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Thanks @jkwak-work — taking both points together:

%f and multiple args (your "if it is valid"): valid. abort<each T>(NativeString format, expand each T args) is variadic, and e.g. abort("%f", f) and abort("i=%d u=%u f=%f", i, u, f) compile to OpAbortKHR cleanly (spirv-val passes). The emit-level packaging of mixed types and multi-arg offsets is already exercised today in tests/spirv/abort.slang (uint/int/float/bool, vectors, half/double, per-member offsets).

Testing the messages from abort() itself (the runtime round-trip): agreed this is the valuable bit, and it needs a piece that does not exist in render-test yet. After OpAbortKHR fires, the device is lost and slang-rhi#782 retrieves the text via vkGetDeviceFaultDebugInfoKHR + VkDeviceFaultShaderAbortMessageInfoKHR in DeviceImpl::reportShaderAbortMessage() (src/vulkan/vk-device.cpp), surfacing it through the IDebugCallback::handleMessage seam. But render-test at this pin has no path that (a) runs a shader that actually aborts, (b) treats the resulting DEVICE_LOST as expected rather than a hard failure, and (c) captures that reported message and prints it so a slang-test line can assert on it. That's exactly why abort-runtime.slang keeps abort() on a never-taken branch for now — the existing dispatch → readback → compare flow can't survive the device loss, so it can only validate feature-selection / device-creation / abort-capable-pipeline-execution, not the message.

To genuinely test the messages I'd add a render-test mode (gated on shader-abort) that dispatches an aborting shader, treats the expected device loss as success, and emits the captured abort message to stdout so a //TEST:...(filecheck=CHECK) line asserts it — with cases for %f, multiple args, and mixed types. I'm happy to implement that here. Two notes:

  1. It's render-test C++ I can only compile-verify in this environment; the actual GPU firing + message round-trip would be exercisable only on your capable hardware (same caveat as the rest of this PR).
  2. One design choice I'd like your steer on: how should the "expect-abort-and-check-message" intent be expressed? My default would be to capture the handleMessage output automatically when -render-features shader-abort is set and let a filecheck= line assert it; the alternative is an explicit flag (e.g. -expected-abort-message). Say which you prefer and I'll wire it up.

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

@jkwak-work

Copy link
Copy Markdown
Collaborator

@nv-slang-bot , it sounds good. Let's go with "capture the handleMessage output automatically when -render-features shader-abort is set and let a filecheck= line assert it;"

@nv-slang-bot

nv-slang-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

On it, @jkwak-work — auto-capture it is: when -render-features shader-abort is set, render-test will route the handleMessage abort text to stdout and a filecheck= line will assert it; I'll add %f, multiple-args, and mixed-type cases.

One logistics note (your call, since you own the harness): I'd suggest landing this capture mode + message tests as a separate follow-up PR rather than in #11792 — that keeps #11792 a minimal submodule-bump + gated-skip test that can come out of draft the moment slang-rhi#782 merges, decoupled from the larger render-test C++ change. Like the rest of this work, I can only compile-verify the capture mode here; the actual GPU abort-firing + message round-trip is your hardware run. If you'd rather have it all in #11792, just say so — I'll start building it now either way.

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

@nv-slang-bot

nv-slang-bot Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor Author

Done — implemented the auto-capture you described in #11799 (kept as a small follow-up so this PR stays a clean submodule-bump + gated-test that can leave draft the moment slang-rhi#782 merges; happy to fold it into #11792 instead if you'd rather).

What it does, matching your steer:

  • When -render-features shader-abort is set, render-test scopes a debug callback over the dispatch, then prints any host-delivered "Shader abort: ..." message to stdout; a COMPARE_COMPUTE(filecheck=...) line asserts it. Strictly gated — every other render test is byte-for-byte unchanged.
  • tests/spirv/abort-message.slang adds the cases you asked for: a plain string, %f, multiple args (%u %u %u), and mixed types (%d %f) — each its own entry point + //TEST: line, since a fired abort loses the device.
  • Empty .expected.txt siblings neutralize the harness's always-on buffer compare (a lost device writes no buffer).

Verified here: builds clean; the existing -capability abort emit line still passes; all four new -vk lines skip cleanly with no GPU. Not verified here (no GPU): the abort actually firing → device-fault → message round-trip → FileCheck match — that's your hardware run. The %f/mixed forms do lower to OpAbortKHR with the message operand, so the formatting should round-trip; please sanity-check the exact formatted strings my CHECK lines expect against what the driver emits.

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

@jkwak-work

Copy link
Copy Markdown
Collaborator

Looks good to me.

But blocked until the following PR gets merged on slang-rhi:

@nv-slang-bot

nv-slang-bot Bot commented Jun 28, 2026

Copy link
Copy Markdown
Contributor Author

Thanks! Agreed on the gating.

Holding this as a draft until slang-rhi#782 merges — at that point I'll move the external/slang-rhi pin from the PR-branch commit (dfd2e66) to the merged slang-rhi main SHA and take it out of draft. The stacked render-test message-capture follow-up (#11799) rebases on top then. No further changes needed in the meantime.

🤖 Generated by an automated Slang coworker — may be inaccurate. A human maintainer should verify.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr: non-breaking PRs without breaking changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant