Fix stale test debug callback#11785
Conversation
This comment has been minimized.
This comment has been minimized.
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR scopes debug-callback binding, adds mutex synchronization around callback access, updates three test runners to use the scoped helper, and adds unit tests for scope exit, isolation, exception unwinding, retained bridge reuse, and concurrent message handling. ChangesScoped debug callback lifetime updates
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment has been minimized.
This comment has been minimized.
1363648 to
c9f97d5
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
fe205ef to
b1ae635
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2c2326a3-efb4-40cc-a7be-e9771d91ad5b
📒 Files selected for processing (2)
tools/gfx-unit-test/scoped-core-debug-callback-test.cpptools/render-test/slang-support.h
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Actionable comments posted: 1
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 7da67062-a749-4fe4-8dd5-526260677c34
📒 Files selected for processing (4)
tools/gfx-unit-test/scoped-core-debug-callback-test.cpptools/render-test/slang-support.htools/slang-test/slang-test-main.cpptools/test-server/test-server-main.cpp
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
tools/slang-test/slang-test-main.cpp (1)
5690-5700: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winClose the scoped bridge before taking the final debug-layer snapshot.
Line 5700 reads the buffer while
scopedDebugCallbackis still bound. A validation-thread message can append after this read but before the scoped object destructs, so the current test can pass despite a debug-layer error; end the scoped binding first, then callgetString().Proposed fix
- renderer_test::ScopedCoreDebugCallback scopedDebugCallback( - *rhiDebugBridge, - &coreDebugCallback); - try { - slang_replayMarker(test.testName.getBuffer()); - test.testFunc(&unitTestContext); + { + renderer_test::ScopedCoreDebugCallback scopedDebugCallback( + *rhiDebugBridge, + &coreDebugCallback); + slang_replayMarker(test.testName.getBuffer()); + test.testFunc(&unitTestContext); + } // Check for VVL errors after test completion String debugMessages = coreDebugCallback.getString();tools/test-server/test-server-main.cpp (1)
546-584: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick winUnbind before serializing
debugLayerinto the RPC result.Line 584 captures
coreDebugCallbackwhilescopedDebugCallbackis still active. A late validation message can arrive after the snapshot and be omitted from the RPC response, soslang-testcan miss a debug-layer failure.Proposed fix
- renderer_test::ScopedCoreDebugCallback scopedDebugCallback( - *rhiDebugCallback, - &coreDebugCallback); - testModule->setTestReporter(&testReporter); @@ try { - testFunc(&unitTestContext); + { + renderer_test::ScopedCoreDebugCallback scopedDebugCallback( + *rhiDebugCallback, + &coreDebugCallback); + testFunc(&unitTestContext); + } } catch (...) { testReporter.m_failCount++;
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4a5e7e9b-80f5-4710-8494-4e5c6e3e2880
📒 Files selected for processing (5)
tools/gfx-unit-test/scoped-core-debug-callback-test.cpptools/render-test/render-test-main.cpptools/render-test/slang-support.htools/slang-test/slang-test-main.cpptools/test-server/test-server-main.cpp
There was a problem hiding this comment.
Verdict: ✅ Clean — no significant issues found
Replaces a process-static CoreToRHIDebugBridge with per-invocation retained, ref-counted, mutex-protected bridges plus an RAII ScopedCoreDebugCallback that binds/unbinds the bridge's inner callback only while a test is active. RHI may retain the bridge pointer across invocations; the per-test core callback is bound only for the active scope, so late RHI messages either land on a cleared bridge or a different bridge belonging to the prior invocation.
Changes Overview
Bridge / scoped binder (tools/render-test/slang-support.h)
CoreToRHIDebugBridgenow inheritsSlang::RefObjectand serializessetCoreCallback/handleMessageon astd::mutex. The forward path holds the mutex acrosscoreCallback->handleMessage(...), so~ScopedCoreDebugCallback'ssetCoreCallback(nullptr)cannot return while an in-flight forward is still dereferencing the inner callback.- New
createRetainedCoreToRHIDebugBridge()factory parks each created bridge in aninline-function-localstatic List<RefPtr<...>>, guaranteeing the bridge survives any RHI-retained raw pointer for the rest of the process. The function-local statics use the leak-on-purposenewidiom to avoid static-destruction-order issues across TUs. - New
ScopedCoreDebugCallbackRAII helper binds the bridge's inner callback on construction and clears it on destruction; copy/assign deleted. CoreDebugCallback::handleMessage/clear/getStringnow lockm_mutexso concurrent RHI writes and harness reads do not race on theStringBuilder.
Call sites (tools/render-test/render-test-main.cpp, tools/slang-test/slang-test-main.cpp, tools/test-server/test-server-main.cpp)
- Each harness entry point creates a fresh retained bridge, hands its
.Ptr()to RHI device / unit-test-context creation, and wraps execution in aScopedCoreDebugCallback. Inslang-test-main.cpp, the scope is per-test inside the iteration loop andcoreDebugCallback.clear()runs immediately before the scope binds, so each test sees only its own messages.
Regression tests (tools/gfx-unit-test/scoped-core-debug-callback-test.cpp, new)
scopedCoreDebugCallbackClearsBridgeOnExit— messages after scope exit are dropped.scopedCoreDebugCallbackDoesNotLeakAcrossScopes— sequential bindings keep each callback's buffer isolated.scopedCoreDebugCallbackClearsBridgeOnException— exception unwind still clears the bridge.scopedCoreDebugCallbackSeparatesRetainedBridgeScopes— emitting via a retainedoldBridgefrom an earlier scope, while a new scope bindsnextBridge, does not contaminatenextCallback(this is the cross-test-contamination scenario the PR fixes).coreDebugBridgeHandlesConcurrentMessages— four writer threads × 1024 messages plus a concurrent reader; assertscapturedLength > 0, bounded bykThreadCount * kMessageCount * 2, and(capturedLength % 2) == 0(eachemitErrorwrites"x\n"atomically under the callback mutex). All threads are joined before stack-localbridge/callbackdestruct.
Fixes #11720
Motivation
Running
slang-teston Windows with debug layers enabled could fail when tests ran in-process without a test server. The minimized reproducer wasgfx-unit-test-tool/pointerInBufferRoundtripVulkan.internal, and the broader issue showed later tests such astests/preprocessor/duplicate-include/a.slangfailing after earlier GPU/debug-layer output had already been emitted.The test harness was passing RHI devices a debug callback bridge whose target
CoreDebugCallbackbelonged to the currently active test invocation. Some RHI objects and test-tool shared-library state can outlive that invocation, so the bridge could retain a pointer to stale per-test callback storage or forward a late message to the wrong test.Proposed solution
Give each harness invocation its own retained RHI-facing bridge anywhere a device or loaded tool can retain the callback, and bind that bridge to the current
CoreDebugCallbackonly for the active invocation.renderer_test::ScopedCoreDebugCallbacksets the inner core callback on entry and clears it on exit, so late debug-layer messages cannot dereference stale callback storage or leak into the next test. Distinct retained bridges also prevent an old RHI emitter from forwarding into a later test's active callback. The bridge serializes binding changes with message forwarding, andCoreDebugCallbackserializes buffer access, because RHI debug callbacks can be invoked from backend or driver threads.Change summary
tools/render-test/slang-support.h:21adds a mutex-protected, reference-countedCoreToRHIDebugBridge,tools/render-test/slang-support.h:58addscreateRetainedCoreToRHIDebugBridge,tools/render-test/slang-support.h:70addsScopedCoreDebugCallback, andtools/render-test/slang-support.h:111makesCoreDebugCallbackbuffer access thread-safe. The helper documents that messages arriving while no scoped callback is active are intentionally dropped instead of forwarded to stale per-test callback storage, and its comments record the thread-safety and retained-lifetime contracts.tools/render-test/render-test-main.cpp:1772creates a retained bridge for that render-test invocation, binds it around execution, and passes that same bridge to RHI device creation attools/render-test/render-test-main.cpp:1912.runUnitTestModuleintools/slang-test/slang-test-main.cppcreates a retained RHI bridge for each in-process unit test attools/slang-test/slang-test-main.cpp:5688, passes it throughUnitTestContextattools/slang-test/slang-test-main.cpp:5689, and binds it only around that test attools/slang-test/slang-test-main.cpp:5690.tools/test-server/test-server-main.cpp:546mirrors the unit-test server path by creating a retained RHI bridge per RPC and binding it to the per-RPCCoreDebugCallbackonly while executing that unit test.tools/gfx-unit-test/scoped-core-debug-callback-test.cpp:29covers the bridge lifetime contract: a retained bridge is a no-op after the scoped binding exits, a later scoped binding receives only its own messages, sequential scoped bindings do not leak messages between callbacks, and exception unwinding clears the bridge.tools/gfx-unit-test/scoped-core-debug-callback-test.cpp:102mirrors the production shape with distinct retained bridges and stack callbacks, andtools/gfx-unit-test/scoped-core-debug-callback-test.cpp:116exercises concurrent debug-message delivery, binding clear, and harness reads.Concepts and vocabulary
CoreToRHIDebugBridgeis the object passed to RHI as anrhi::IDebugCallback; it forwards RHI messages into Slang core'sIDebugCallback. RHI may call this object from backend or driver threads, so binding changes and forwarded messages must be synchronized.CoreDebugCallbackis the per-test collector that stores debug-layer error messages soslang-testcan report them with the originating test result.The RHI debug layer can emit messages from driver or validation objects whose lifetime is not identical to a single test function call, so the callback object passed to RHI must not be stack storage for a narrower lifetime.
Reviewer Directives (maintained by agent)
CoreToRHIDebugBridgewhile no scoped callback is active are intentionally dropped; do not forward them to a previous or next per-test callback. (Fix stale test debug callback #11785 (comment))Process report
The relevant input shape is valid: RHI code is allowed to retain the callback pointer for the lifetime of devices and validation/debug objects, and the test infrastructure intentionally reuses loaded tools and can keep RHI state alive beyond a single test invocation. The producer should not be changed to assume every debug message is synchronous with one test function. The harness must instead make the callback object lifetime match the retained pointer.
For render-test execution,
_innerMainintools/render-test/render-test-main.cppcreates one retained bridge for the invocation and passes that bridge into RHI device creation. The scoped binder clears the bridge's inner core callback before returning tospawnAndWaitSharedLibrary, while the retained bridge object itself remains alive for any RHI state that still holds the raw callback pointer.For in-process unit tests,
runUnitTestModuleintools/slang-test/slang-test-main.cpppreviously created the bridge as ordinary local storage and reused oneCoreDebugCallbackwhile iterating tests. A late RHI message after a test completed could therefore be delivered outside that test's reporting scope, and a retained callback pointer could outlive the local bridge. Creating a retained bridge per test gives retained RHI state a stable callback object without sharing that object with later tests. Binding only around the active test means delayed messages from that test's RHI objects hit that test's cleared bridge and are dropped rather than being attributed to a later test.For test-server unit tests,
_executeUnitTestintools/test-server/test-server-main.cpphas the same callback-retention problem across one RPC. The server-side change applies the same lifetime rule: a retained bridge per RPC for RHI, a scoped per-RPC core callback for reporting.Reviewer feedback noted that RHI callbacks can arrive on backend or driver threads. The bridge now protects its inner callback pointer with a mutex and holds that mutex while forwarding a message, so
ScopedCoreDebugCallbackdestruction cannot clear the pointer and let the per-test callback die while an in-flight forwarded message is still using it.CoreDebugCallbackalso protectshandleMessage,clear, andgetStringwith a mutex so concurrent RHI messages cannot race on theStringBuilder.coreDebugBridgeHandlesConcurrentMessagesexercises this invariant with concurrent writers, scoped binding teardown while writers are active, and a concurrent reader so sanitizer configurations can catch accidental unsynchronized access.Reviewer feedback asked whether clearing the bridge silently drops late messages. That drop is intentional and belongs at this layer: while no scoped binding is active there is no current test result that can own the message, and forwarding to a destroyed or subsequent per-test
CoreDebugCallbackis the stale-pointer/cross-test contamination bug this PR fixes. Later feedback pointed out that a single shared retained bridge could still forward an old emitter's delayed message into a later active scope. The fix is therefore one retained bridge per invocation: old emitters call the old bridge, whose scoped binding has been cleared, while the later invocation uses a distinct bridge. The helper now documents that contract, andscopedCoreDebugCallbackClearsBridgeOnExit,scopedCoreDebugCallbackDoesNotLeakAcrossScopes,scopedCoreDebugCallbackClearsBridgeOnException, andscopedCoreDebugCallbackSeparatesRetainedBridgeScopespin the no-op-after-scope, no-cross-scope-leak, exception-unwind, and retained-bridge separation behavior.Reviewer feedback clarified that
ExecuteResult::debugLayeris not exclusively RHI validation output for ordinary in-process tool execution:StdWriters::setDebugCallbackcan also route compiler diagnostics into that field. This PR therefore leaves the existing_validateOutputforce-failure behavior unchanged outside the unit-test debug-layer paths, so negative compile tests that intentionally produce diagnostics are not failed merely because that diagnostic channel is non-empty.