@tarcila tarcila commented Mar 17, 2025

Aims to improve the CUDA/OptiX debugging and tracing experience.
Also fixes a few pipeline compilation issues related to thread safety and device stack size.

tarcila added 4 commits March 17, 2025 15:26
…g mode

CUDA compilation flags and OptiX module/pipeline creation flags must
match.

Forcing the O1 flag on debug builds made debugging slightly less
convenient.

When building in debug mode or forcing no-inline, parameters are passed
on the stack. The lambda capture then ends up referencing values that
have been popped off the stack by the time the async callback runs,
leading to crashes.
The issue is harder or impossible to reproduce with optimized builds,
most likely because parameters are passed differently.
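
As an illustration of the lifetime issue described above, here is a minimal sketch, not the code from this PR. The names `g_pending`, `enqueueCompile` and `runPending` are hypothetical stand-ins for an async work queue. The callback captures copies of the parameters, so it stays valid after the enqueuing function has returned; the commented-out variant shows the by-reference capture that dangles.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for an async queue: callbacks run later,
// after the function that enqueued them has already returned.
static std::vector<std::function<int()>> g_pending;

// Problematic variant (kept as a comment on purpose): `[&]` captures the
// by-value parameters by reference, so the callback reads stack slots that
// are gone once the function returns.
//
// void enqueueCompileBad(int width, int height) {
//   g_pending.push_back([&] { return width * height; });
// }

// Safer variant: capture by value so the callback owns its own copies.
void enqueueCompile(int width, int height) {
  g_pending.push_back([width, height] { return width * height; });
}

// Runs every pending callback, sums the results, and clears the queue.
int runPending() {
  int sum = 0;
  for (auto& callback : g_pending) {
    sum += callback();
  }
  g_pending.clear();
  return sum;
}
```

Capturing by value costs a copy per parameter, which is usually negligible next to the cost of the deferred work itself.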

The content expected by ANARI samplers is unsigned, while MDL deals
with signed ints. Do the conversion upfront.
Note that we cast to unsigned first. Unsigned integers have fully defined
wrapping behavior, while overflow on signed integers is undefined
behavior (UB) that the compiler is free to optimize in any way.
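
The following sketch shows why the order of cast and arithmetic matters. It is a hypothetical helper (`wrappingAdd` is not taken from this PR) and assumes a platform where `int` is 32 bits wide: both operands are converted to unsigned before the addition, so any overflow wraps modulo 2^32 instead of being undefined.

```cpp
#include <cstdint>

// Hypothetical helper: adds two signed values with well-defined wraparound.
// Casting each operand first means the addition happens on unsigned values.
// Writing static_cast<std::uint32_t>(a + b) instead would perform the
// addition on signed ints, where overflow is undefined behavior.
std::uint32_t wrappingAdd(std::int32_t a, std::int32_t b) {
  return static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b);
}
```

For example, `wrappingAdd(INT32_MAX, 1)` yields `0x80000000u`, whereas the signed addition `INT32_MAX + 1` has no defined result.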

Quoting the OptiX documentation for optixPipelineSetStackSize:
`If this method is not used, an internal default implementation is used.
The default implementation is correct (but not necessarily optimal) as
long as the maximum depth of call trees of CC programs is at most 2,
and no DC programs or motion transforms are used.`

Given that we are using direct callables, the default heuristic does not
give the expected result and can lead to crashes when running under the
CUDA Compute Sanitizer. Ensure that we actually set the correct stack
sizes for our pipelines.
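
To make the stack size reasoning concrete, here is a host-only sketch of the arithmetic involved. It is not the code from this PR and does not call OptiX: `StackSizes`, `PipelineStack` and `computeStackSizes` are hypothetical names, and the formula is modeled on the general shape of the stack size helpers shipped in the OptiX SDK's `optix_stack_size.h`, so it should be checked against that header before being relied on. In real code the per-program-group sizes would be accumulated with the SDK helpers and the result passed to `optixPipelineSetStackSize`.

```cpp
#include <algorithm>

// Hypothetical mirror of per-program stack requirements (in bytes).
// css* = continuation stack, dss* = direct-callable stack.
struct StackSizes {
  unsigned cssRG;  // ray generation
  unsigned cssMS;  // miss
  unsigned cssCH;  // closest hit
  unsigned cssAH;  // any hit
  unsigned cssIS;  // intersection
  unsigned cssCC;  // continuation callables
  unsigned dssDC;  // direct callables
};

struct PipelineStack {
  unsigned directCallable;  // direct-callable stack size
  unsigned continuation;    // continuation stack size
};

// Assumed formula: an upper bound over the deepest chain of programs.
PipelineStack computeStackSizes(const StackSizes& s, unsigned maxTraceDepth,
                                unsigned maxCCDepth, unsigned maxDCDepth) {
  const unsigned ccTree = maxCCDepth * s.cssCC;
  const unsigned chOrMsPlusCc = std::max(s.cssCH, s.cssMS) + ccTree;

  PipelineStack out{};
  // Direct callables: one frame per level of DC nesting.
  out.directCallable = maxDCDepth * s.dssDC;
  // Continuation stack: raygen, then one CH/MS frame per nested trace,
  // with the innermost trace also covering the IS + AH traversal case.
  out.continuation =
      s.cssRG + ccTree +
      (std::max(maxTraceDepth, 1u) - 1u) * chOrMsPlusCc +
      std::min(maxTraceDepth, 1u) * std::max(chOrMsPlusCc, s.cssIS + s.cssAH);
  return out;
}
```

The point of the sketch is that a nonzero direct-callable depth contributes `maxDCDepth * dssDC` bytes, which a default that assumes no DC programs would not account for.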
@jeffamstutz
Collaborator

LGTM, thanks!

@jeffamstutz jeffamstutz merged commit 3b5e29c into NVIDIA:next_release Mar 17, 2025
5 checks passed
@tarcila tarcila deleted the misc-fixes branch May 8, 2025 01:15