Add cuRANDDx Support #1198

Merged: cliffburdick merged 4 commits into main from cburdick/curanddx-jit-random on Jun 5, 2026
@cliffburdick
Collaborator

No description provided.

@copy-pr-bot

copy-pr-bot Bot commented Jun 3, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@greptile-apps
Contributor

greptile-apps Bot commented Jun 3, 2026

Greptile Summary

This PR wires in cuRANDDx as a JIT-fusion backend for floating-point random() operators when MATX_EN_MATHDX is enabled. At element counts ≤ 1024, CUDAJITExecutor now emits a Philox4-32 struct via NVRTC instead of materialising a temporary buffer, keeping random generation fully inside the fused kernel. Integer (randomi) and larger random tensors continue on the existing cuRAND materialization path.

  • CMake / link: curanddx_CUTLASS_ROOT, curanddx component, and mathdx::curanddx link target are added following the exact pattern used for cuBLASDx and cuSolverDx.
  • Generator core (random.h): Adds CanUseJITRandom(), get_jit_class_name(), and get_jit_op_str() which emit a self-contained templated struct with a deterministic GenerateStandardScalar(scalar_linear) device function; capability handlers (SUPPORTS_JIT, JIT_TYPE_QUERY, JIT_CACHE_KEY, JIT_CLASS_QUERY) delegate to these helpers.
  • Tests: Four new GTests cover capability gating, fused uniform fill with range bounds, repeated-leaf idempotency, and complex uniform with beta-on-real-only semantics.
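
As a rough illustration of the gating described in the bullets above, the eligibility check can be sketched as a compile-time predicate. The names `can_use_jit_random_sketch`, `is_float_like`, and `kMaxJitRandomElements` are hypothetical and are not taken from random.h. Treating `std::complex` as eligible is an assumption based on the complex uniform test, and the real code may use a different complex type.

```cpp
#include <complex>
#include <cstddef>
#include <type_traits>

// Hypothetical constant mirroring the "at most 1024 elements" limit in the summary.
constexpr std::size_t kMaxJitRandomElements = 1024;

// Assumption: float, double, and their complex forms count as floating-point here.
template <typename T>
struct is_float_like : std::is_floating_point<T> {};
template <typename T>
struct is_float_like<std::complex<T>> : std::is_floating_point<T> {};

// Sketch of the gate: the feature flag must be on, the element type must be
// floating-point, and the element count must be in the range (0, 1024].
template <typename T>
constexpr bool can_use_jit_random_sketch(std::size_t total_elements,
                                         bool mathdx_enabled) {
  return mathdx_enabled && is_float_like<T>::value &&
         total_elements > 0 && total_elements <= kMaxJitRandomElements;
}
```

Under these assumptions, `can_use_jit_random_sketch<int>(16, true)` is false, which matches the summary's statement that integer generation stays on the cuRAND materialization path.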

Confidence Score: 5/5

Safe to merge; the JIT path is cleanly opt-in (requires MATX_EN_MATHDX + MATX_EN_JIT + element count at most 1024) and falls back to the existing cuRAND path otherwise.

All changes are additive and gated behind compile-time flags. The non-JIT code paths are untouched. The two flagged items are edge-case quality issues (NaN/Inf alpha serialisation and a silent fallthrough on a hypothetical future distribution), neither of which affects current correct behavior.

include/matx/generators/random.h — specifically JITValueLiteral and the distribution dispatch inside the generated GenerateStandardScalar string.
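
To make the NaN/Inf concern concrete: streaming a non-finite float into a source string yields text such as `nan` or `inf`, which is not a valid C++ literal. One possible guard is sketched below. `float_literal_or_empty` is a hypothetical helper, not the PR's `JITValueLiteral`, and returning an empty string on failure is only one of several reasonable policies (the caller could instead fall back to the non-JIT path).

```cpp
#include <cmath>
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical helper: returns C++ source text for `v`, or an empty string
// when `v` is NaN or infinite and therefore has no literal spelling.
inline std::string float_literal_or_empty(float v) {
  if (!std::isfinite(v)) {
    return std::string();  // caller decides how to handle the rejection
  }
  std::ostringstream os;
  os.imbue(std::locale::classic());  // always '.' as the decimal separator
  // max_digits10 significant digits are enough to round-trip a float.
  os.precision(std::numeric_limits<float>::max_digits10);
  os << std::scientific << v << 'f';  // trailing 'f' marks a float literal
  return os.str();
}
```

A caller that receives an empty string would know the value cannot be embedded in the generated kernel source as a literal.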

Important Files Changed

| Filename | Overview |
| --- | --- |
| include/matx/generators/random.h | Core change: adds CanUseJITRandom(), JIT string-generation helpers, and get_jit_op_str() that emits a cuRANDDx Philox4-32 template struct for NVRTC compilation; also adds capability handlers for SUPPORTS_JIT, JIT_TYPE_QUERY, JIT_CACHE_KEY, JIT_CLASS_QUERY. One issue: JITValueLiteral does not guard against NaN/Inf alpha/beta, producing non-compilable NVRTC code strings. |
| cmake/FindMathDx.cmake | Adds curanddx_CUTLASS_ROOT variable (matching the pattern for cublasdx and cusolverdx) and adds curanddx to the find_package component list. |
| include/matx/core/jit_includes.h | Includes curanddx.hpp under the MATX_EN_MATHDX guard so cuRANDDx types are available to NVRTC-compiled JIT kernels. |
| include/matx/operators/base_operator.h | Adds an explanatory comment clarifying that the direct-fill path is gated on is_tensor_view_v; no logic change. |
| test/00_operators/GeneratorTests.cu | Adds four new tests: JIT capability gating, fused uniform fill with bounds verification, repeated-leaf idempotency, and complex uniform with beta-on-real-only semantics. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["random<T>({N}, dist, seed).run(exec)"] --> B{Executor type?}
    B -->|CUDAJITExecutor| C{CanUseJITRandom?}
    C -->|Yes - float, 0 < N <= 1024| D["get_jit_op_str() - emit JITRandomOp struct via cuRANDDx"]
    D --> E["NVRTC compiles fused kernel"]
    E --> F["Per-thread Philox4-32: group=linear>>2, lane=linear&3"]
    C -->|No| G{Direct assign? LHS is tensor_view}
    B -->|cudaExecutor| G
    G -->|Yes - contiguous| H["cuRAND fill bulk + tail"]
    G -->|No - generic expr| I["PreRun: allocate values_ buffer, element-wise op()"]
    H --> J[Result]
    I --> J
    F --> J
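
The `group=linear>>2, lane=linear&3` step in the flowchart can be read as follows: Philox4-32 produces four 32-bit values per counter, so four consecutive elements could share one output block. The sketch below shows only that index arithmetic. `PhiloxSlot` and `philox_slot` are hypothetical names, and the way the generated struct actually feeds `group` into cuRANDDx is not shown in this PR summary.

```cpp
#include <cstdint>

// Hypothetical pair: which four-value Philox block an element falls in,
// and which of the four values in that block it takes.
struct PhiloxSlot {
  std::uint64_t group;  // linear index divided by 4
  std::uint32_t lane;   // linear index modulo 4, in the range 0..3
};

// Map an element's linear index to its (group, lane) slot.
constexpr PhiloxSlot philox_slot(std::uint64_t linear) {
  return PhiloxSlot{linear >> 2, static_cast<std::uint32_t>(linear & 3u)};
}
```

Because the slot depends only on the linear index, two evaluations of the same element with the same seed would select the same value, which is consistent with the deterministic `GenerateStandardScalar(scalar_linear)` behavior and the repeated-leaf idempotency test mentioned in the summary.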

Reviews (5): Last reviewed commit: "Address Greptile random review feedback"

Comment thread include/matx/generators/random.h Outdated
Comment thread include/matx/generators/random.h
Comment thread include/matx/operators/base_operator.h
@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

@greptile-apps review

@cliffburdick cliffburdick force-pushed the cburdick/curanddx-jit-random branch from f4b558b to f304890 Compare June 4, 2026 16:41
@cliffburdick
Collaborator Author

/build

@cliffburdick cliffburdick merged commit f9f616a into main Jun 5, 2026
1 check failed
@cliffburdick cliffburdick deleted the cburdick/curanddx-jit-random branch June 5, 2026 16:43