Add low-memory direct random fills #1195

Merged: cliffburdick merged 10 commits into main from cburdick/low-memory-random-fill on Jun 4, 2026

@cliffburdick
Collaborator

Currently, random() allocates generator state for every element of the tensor. That is wasteful for direct assignments, where the operator is not reused after the assignment. This change bypasses the state allocation when the random operator won't be used later.
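To put rough numbers on the saving, here is a minimal sketch of the footprint arithmetic. The 64-byte state size is an assumption made for illustration, not a figure from this PR, and `FillFootprint`, `stateful_footprint`, and `direct_footprint` are hypothetical names.

```cpp
#include <cstddef>

// Assumed size of one per-element generator state, in bytes.
// This is an illustrative figure, not a value taken from the PR.
constexpr std::size_t kAssumedStateBytes = 64;

struct FillFootprint {
  std::size_t output_bytes;  // the tensor being filled
  std::size_t state_bytes;   // extra generator state, if any
};

// Stateful path: one generator state per output element.
constexpr FillFootprint stateful_footprint(std::size_t n, std::size_t elem_bytes) {
  return {n * elem_bytes, n * kAssumedStateBytes};
}

// Direct-fill path: values are written straight into the output tensor,
// so no per-element state buffer is needed.
constexpr FillFootprint direct_footprint(std::size_t n, std::size_t elem_bytes) {
  return {n * elem_bytes, 0};
}
```

Under that assumption, filling 2^28 floats (1 GiB of output) would need about 16 GiB of state on the stateful path and none on the direct path.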

@copy-pr-bot

copy-pr-bot Bot commented Jun 2, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@cliffburdick cliffburdick requested a review from simonbyrne June 2, 2026 15:56
@cliffburdick
Collaborator Author

/build

@greptile-apps
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR adds a low-memory direct-fill path for random() and randomi() operators, bypassing per-element curandStatePhilox4_32_10_t state allocation when the operator is assigned directly into a CUDA tensor. It also fixes the long-standing host NORMAL double-scale bug and introduces RAII guards (CurandGeneratorGuard, ValuesGuard) for exception-safe resource cleanup.

  • Direct-fill path: A new matx_direct_assign_op trait causes base_operator.h to route direct tensor assignment through TransformExecRandomOp::Exec, which either uses cuRAND bulk generation (contiguous float/double/complex tensors) or a per-element Philox kernel without a separate state buffer.
  • Expression path unchanged: PreRun materializes values into a temporary values_ buffer; ValuesGuard ensures cleanup on exception.
  • Host NORMAL scale fix: The host operator() previously double-applied the affine transform; it now generates N(0,1) and applies the transform once.
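The idea behind a per-element fill without a state buffer can be sketched on the host: each value is a pure function of the seed and the element index, so nothing has to be stored between elements. This sketch uses a SplitMix64-style mixer as a simple stand-in for Philox, and `mix64`, `uniform_at`, and `stateless_fill` are hypothetical names, not MatX functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mixing function (SplitMix64 finalizer). The PR uses Philox;
// this is only a simple substitute to show the counter-based idea.
inline std::uint64_t mix64(std::uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// Value for element `index`: a pure function of (seed, index), mapped to [0, 1).
inline double uniform_at(std::uint64_t seed, std::uint64_t index) {
  const std::uint64_t bits = mix64(mix64(seed) ^ index);
  // Keep the top 53 bits and scale by 2^-53.
  return static_cast<double>(bits >> 11) * (1.0 / 9007199254740992.0);
}

// Hypothetical fill: on a GPU each thread would compute its own element;
// here a plain loop plays that role. No state is carried between iterations.
inline void stateless_fill(std::vector<double>& out, std::uint64_t seed) {
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = uniform_at(seed, i);
  }
}
```

Because every element depends only on the seed and its index, two fills with the same seed give identical results regardless of the order in which elements are computed.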

Confidence Score: 4/5

Safe to merge with minor cleanup; the new direct-fill and materialize paths are correctly exception-safe and the host double-scale bug is fixed.

The CUDA code paths (cuRAND bulk, Philox tail, RAII guards, ValuesGuard rollback) are correct and well-tested. The host NORMAL scale fix is verified by a new statistical test. A dead unreachable assert and a missing can_alias guard are the only observations; neither affects correctness today.

The two observations are in include/matx/generators/random.h (dead assert in GenerateCurandContiguous) and include/matx/operators/base_operator.h (missing can_alias guard in the new direct-fill dispatch branch).
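The host NORMAL double-scale problem mentioned above is easy to see numerically. This is a minimal host-only sketch, not the PR's test: it assumes the affine transform has the form `alpha * x + beta`, and `affine`, `sample_stats`, and `Stats` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <random>

struct Stats {
  double mean;
  double stddev;
};

// Assumed form of the affine transform applied to an N(0,1) draw.
inline double affine(double x, double alpha, double beta) {
  return alpha * x + beta;
}

// Draws n values from N(0,1) and applies the transform once, or twice when
// apply_twice is true (the double-scale pattern described in the summary).
inline Stats sample_stats(std::size_t n, double alpha, double beta, bool apply_twice) {
  std::mt19937_64 rng(1234);  // fixed seed so repeated runs agree
  std::normal_distribution<double> unit(0.0, 1.0);
  double sum = 0.0;
  double sum_sq = 0.0;
  for (std::size_t i = 0; i < n; ++i) {
    double v = affine(unit(rng), alpha, beta);
    if (apply_twice) {
      v = affine(v, alpha, beta);
    }
    sum += v;
    sum_sq += v * v;
  }
  const double mean = sum / static_cast<double>(n);
  const double variance = sum_sq / static_cast<double>(n) - mean * mean;
  return {mean, std::sqrt(variance)};
}
```

With alpha = 2 and beta = 3, a single application should give a mean near 3 and a standard deviation near 2. Applying the transform twice shifts those to roughly 9 and 4, which is the kind of discrepancy a statistical test can catch.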

Important Files Changed

| Filename | Overview |
|---|---|
| include/matx/generators/random.h | Core change: replaces per-element cuRAND state with a materialized value buffer for the expression path, and adds a stateless direct-fill path (cuRAND bulk or per-element Philox). Exception safety is handled via ValuesGuard and CurandGeneratorGuard. Contains one dead-code MATX_ASSERT_STR that is always satisfied after the has_tail adjustment. |
| include/matx/operators/base_operator.h | Adds a new dispatch branch for matx_direct_assign_op types, routing direct tensor assignment through TransformExec without allocating temporary output memory. Missing the can_alias short-circuit guard present in the adjacent transform-op path. |
| test/00_operators/GeneratorTests.cu | Adds comprehensive tests: seeded reproducibility, odd-size NORMAL, complex UNIFORM/NORMAL, integer direct fill, rank-0, shape-mismatch throw, expression materialize consistency, and host single-scale verification. |
| docs_input/api/random/random.rst | Documentation updated to describe the direct-fill path, path-selection criteria, reproducibility caveats, and complex beta-only-real-shift behavior. |
| docs_input/quickstart.rst | Quickstart updated to reflect materialize-once semantics in compound expressions and recommend direct assignment for large tensors. |
| docs_input/executor_compatibility.rst | Compatibility table updated with accurate notes about the direct-fill and expression-materialize paths for random and randomi. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["(out = random_op).run(exec)"] --> B{is_matx_direct_assign_op && tensor_view && CUDA?}
    B -- yes --> C{CanDirectFill shapes match?}
    C -- no --> D[MATX_THROW matxInvalidSize]
    C -- yes --> E[TransformExec InnerPreRun=no-op]
    E --> F{CanUseCurandGenerator contiguous float/double/complex?}
    F -- yes --> G[GenerateCurandContiguous cuRAND bulk fill on stream]
    G --> H{NORMAL float/double odd count? has_tail}
    H -- yes --> I[ScaleContiguous first gen_count]
    I --> J[LaunchMaterializeFill tail via Philox]
    H -- no --> K[ScaleContiguous all elements]
    F -- no --> L[LaunchStateFreeFill per-element Philox DirectValue]
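The branch order in the flowchart can be restated as a small decision function. This is a sketch only: `FillPath`, `FillRequest`, `BulkSplit`, `split_bulk`, and `choose_path` are hypothetical names, the flowchart does not show what happens when the first check fails, and the even-count split for NORMAL is inferred from the "odd count? has_tail" node rather than taken from the code.

```cpp
#include <cstddef>

// Hypothetical names throughout; only the branch order follows the flowchart.
enum class FillPath {
  NotDirectFill,       // first check failed; this branch is not shown in the flowchart
  ThrowInvalidSize,    // shapes do not match
  CurandBulk,          // bulk generation covers every element
  CurandBulkPlusTail,  // bulk generation plus one Philox-generated tail element
  StateFreePhilox      // per-element Philox, no state buffer
};

struct FillRequest {
  bool direct_assign_op;       // operator carries the direct-assign trait
  bool tensor_view_on_cuda;    // destination is a tensor view on a CUDA executor
  bool shapes_match;           // operator and destination shapes agree
  bool contiguous_float_like;  // contiguous float/double/complex destination
  bool normal_real;            // NORMAL distribution with float/double elements
  std::size_t count;           // number of elements to fill
};

struct BulkSplit {
  std::size_t gen_count;  // elements produced by bulk generation
  bool has_tail;          // true when one trailing element is filled separately
};

// For real NORMAL fills with an odd count, generate an even prefix in bulk
// and leave the last element as a tail.
inline BulkSplit split_bulk(std::size_t count, bool normal_real) {
  const bool has_tail = normal_real && (count % 2 == 1);
  return {has_tail ? count - 1 : count, has_tail};
}

inline FillPath choose_path(const FillRequest& r) {
  if (!r.direct_assign_op || !r.tensor_view_on_cuda) return FillPath::NotDirectFill;
  if (!r.shapes_match) return FillPath::ThrowInvalidSize;
  if (!r.contiguous_float_like) return FillPath::StateFreePhilox;
  return split_bulk(r.count, r.normal_real).has_tail ? FillPath::CurandBulkPlusTail
                                                     : FillPath::CurandBulk;
}
```

For example, a contiguous float NORMAL fill of 7 elements would take the bulk path for the first 6 and handle the last one as the tail.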

Reviews (18): Last reviewed commit: "Fix host random normal scaling"

Comment thread include/matx/generators/random.h Outdated
Comment thread include/matx/generators/random.h
Comment thread include/matx/generators/random.h Outdated
@cliffburdick
Collaborator Author

@greptile review

Comment thread include/matx/operators/base_operator.h
@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

/build

@cliffburdick
Collaborator Author

/build

Comment thread include/matx/generators/random.h Outdated
@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

@greptile review latest commit 610f40c

@cliffburdick
Collaborator Author

@greptile review

@cliffburdick
Collaborator Author

/build

@cliffburdick cliffburdick merged commit 2cb9cde into main Jun 4, 2026
1 check failed
@cliffburdick cliffburdick deleted the cburdick/low-memory-random-fill branch June 4, 2026 16:38