Add low-memory direct random fills #1195
|
/build |
Greptile Summary
This PR adds a low-memory direct-fill path for
Confidence Score: 4/5
Safe to merge with minor cleanup; the new direct-fill and materialize paths are correctly exception-safe and the host double-scale bug is fixed.
The CUDA code paths (cuRAND bulk, Philox tail, RAII guards, ValuesGuard rollback) are correct and well-tested. The host NORMAL scale fix is verified by a new statistical test. A dead unreachable assert and a missing can_alias guard are the only observations; neither affects correctness today.
Files with observations: include/matx/generators/random.h (dead assert in GenerateCurandContiguous) and include/matx/operators/base_operator.h (missing can_alias guard in new direct-fill dispatch branch).
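To illustrate what a statistical test for the NORMAL scale fix might check, here is a host-only sketch. It assumes the bug amounted to applying the `alpha * z + beta` scaling twice, and that `alpha` acts as the standard deviation and `beta` as the mean; `MeanStddev` and `FillNormal` are hypothetical helpers, not MatX functions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: sample mean and (population) standard deviation.
inline std::pair<double, double> MeanStddev(const std::vector<double>& v) {
  assert(!v.empty());
  double sum = 0.0;
  for (double x : v) sum += x;
  const double mean = sum / static_cast<double>(v.size());
  double sq = 0.0;
  for (double x : v) sq += (x - mean) * (x - mean);
  return {mean, std::sqrt(sq / static_cast<double>(v.size()))};
}

// Hypothetical stand-in for a host NORMAL fill: out[i] = alpha * z + beta,
// with z drawn from a unit normal. scale_passes == 2 mimics a double-scale
// bug by applying the same affine transform a second time.
inline std::vector<double> FillNormal(std::size_t n, double alpha, double beta,
                                      int scale_passes, unsigned seed = 1234) {
  std::mt19937_64 rng(seed);
  std::normal_distribution<double> unit(0.0, 1.0);
  std::vector<double> out(n);
  for (double& x : out) {
    x = unit(rng);
    for (int p = 0; p < scale_passes; ++p) x = alpha * x + beta;
  }
  return out;
}
```

With `alpha = 3` and `beta = 1`, a single pass should give a sample standard deviation near 3 and a mean near 1, while a double pass gives `9 * z + 4`, so a tolerance check on either moment would catch the regression.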
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["(out = random_op).run(exec)"] --> B{is_matx_direct_assign_op && tensor_view && CUDA?}
B -- yes --> C{CanDirectFill shapes match?}
C -- no --> D[MATX_THROW matxInvalidSize]
C -- yes --> E[TransformExec InnerPreRun=no-op]
E --> F{CanUseCurandGenerator contiguous float/double/complex?}
F -- yes --> G[GenerateCurandContiguous cuRAND bulk fill on stream]
G --> H{NORMAL float/double odd count? has_tail}
H -- yes --> I[ScaleContiguous first gen_count]
I --> J[LaunchMaterializeFill tail via Philox]
H -- no --> K[ScaleContiguous all elements]
F -- no --> L[LaunchStateFreeFill per-element Philox DirectValue]
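The `has_tail` branch in the flowchart presumably exists because cuRAND's host API rejects odd counts for normal generation with pseudorandom generators (`curandGenerateNormal` returns `CURAND_STATUS_LENGTH_NOT_MULTIPLE`). A minimal host-only sketch of the split, assuming the bulk generator takes the largest even prefix and a separate fill handles the last element; `FillPlan` and `PlanNormalFill` are hypothetical names, not taken from the PR:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plan for a NORMAL fill of `count` real-valued elements.
struct FillPlan {
  std::size_t gen_count;  // elements produced by the bulk generator (always even)
  bool has_tail;          // true when one trailing element needs a separate fill
};

// Split `count` into an even bulk part and an optional one-element tail.
inline FillPlan PlanNormalFill(std::size_t count) {
  FillPlan plan;
  plan.gen_count = count - (count % 2);
  plan.has_tail = (count % 2) != 0;
  // The two parts must cover the output exactly once.
  assert(plan.gen_count + (plan.has_tail ? 1 : 0) == count);
  return plan;
}
```

For example, a 7-element fill would bulk-generate and scale the first 6 elements, then fill element 6 through the per-element path.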
Reviews (18): Last reviewed commit: "Fix host random normal scaling"
|
@greptile review |
|
@greptile review latest commit 610f40c |
Currently the random() call allocates space for state variables equivalent to the size of the tensor. This is wasteful on direct assignments, since the operator is not reused after the assignment. This change bypasses the state allocation when the random operator won't be used later.
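The idea behind skipping the state buffer can be sketched on the host: with a counter-based generator, each output depends only on `(seed, index)`, so nothing per-element needs to be stored. The sketch below uses a SplitMix64-style mixer as a stand-in for Philox; `Mix`, `SketchDirectValue`, and `SketchDirectFill` are hypothetical names for illustration, not the functions in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in mixer (SplitMix64 finalizer). NOT Philox; illustration only.
inline std::uint64_t Mix(std::uint64_t x) {
  x += 0x9E3779B97F4A7C15ULL;
  x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
  x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
  return x ^ (x >> 31);
}

// Hypothetical state-free draw: a uniform value in [0, 1) computed from
// (seed, index) alone, with no generator state kept between calls.
inline double SketchDirectValue(std::uint64_t seed, std::uint64_t index) {
  const std::uint64_t bits = Mix(Mix(seed) ^ index);
  // Keep the top 53 bits and scale by 2^-53.
  return static_cast<double>(bits >> 11) * (1.0 / 9007199254740992.0);
}

// Fill the output directly; the only memory touched is the output itself.
inline void SketchDirectFill(std::vector<double>& out, std::uint64_t seed) {
  for (std::size_t i = 0; i < out.size(); ++i) {
    out[i] = SketchDirectValue(seed, i);
  }
}
```

Because every element is a pure function of its index, the fill is reproducible for a given seed and can be computed in any order, which is what makes a per-element GPU kernel possible without a tensor-sized state allocation.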