Support unified memory #1812

oschulz · 2025-10-31T10:26:06Z

This doesn't fully work yet. Allocating unified memory works once the limits on XLA_REACTANT_GPU_MEM_FRACTION in __init__() in XLA.jl are relaxed. With

export TF_FORCE_UNIFIED_MEMORY=1
export XLA_REACTANT_GPU_MEM_FRACTION=4

we can run

using Reactant

Reactant.set_default_backend("cuda")

Reactant.XLA.default_device()
Reactant.XLA.XLA_REACTANT_GPU_MEM_FRACTION[]

A = ConcreteRArray{Float32}(undef, 6*10^10)
sizeof(eltype(A)) * length(A) / 1024^3

and successfully allocate about 224 GiB (240 GB) on an NVIDIA GH200 system with 96 GB of GPU RAM and 480 GB of CPU RAM.
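For reference, the size arithmetic behind this allocation can be sketched in plain Julia, without Reactant. The helper names `required_bytes` and `gib` are hypothetical and not part of Reactant. The sketch assumes that "96GB" means 96 * 10^9 bytes and that the memory fraction simply scales the GPU RAM into an allocation budget, which is a guess about the allocator's behaviour rather than something confirmed here.

```julia
# Hypothetical helpers for illustration only; not part of Reactant.
required_bytes(::Type{T}, n::Integer) where {T} = sizeof(T) * n
gib(bytes) = bytes / 1024^3

n = 6 * 10^10                          # array length used above
bytes = required_bytes(Float32, n)     # 4 bytes per element

gpu_ram = 96 * 10^9                    # assumption: "96GB" means 96 * 10^9 bytes
mem_fraction = 4                       # value of XLA_REACTANT_GPU_MEM_FRACTION above

# Assumption: the fraction scales GPU RAM into the allocator's budget.
budget = mem_fraction * gpu_ram

println("array size: ", round(gib(bytes); digits=1), " GiB")
println("exceeds GPU RAM: ", bytes > gpu_ram)
println("within fraction * GPU RAM: ", bytes <= budget)
```

Under these assumptions the array is larger than the GPU RAM alone but smaller than four times the GPU RAM, which is consistent with the allocation only succeeding once fractions greater than one are permitted.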

nvtop shows the GPU RAM filling up and then flattening out once it is full, and free shows that the rest of the array has been allocated in CPU RAM. (Note: nvidia-smi is not helpful here. In unified memory mode it only shows 578 MiB allocated by the Julia process, but from what I've read that's expected.)

But when I try to fill and sum the array

fill_sum = @compile sum(fill!(A, one(eltype(A))))
fill_sum(A)

compilation fails with

E0000 00:00:1761906288.544836 1566209 gpu_hlo_schedule.cc:817] The byte size of input/output arguments (240000000000) exceeds the base limit (81604378624). This indicates an error in the calculation!

so the compiler still limits argument sizes to the GPU RAM instead of the unified memory.
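To make the two numbers in the error message easier to read, here is a small plain-Julia sketch that only redoes the arithmetic. The variable names are illustrative, and nothing here calls into Reactant or XLA.

```julia
# Both values are copied from the error message above.
requested  = 240_000_000_000   # byte size of input/output arguments
base_limit = 81_604_378_624    # base limit reported by gpu_hlo_schedule.cc

# The requested size is exactly the Float32 array allocated earlier.
array_bytes = sizeof(Float32) * 6 * 10^10

limit_gib = base_limit / 1024^3        # the limit expressed in GiB
ratio     = requested / base_limit     # how far the request exceeds the limit

println("request matches array size: ", requested == array_bytes)
println("base limit in GiB: ", limit_gib)
println("request / limit: ", round(ratio; digits=2))
```

The base limit is exactly 76 GiB, which is below the 96 GB of GPU RAM, and the request is roughly three times that limit. This suggests the scheduler's limit is derived from the GPU memory alone and does not take the memory fraction or unified memory into account.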

@wsmoses I think I need some help here.

codecov · 2025-10-31T11:06:26Z

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.52%. Comparing base (b39a1fc) to head (a2ea596).
⚠️ Report is 117 commits behind head on main.

Files with missing lines	Patch %	Lines
src/xla/XLA.jl	0.00%	5 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1812      +/-   ##
==========================================
- Coverage   68.16%   64.52%   -3.65%     
==========================================
  Files         109      113       +4     
  Lines       11779    12557     +778     
==========================================
+ Hits         8029     8102      +73     
- Misses       3750     4455     +705

Allow XLA memory fraction greater than one when using unified memory

a2ea596

oschulz marked this pull request as draft October 31, 2025 10:26
