Tortar commented Jan 9, 2026

Fixes #3013

Please take into consideration that I'm not sure this is correct, since I have no experience working with the internals of this library. It does seem to solve the problem, though. An LLM was used to produce it.

github-actions bot commented Jan 9, 2026

Your PR requires formatting changes to meet the project's style guidelines.
Please consider running Runic (git runic master) to apply these changes.

Suggested changes:
diff --git a/src/memory.jl b/src/memory.jl
index 78ed633c1..8181aca79 100644
--- a/src/memory.jl
+++ b/src/memory.jl
@@ -670,9 +670,9 @@ end
     mem
 end
 @inline function _pool_alloc(::Type{UnifiedMemory}, sz)
-  mem = alloc(UnifiedMemory, sz)
-  account!(memory_stats(), sz)
-  mem
+    mem = alloc(UnifiedMemory, sz)
+    account!(memory_stats(), sz)
+    return mem
 end
 @inline function _pool_alloc(::Type{HostMemory}, sz)
   alloc(HostMemory, sz)
@@ -727,8 +727,8 @@ end
     account!(memory_stats(mem.dev), -sizeof(mem))
 end
 @inline function _pool_free(mem::UnifiedMemory, stream::CuStream)
-  account!(memory_stats(), -sizeof(mem))
-  free(mem)
+    account!(memory_stats(), -sizeof(mem))
+    return free(mem)
 end
 @inline _pool_free(mem::HostMemory, stream::CuStream) = free(mem)
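
A note on the bookkeeping in the two UnifiedMemory methods above: allocation adds sz to the memory statistics and freeing subtracts sizeof(mem), so a balanced alloc/free pair should leave the counter where it started. The following is a minimal sketch of that pattern only. MockStats, MockBuffer, mock_alloc, and mock_free are hypothetical stand-ins, and the local account! is a simplified assumption; none of it is CUDA.jl's actual implementation.

```julia
# Hypothetical stand-ins for illustration; not CUDA.jl's real types.
mutable struct MockStats
    live::Int   # bytes currently accounted as allocated
end

struct MockBuffer
    bytes::Int
end

# Let sizeof report the buffer's logical size, mirroring sizeof(mem) in the diff.
Base.sizeof(buf::MockBuffer) = buf.bytes

# Simplified assumption of what an accounting call does: adjust a running total.
account!(stats::MockStats, bytes::Integer) = (stats.live += bytes; stats)

function mock_alloc(stats::MockStats, sz::Integer)
    buf = MockBuffer(sz)
    account!(stats, sz)             # count the bytes when they are handed out
    return buf
end

function mock_free(stats::MockStats, buf::MockBuffer)
    account!(stats, -sizeof(buf))   # subtract the same amount on release
    return nothing
end

stats = MockStats(0)
buf = mock_alloc(stats, 1024)
@assert stats.live == 1024
mock_free(stats, buf)
@assert stats.live == 0             # balanced pair leaves the counter at zero
```

If either side of the pair skipped its accounting call, the counter would drift away from the bytes actually in use, which is the kind of mismatch this sketch is meant to make visible.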
 

codecov bot commented Jan 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.46%. Comparing base (0c00b83) to head (5050835).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3014      +/-   ##
==========================================
+ Coverage   89.43%   89.46%   +0.02%     
==========================================
  Files         148      148              
  Lines       12991    12995       +4     
==========================================
+ Hits        11619    11626       +7     
+ Misses       1372     1369       -3     

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

Benchmark suite Current: 5050835 Previous: 0c00b83 Ratio
latency/precompile 55440528447 ns 55103441406.5 ns 1.01
latency/ttfp 7811786159.5 ns 7854970810.5 ns 0.99
latency/import 4135630131 ns 4142920660.5 ns 1.00
integration/volumerhs 9611393 ns 9624895.5 ns 1.00
integration/byval/slices=1 146970 ns 147201 ns 1.00
integration/byval/slices=3 425954 ns 426000 ns 1.00
integration/byval/reference 145011 ns 145105 ns 1.00
integration/byval/slices=2 286518.5 ns 286632 ns 1.00
integration/cudadevrt 103528 ns 103846 ns 1.00
kernel/indexing 14146 ns 14265 ns 0.99
kernel/indexing_checked 14722 ns 14925 ns 0.99
kernel/occupancy 683.1148648648649 ns 783.1441441441441 ns 0.87
kernel/launch 2226.5555555555557 ns 2262.3333333333335 ns 0.98
kernel/rand 14753 ns 16624 ns 0.89
array/reverse/1d 19859 ns 20261 ns 0.98
array/reverse/2dL_inplace 66580 ns 66981 ns 0.99
array/reverse/1dL 70083 ns 70447 ns 0.99
array/reverse/2d 21937.5 ns 22244 ns 0.99
array/reverse/1d_inplace 9633 ns 11580 ns 0.83
array/reverse/2d_inplace 13446 ns 13344 ns 1.01
array/reverse/2dL 73826 ns 74284 ns 0.99
array/reverse/1dL_inplace 66890 ns 67021 ns 1.00
array/copy 20366 ns 20733 ns 0.98
array/iteration/findall/int 157825 ns 159065 ns 0.99
array/iteration/findall/bool 139945.5 ns 141350 ns 0.99
array/iteration/findfirst/int 160523.5 ns 162741 ns 0.99
array/iteration/findfirst/bool 161829 ns 164024 ns 0.99
array/iteration/scalar 71865.5 ns 72819 ns 0.99
array/iteration/logical 214670.5 ns 220064.5 ns 0.98
array/iteration/findmin/1d 90973 ns 56834 ns 1.60
array/iteration/findmin/2d 120862 ns 98602 ns 1.23
array/reductions/reduce/Int64/1d 43196 ns 44369 ns 0.97
array/reductions/reduce/Int64/dims=1 45261.5 ns 46082 ns 0.98
array/reductions/reduce/Int64/dims=2 61434.5 ns 62261 ns 0.99
array/reductions/reduce/Int64/dims=1L 88991 ns 89560 ns 0.99
array/reductions/reduce/Int64/dims=2L 87848.5 ns 88709 ns 0.99
array/reductions/reduce/Float32/1d 37083 ns 38170.5 ns 0.97
array/reductions/reduce/Float32/dims=1 51660.5 ns 43662 ns 1.18
array/reductions/reduce/Float32/dims=2 59862 ns 60196 ns 0.99
array/reductions/reduce/Float32/dims=1L 52305 ns 52890 ns 0.99
array/reductions/reduce/Float32/dims=2L 71875 ns 72931 ns 0.99
array/reductions/mapreduce/Int64/1d 43420 ns 44486 ns 0.98
array/reductions/mapreduce/Int64/dims=1 45021 ns 51105 ns 0.88
array/reductions/mapreduce/Int64/dims=2 61289 ns 61916 ns 0.99
array/reductions/mapreduce/Int64/dims=1L 89081 ns 89497 ns 1.00
array/reductions/mapreduce/Int64/dims=2L 87923.5 ns 88980 ns 0.99
array/reductions/mapreduce/Float32/1d 36873.5 ns 37944 ns 0.97
array/reductions/mapreduce/Float32/dims=1 44834 ns 52429 ns 0.86
array/reductions/mapreduce/Float32/dims=2 60114 ns 60388 ns 1.00
array/reductions/mapreduce/Float32/dims=1L 52540 ns 53049 ns 0.99
array/reductions/mapreduce/Float32/dims=2L 72469 ns 72843 ns 0.99
array/broadcast 19847 ns 20274 ns 0.98
array/copyto!/gpu_to_gpu 10996 ns 11225 ns 0.98
array/copyto!/cpu_to_gpu 214160 ns 218396.5 ns 0.98
array/copyto!/gpu_to_cpu 284329.5 ns 284648 ns 1.00
array/accumulate/Int64/1d 124290 ns 125449 ns 0.99
array/accumulate/Int64/dims=1 84400 ns 84251 ns 1.00
array/accumulate/Int64/dims=2 158340 ns 158690 ns 1.00
array/accumulate/Int64/dims=1L 1710794 ns 1709941.5 ns 1.00
array/accumulate/Int64/dims=2L 966620.5 ns 967026.5 ns 1.00
array/accumulate/Float32/1d 108697 ns 109856 ns 0.99
array/accumulate/Float32/dims=1 80376 ns 81373 ns 0.99
array/accumulate/Float32/dims=2 147865.5 ns 148536 ns 1.00
array/accumulate/Float32/dims=1L 1619116 ns 1619811 ns 1.00
array/accumulate/Float32/dims=2L 698567 ns 699285.5 ns 1.00
array/construct 1276.4 ns 1296.2 ns 0.98
array/random/randn/Float32 43461 ns 48633 ns 0.89
array/random/randn!/Float32 24860 ns 25237 ns 0.99
array/random/rand!/Int64 27194 ns 27465 ns 0.99
array/random/rand!/Float32 8822.333333333334 ns 8946 ns 0.99
array/random/rand/Int64 31068.5 ns 30454 ns 1.02
array/random/rand/Float32 13132 ns 13364.5 ns 0.98
array/permutedims/4d 55347.5 ns 55600 ns 1.00
array/permutedims/2d 53864 ns 54423 ns 0.99
array/permutedims/3d 54831 ns 55435 ns 0.99
array/sorting/1d 2758123 ns 2759622.5 ns 1.00
array/sorting/by 3344447.5 ns 3345835 ns 1.00
array/sorting/2d 1080757 ns 1082443 ns 1.00
cuda/synchronization/stream/auto 1058.1 ns 1033 ns 1.02
cuda/synchronization/stream/nonblocking 8113 ns 7095.4 ns 1.14
cuda/synchronization/stream/blocking 842.40625 ns 848.7717391304348 ns 0.99
cuda/synchronization/context/auto 1178.9 ns 1163.5 ns 1.01
cuda/synchronization/context/nonblocking 7050.4 ns 7702.2 ns 0.92
cuda/synchronization/context/blocking 895.5102040816327 ns 910.1 ns 0.98
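
Reading the table: the Ratio column appears to be the Current time divided by the Previous time, rounded to two decimals. Since the measurements are durations in ns, a ratio above 1 would mean the PR head was slower on that benchmark and a ratio below 1 would mean it was faster. The sketch below recomputes a few rows under that assumption; the ratio helper is illustrative and is not taken from github-action-benchmark.

```julia
# Assumed definition of the reported ratio: current time over previous time.
ratio(current, previous) = current / previous

# (name, current ns, previous ns), copied from rows of the table above.
rows = [
    ("kernel/occupancy", 683.1148648648649, 783.1441441441441),
    ("array/iteration/findmin/1d", 90973.0, 56834.0),
    ("array/iteration/findmin/2d", 120862.0, 98602.0),
]

for (name, current, previous) in rows
    # Round to two decimals to match the precision shown in the table.
    println(rpad(name, 30), round(ratio(current, previous); digits = 2))
end
```

Single-run differences of this size can be noise, so the recomputed values are only a guide to which rows deserve a second look.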

This comment was automatically generated by workflow using github-action-benchmark.

kshyatt added the "cuda array" label (Stuff about CuArray.) on Jan 12, 2026
Development

Successfully merging this pull request may close these issues.

Memory leak with unified memory?
