Take into account unified memory allocations for GC pressure #3014
base: master
Your PR requires formatting changes to meet the project's style guidelines. Suggested changes:
diff --git a/src/memory.jl b/src/memory.jl
index 78ed633c1..8181aca79 100644
--- a/src/memory.jl
+++ b/src/memory.jl
@@ -670,9 +670,9 @@ end
     mem
 end
 @inline function _pool_alloc(::Type{UnifiedMemory}, sz)
- mem = alloc(UnifiedMemory, sz)
- account!(memory_stats(), sz)
- mem
+    mem = alloc(UnifiedMemory, sz)
+    account!(memory_stats(), sz)
+    return mem
 end
 @inline function _pool_alloc(::Type{HostMemory}, sz)
     alloc(HostMemory, sz)
@@ -727,8 +727,8 @@ end
     account!(memory_stats(mem.dev), -sizeof(mem))
 end
 @inline function _pool_free(mem::UnifiedMemory, stream::CuStream)
- account!(memory_stats(), -sizeof(mem))
- free(mem)
+    account!(memory_stats(), -sizeof(mem))
+    return free(mem)
 end
 @inline _pool_free(mem::HostMemory, stream::CuStream) = free(mem)
Codecov Report
✅ All modified and coverable lines are covered by tests.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3014      +/-   ##
==========================================
+ Coverage   89.43%   89.46%   +0.02%
==========================================
  Files         148      148
  Lines       12991    12995       +4
==========================================
+ Hits        11619    11626       +7
+ Misses       1372     1369       -3
CUDA.jl Benchmarks
| Benchmark suite | Current: 5050835 | Previous: 0c00b83 | Ratio |
|---|---|---|---|
| latency/precompile | 55440528447 ns | 55103441406.5 ns | 1.01 |
| latency/ttfp | 7811786159.5 ns | 7854970810.5 ns | 0.99 |
| latency/import | 4135630131 ns | 4142920660.5 ns | 1.00 |
| integration/volumerhs | 9611393 ns | 9624895.5 ns | 1.00 |
| integration/byval/slices=1 | 146970 ns | 147201 ns | 1.00 |
| integration/byval/slices=3 | 425954 ns | 426000 ns | 1.00 |
| integration/byval/reference | 145011 ns | 145105 ns | 1.00 |
| integration/byval/slices=2 | 286518.5 ns | 286632 ns | 1.00 |
| integration/cudadevrt | 103528 ns | 103846 ns | 1.00 |
| kernel/indexing | 14146 ns | 14265 ns | 0.99 |
| kernel/indexing_checked | 14722 ns | 14925 ns | 0.99 |
| kernel/occupancy | 683.1148648648649 ns | 783.1441441441441 ns | 0.87 |
| kernel/launch | 2226.5555555555557 ns | 2262.3333333333335 ns | 0.98 |
| kernel/rand | 14753 ns | 16624 ns | 0.89 |
| array/reverse/1d | 19859 ns | 20261 ns | 0.98 |
| array/reverse/2dL_inplace | 66580 ns | 66981 ns | 0.99 |
| array/reverse/1dL | 70083 ns | 70447 ns | 0.99 |
| array/reverse/2d | 21937.5 ns | 22244 ns | 0.99 |
| array/reverse/1d_inplace | 9633 ns | 11580 ns | 0.83 |
| array/reverse/2d_inplace | 13446 ns | 13344 ns | 1.01 |
| array/reverse/2dL | 73826 ns | 74284 ns | 0.99 |
| array/reverse/1dL_inplace | 66890 ns | 67021 ns | 1.00 |
| array/copy | 20366 ns | 20733 ns | 0.98 |
| array/iteration/findall/int | 157825 ns | 159065 ns | 0.99 |
| array/iteration/findall/bool | 139945.5 ns | 141350 ns | 0.99 |
| array/iteration/findfirst/int | 160523.5 ns | 162741 ns | 0.99 |
| array/iteration/findfirst/bool | 161829 ns | 164024 ns | 0.99 |
| array/iteration/scalar | 71865.5 ns | 72819 ns | 0.99 |
| array/iteration/logical | 214670.5 ns | 220064.5 ns | 0.98 |
| array/iteration/findmin/1d | 90973 ns | 56834 ns | 1.60 |
| array/iteration/findmin/2d | 120862 ns | 98602 ns | 1.23 |
| array/reductions/reduce/Int64/1d | 43196 ns | 44369 ns | 0.97 |
| array/reductions/reduce/Int64/dims=1 | 45261.5 ns | 46082 ns | 0.98 |
| array/reductions/reduce/Int64/dims=2 | 61434.5 ns | 62261 ns | 0.99 |
| array/reductions/reduce/Int64/dims=1L | 88991 ns | 89560 ns | 0.99 |
| array/reductions/reduce/Int64/dims=2L | 87848.5 ns | 88709 ns | 0.99 |
| array/reductions/reduce/Float32/1d | 37083 ns | 38170.5 ns | 0.97 |
| array/reductions/reduce/Float32/dims=1 | 51660.5 ns | 43662 ns | 1.18 |
| array/reductions/reduce/Float32/dims=2 | 59862 ns | 60196 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1L | 52305 ns | 52890 ns | 0.99 |
| array/reductions/reduce/Float32/dims=2L | 71875 ns | 72931 ns | 0.99 |
| array/reductions/mapreduce/Int64/1d | 43420 ns | 44486 ns | 0.98 |
| array/reductions/mapreduce/Int64/dims=1 | 45021 ns | 51105 ns | 0.88 |
| array/reductions/mapreduce/Int64/dims=2 | 61289 ns | 61916 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1L | 89081 ns | 89497 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2L | 87923.5 ns | 88980 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 36873.5 ns | 37944 ns | 0.97 |
| array/reductions/mapreduce/Float32/dims=1 | 44834 ns | 52429 ns | 0.86 |
| array/reductions/mapreduce/Float32/dims=2 | 60114 ns | 60388 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52540 ns | 53049 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2L | 72469 ns | 72843 ns | 0.99 |
| array/broadcast | 19847 ns | 20274 ns | 0.98 |
| array/copyto!/gpu_to_gpu | 10996 ns | 11225 ns | 0.98 |
| array/copyto!/cpu_to_gpu | 214160 ns | 218396.5 ns | 0.98 |
| array/copyto!/gpu_to_cpu | 284329.5 ns | 284648 ns | 1.00 |
| array/accumulate/Int64/1d | 124290 ns | 125449 ns | 0.99 |
| array/accumulate/Int64/dims=1 | 84400 ns | 84251 ns | 1.00 |
| array/accumulate/Int64/dims=2 | 158340 ns | 158690 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1710794 ns | 1709941.5 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 966620.5 ns | 967026.5 ns | 1.00 |
| array/accumulate/Float32/1d | 108697 ns | 109856 ns | 0.99 |
| array/accumulate/Float32/dims=1 | 80376 ns | 81373 ns | 0.99 |
| array/accumulate/Float32/dims=2 | 147865.5 ns | 148536 ns | 1.00 |
| array/accumulate/Float32/dims=1L | 1619116 ns | 1619811 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 698567 ns | 699285.5 ns | 1.00 |
| array/construct | 1276.4 ns | 1296.2 ns | 0.98 |
| array/random/randn/Float32 | 43461 ns | 48633 ns | 0.89 |
| array/random/randn!/Float32 | 24860 ns | 25237 ns | 0.99 |
| array/random/rand!/Int64 | 27194 ns | 27465 ns | 0.99 |
| array/random/rand!/Float32 | 8822.333333333334 ns | 8946 ns | 0.99 |
| array/random/rand/Int64 | 31068.5 ns | 30454 ns | 1.02 |
| array/random/rand/Float32 | 13132 ns | 13364.5 ns | 0.98 |
| array/permutedims/4d | 55347.5 ns | 55600 ns | 1.00 |
| array/permutedims/2d | 53864 ns | 54423 ns | 0.99 |
| array/permutedims/3d | 54831 ns | 55435 ns | 0.99 |
| array/sorting/1d | 2758123 ns | 2759622.5 ns | 1.00 |
| array/sorting/by | 3344447.5 ns | 3345835 ns | 1.00 |
| array/sorting/2d | 1080757 ns | 1082443 ns | 1.00 |
| cuda/synchronization/stream/auto | 1058.1 ns | 1033 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 8113 ns | 7095.4 ns | 1.14 |
| cuda/synchronization/stream/blocking | 842.40625 ns | 848.7717391304348 ns | 0.99 |
| cuda/synchronization/context/auto | 1178.9 ns | 1163.5 ns | 1.01 |
| cuda/synchronization/context/nonblocking | 7050.4 ns | 7702.2 ns | 0.92 |
| cuda/synchronization/context/blocking | 895.5102040816327 ns | 910.1 ns | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
Fixes #3013
Please note that I'm not sure this change is correct: I have no experience with the internals of this library, even though it does seem to solve the problem. An LLM was used to produce it.
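For readers unfamiliar with the pattern, the diff in this thread pairs an `account!(memory_stats(), sz)` call on allocation with an `account!(memory_stats(), -sizeof(mem))` call on free. The snippet below is only a toy model of that bookkeeping, not CUDA.jl code: `ToyStats`, `ToyBuffer`, `toy_alloc`, `toy_free`, and this standalone `account!` are hypothetical names made up for illustration, and the real `memory_stats()` object may track more than a single counter.

```julia
# Toy model only: every name here is hypothetical and independent of CUDA.jl.
mutable struct ToyStats
    live::Int   # bytes currently allocated and not yet freed
end

# Adjust the live-byte counter; a negative `bytes` records a free.
account!(stats::ToyStats, bytes::Integer) = (stats.live += bytes; stats)

# Stand-in for a memory handle that remembers its own size.
struct ToyBuffer
    bytes::Int
end

function toy_alloc(stats::ToyStats, sz::Integer)
    buf = ToyBuffer(sz)
    account!(stats, sz)           # count the allocation
    return buf
end

function toy_free(stats::ToyStats, buf::ToyBuffer)
    account!(stats, -buf.bytes)   # undo the accounting on free
    return nothing
end

stats = ToyStats(0)
buf = toy_alloc(stats, 1024)
@assert stats.live == 1024
toy_free(stats, buf)
@assert stats.live == 0
```

The point of the sketch is the symmetry: if only one side were accounted for, the counter would drift, so any heuristic built on it (such as deciding when to trigger garbage collection) would see the wrong amount of live memory.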