(Potential) memory leak with uproot.iterate #1535
Replies: 9 comments
-
@pfackeldey - thanks for the detailed report! Indeed, it is a serious issue. I will have a look asap. Thanks!
-
Thanks @ianna, I'll try debugging as well in the meantime.
-
Good find @pfackeldey. I'm wondering, what is your interpretation of the fact that the resident size flattens out after some iterations? It's as if it "stops leaking" after a bit.
-
@ikrommyd I do not know why it looks like this.
-
I checked 4 'versions' of the reproducer. The results are as follows:
- TTree
- TTree + explicit
-
Ok, so I found what appears to be the main reason for this issue (and some solutions). We're suffering in uproot from memory fragmentation of the arenas created by glibc's malloc.
The solutions to this are basically:
I didn't test 1-2 because I don't have a Linux machine, but they should help. (This should also help for dask: https://distributed.dask.org/en/stable/worker-memory.html#memory-not-released-back-to-the-os)
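One mitigation described on that dask page is to periodically ask glibc to hand freed arena memory back to the OS with `malloc_trim`. A minimal sketch of that (glibc/Linux only; the helper name `trim_memory` is purely illustrative):

```python
import ctypes

def trim_memory() -> int:
    # glibc-only: malloc_trim(0) asks the allocator to release free memory
    # at the top of the heap (and unused arena pages) back to the OS.
    libc = ctypes.CDLL("libc.so.6")
    return libc.malloc_trim(0)
```

On a dask cluster this can be triggered on every worker with `client.run(trim_memory)`, which is essentially what the linked documentation suggests.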
-
Memory fragmentation has been a problem for us since ~forever, unfortunately. When we have many small long-lived allocations interspersed with very large array allocations, the arenas end up mostly empty. See e.g. this old coffea issue: scikit-hep/coffea#249
-
Yes, that makes sense. We did quite some work in awkward to reduce small long-lived allocations, so that should hopefully already be noticeable. Do you think we could explicitly reuse some buffers in uproot, @nsmith- (or do you have any other idea how this could be mitigated, apart from "uproot-rs" :D)? Since this is not a programmatic memory leak, should we turn this issue into a discussion for future reference?
-
For glibc malloc, the fragmentation can also be limited by setting the mmap threshold (e.g. via the `MALLOC_MMAP_THRESHOLD_` environment variable). Additionally, it may be that the fragmentation in uproot is exacerbated by the dynamic adjustment of this mmap threshold, which is supposedly enabled by default.
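A sketch of pinning that threshold from within Python via `mallopt` (the constant comes from glibc's `malloc.h`; Linux/glibc only, and the helper name is illustrative). Note that explicitly setting this parameter also turns off glibc's dynamic adjustment of it:

```python
import ctypes

# From glibc's <malloc.h>; only meaningful on Linux with glibc.
M_MMAP_THRESHOLD = -3

def pin_mmap_threshold(threshold_bytes: int = 128 * 1024) -> bool:
    # Fix the mmap threshold so that large buffers are served by mmap and
    # returned to the OS on free; mallopt() returns nonzero on success.
    libc = ctypes.CDLL("libc.so.6")
    return bool(libc.mallopt(M_MMAP_THRESHOLD, threshold_bytes))
```

Alternatively, exporting `MALLOC_MMAP_THRESHOLD_=<bytes>` before starting the interpreter has the same effect without code changes.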





-
(This was tested with uproot v5.6.5; it is likely present in other versions as well.)
The issue & reproducer
When running the following snippet and benchmarking it with memray:
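A minimal sketch of such a reproducer (the file name, tree name, and helper name are placeholders rather than the original setup); it can be profiled with `memray run reproducer.py`:

```python
import uproot

# Placeholder input; any sufficiently large flat TTree will do.
FILE = "large_file.root:Events"

def loop_iterate():
    # Iterate over the tree in ~200 MB chunks; each `chunk` is an awkward
    # Array and should be the only large allocation alive per iteration.
    for chunk in uproot.iterate(FILE, step_size="200 MB"):
        pass  # a real workload would process `chunk` here

if __name__ == "__main__":
    loop_iterate()
```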
I'm getting physical RAM usage (RSS) of up to 1.6 GB, even though the step size is 200 MB.
This is already surprising. Another indication that something is off is that an explicit `gc.collect()` at the end of each iteration improves the RSS situation by roughly 2x, bringing the peak down to about 800 MB of RSS.
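A sketch of that variant, with the same placeholder file and tree as above:

```python
import gc
import uproot

def loop_iterate_with_gc():
    # Same iteration as before, but with an explicit garbage collection
    # at the end of every iteration.
    for chunk in uproot.iterate("large_file.root:Events", step_size="200 MB"):
        # ... process `chunk` here ...
        gc.collect()
```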
Why is this bad?
RSS is the physical RAM used by this process; dask monitors it to decide whether a worker should be killed due to OOM.
What I've found so far...
The memory usage grows inside the following function: https://github.com/scikit-hep/uproot5/blob/main/src/uproot/behaviors/TBranch.py#L1440-L1452, and more specifically in this part of it: https://github.com/scikit-hep/uproot5/blob/main/src/uproot/behaviors/TBranch.py#L3421-L3428
What does work correctly is that the `arrays` dictionary filled by the above function is ~200 MB, that's good! However, this `_ranges_or_baskets_to_arrays` still uses ~800 MB to fill the ~200 MB `arrays` dict and does not free that memory again. Also, the "popper-trick" that @jpivarski introduced in #1305 is what enables the manual `gc.collect()` to help here (without it, even that won't help).
So, my understanding right now is that `uproot.iterate` does yield correctly sized arrays, but it uses way too much memory while doing so and also doesn't free it properly.
Other implications
`_ranges_or_baskets_to_arrays` is also used in other loading functions, and some quick tests showed that these have a similar memory behavior; see e.g. the profile for `loop_manual` (the numerical values on the y axis are different of course, because I can't exactly mirror "200 MB" steps by hand) and for `loop_same_chunks`.
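A plausible sketch of what such manually chunked variants could look like, assuming explicit entry ranges with `TTree.arrays` (the file name, tree name, and chunking are placeholders, not the original definitions):

```python
import uproot

def loop_manual(path="large_file.root", tree_name="Events", n_chunks=20):
    # Load fixed entry ranges by hand with TTree.arrays instead of uproot.iterate.
    with uproot.open(path) as f:
        tree = f[tree_name]
        step = max(1, tree.num_entries // n_chunks)
        for start in range(0, tree.num_entries, step):
            chunk = tree.arrays(entry_start=start, entry_stop=start + step)
            del chunk  # nothing keeps a reference, so RSS should not accumulate

def loop_same_chunks(path="large_file.root", tree_name="Events", n_iterations=20):
    # Re-read the *same* entry range repeatedly: any RSS growth here cannot
    # be explained by new data being kept alive.
    with uproot.open(path) as f:
        tree = f[tree_name]
        stop = min(tree.num_entries, 1_000_000)
        for _ in range(n_iterations):
            chunk = tree.arrays(entry_stop=stop)
            del chunk
```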
What I want to see / was expecting
The orange and blue lines overlap and roughly follow a sawtooth shape with 200 MB jumps per iteration (and not much additional overhead in RAM).
This was originally found by @oshadura in the scope of the integration challenge; here I just attach a local reproducer with some first findings.
cc @oshadura @alexander-held