Memory allocation and freeing differences between Windows and Linux #1466

Open
@Degcube

We ran into a potential memory leak in our long-running application.
We have a custom data loader that loads differently sized datasets into RAM (CPU) repeatedly throughout the application's lifespan.
We noticed that a lot of memory is being allocated and never freed. Naturally, we suspected we might be leaking tensors and not disposing of them properly.

After further investigation, we found that everything works fine under Windows.
We then created a minimal example that simply allocates a torch.zeros tensor and disposes of it.
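
A minimal example of this kind could be sketched roughly as follows. This is an illustration under stated assumptions, not the exact test code: the class name `MemoryRepro`, the `ResidentMegabytes` helper, the run count, and the random tensor sizes are all hypothetical choices made for the sketch. It allocates a CPU `torch.zeros` tensor of varying size, disposes of it, and prints the process's resident memory before and after disposal.

```csharp
using System;
using System.Diagnostics;
using TorchSharp;

public static class MemoryRepro
{
    // Resident memory of the current process in MiB, as reported by the OS
    // (hypothetical helper, used here only to observe the trend across runs).
    public static double ResidentMegabytes()
    {
        using var process = Process.GetCurrentProcess();
        return process.WorkingSet64 / (1024.0 * 1024.0);
    }

    public static void Main()
    {
        const int runs = 200;          // assumed number of test runs
        var rng = new Random(0);       // fixed seed so the sizes are repeatable

        for (int run = 1; run <= runs; run++)
        {
            // Assumed sizes: between 1 Mi and 64 Mi float32 elements,
            // to imitate differently sized datasets.
            long elements = rng.Next(1, 65) * 1024L * 1024L;

            // Allocate on the CPU; the using block disposes the tensor at its end.
            using (var tensor = torch.zeros(new long[] { elements },
                                            dtype: torch.float32,
                                            device: torch.CPU))
            {
                Console.WriteLine(
                    $"run {run}: allocated {elements} floats, resident = {ResidentMegabytes():F0} MiB");
            }

            Console.WriteLine(
                $"run {run}: disposed, resident = {ResidentMegabytes():F0} MiB");
        }
    }
}
```

Comparing the "disposed" figures across runs on each platform is one way to see whether resident memory returns to a steady baseline or keeps growing.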

On Windows:
Even after hundreds of test runs within the same application instance, the memory is completely freed after each run. RAM usage remains consistent.

On Linux:
Allocating VRAM on the GPU works fine: the memory isn’t explicitly freed, but it is reused, and since nothing else is running on the GPU, this isn’t a problem.
However, when using the CPU, the application's RAM usage grows after each run. The increment shrinks over time, but the total still adds up.

After some digging, we suspected the memory allocator was to blame and switched to this configuration:
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4
TCMALLOC_RELEASE_RATE=10

Now, the behavior mimics what we see on the GPU: the RAM is allocated and reused, but not returned to the system, even after all tensors are disposed.

We assume the allocator maintains an internal "cache" or similar mechanism that operates independently of the .NET memory manager. This causes issues because our system eventually needs the memory for other tasks, and the .NET GC cannot free whatever is allocated in the libtorch "cache."

Maybe our assumptions aren’t entirely correct, but here’s our main question:

How can we get our memory back? 😁
We want to avoid solutions where the TorchSharp code runs in a separate process and the memory is released when that process exits, because this just adds unnecessary layers and complexity to the code.

Thanks in advance!

Labels: question (Further information is requested)
