
[PyTorch] Native memory leak in AbstractTensor: createBuffer() and createIndexer() methods #1721

@iwalsh


In org.bytedeco.pytorch.AbstractTensor, the methods createBuffer(long index) and createIndexer(boolean direct) each create org.bytedeco.pytorch.TensorOptions and org.bytedeco.pytorch.Device objects and never close them. Both types extend org.bytedeco.javacpp.Pointer and therefore own native memory. Because these objects are never closed, that native memory is not freed in a timely or deterministic fashion, which causes a leak.
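A minimal, self-contained sketch of the failure mode, using only the JDK. The hypothetical `NativeHandle` class stands in for a `Pointer` subclass such as `TensorOptions` or `Device`: it counts bytes as "allocated" in its constructor and releases them only in `close()`. The `leakyCall()` method is likewise hypothetical and only mimics the reported pattern of creating two temporaries and dropping them. The sketch deliberately leaves out any garbage-collector-driven cleanup, since that is the part the issue reports as not timely.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LeakSketch {
    // Hypothetical stand-in for a Pointer subclass such as TensorOptions or Device:
    // "allocates" native bytes in the constructor and frees them only in close().
    static final class NativeHandle implements AutoCloseable {
        static final AtomicLong liveBytes = new AtomicLong();
        private final long size;
        private boolean closed;

        NativeHandle(long size) {
            this.size = size;
            liveBytes.addAndGet(size);
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                liveBytes.addAndGet(-size);
            }
        }
    }

    // Hypothetical method mimicking the reported pattern: two temporaries are
    // created and dropped without close(), so their bytes stay counted as live.
    static void leakyCall() {
        new NativeHandle(64); // plays the role of the TensorOptions temporary
        new NativeHandle(16); // plays the role of the Device temporary
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            leakyCall();
        }
        System.out.println("live bytes: " + NativeHandle.liveBytes.get());
    }
}
```

Running `main` prints `live bytes: 80000`: each call leaves its 80 bytes outstanding, so the total grows linearly with the number of calls, the kind of growth a leak in a frequently called method would produce.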


Here is a partial native heap profile from our production system, showing the memory consumed by the leaked TensorOptions and Device objects:

      flat  flat%   sum%        cum   cum%  function
1249940736  23.4%  23.4% 1249940736  23.4%  Java_org_bytedeco_pytorch_TensorOptions_device__
1225691676  22.9%  46.3% 1225691676  22.9%  Java_org_bytedeco_pytorch_TensorBase_options
1205899200  22.5%  68.8% 1205899200  22.5%  Java_org_bytedeco_pytorch_TensorOptions_allocate__Lorg_bytedeco_pytorch_global_torch_00024ScalarType_2
 655977372  12.3%  81.1%  655977372  12.3%  os::malloc@ccb2e0
 504454750   9.4%  90.5%  504762888   9.4%  os::malloc@ccb520
 155669273   2.9%  93.4%  155669273   2.9%  inflate
 111853568   2.1%  95.5%  111853568   2.1%  c10::alloc_cpu
  49284576   0.9%  96.4%   49284576   0.9%  Java_org_bytedeco_pytorch_TensorBase_dtype
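The first three rows are the JNI entry points tied to creating the `TensorOptions` and `Device` objects. A quick check of how much memory they account for together, with the byte counts copied by hand from the profile (the `ProfileSum` class is only for illustration):

```java
public class ProfileSum {
    // Flat byte counts copied from the first three rows of the profile above.
    static final long[] FLAT_BYTES = {
        1_249_940_736L, // Java_org_bytedeco_pytorch_TensorOptions_device__
        1_225_691_676L, // Java_org_bytedeco_pytorch_TensorBase_options
        1_205_899_200L, // Java_org_bytedeco_pytorch_TensorOptions_allocate__Lorg_bytedeco_pytorch_global_torch_00024ScalarType_2
    };

    // Sums the flat byte counts of the rows listed above.
    static long leakedBytes() {
        long total = 0;
        for (long bytes : FLAT_BYTES) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        long total = leakedBytes();
        double gib = total / (double) (1L << 30); // bytes -> GiB
        System.out.printf("%d bytes (%.2f GiB)%n", total, gib);
    }
}
```

The total is 3,681,531,612 bytes, about 3.43 GiB, which agrees with the running percentage of 68.8% on the third row (23.4% + 22.9% + 22.5%).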

As a temporary workaround, we wrap the call sites in an org.bytedeco.javacpp.PointerScope limited to TensorOptions and Device, so that the intermediate objects are released when the scope closes:

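import java.nio.ByteBuffer;
import org.bytedeco.javacpp.PointerScope;
import org.bytedeco.pytorch.Device;
import org.bytedeco.pytorch.TensorOptions;

// Patching the native memory leak from createBuffer()
final org.bytedeco.pytorch.Tensor tensor = ...;
// Sized for a 1-byte element type (e.g. uint8): numel() counts elements, not bytes.
final ByteBuffer dst = ByteBuffer.allocate((int) tensor.numel());
try (PointerScope scope = new PointerScope(TensorOptions.class, Device.class)) {
    dst.put(tensor.createBuffer());  // <-- Leaks TensorOptions and Device pointers without the scope
}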


// Patching the native memory leak from create() (which calls createIndexer())
int[] data = ...;
long[] dimensions = ...;
final org.bytedeco.pytorch.Tensor tensor;
// Tensor is not one of the listed classes, so the returned tensor is not released when the scope closes.
try (PointerScope scope = new PointerScope(TensorOptions.class, Device.class)) {
    tensor = AbstractTensor.create(data, dimensions);  // <-- Leaks TensorOptions and Device pointers without the scope
}
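The workaround depends on two properties of `PointerScope` as used above: pointers created while the scope is open are released when it closes, and passing `TensorOptions.class, Device.class` restricts that to those two types, so the returned `Tensor` is left alone. As a rough model of that behaviour (not JavaCPP's actual implementation), here is a JDK-only sketch; `Scope`, `Resource`, `Options`, and `Result` are hypothetical names, and for simplicity it only consults the innermost open scope on the current thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ScopeSketch {
    // Hypothetical native-backed object; registers itself with the open scope.
    static class Resource implements AutoCloseable {
        boolean released;

        Resource() {
            Scope.attachToCurrent(this);
        }

        @Override
        public void close() {
            released = true;
        }
    }

    static class Options extends Resource {} // plays the role of TensorOptions/Device
    static class Result extends Resource {}  // plays the role of the returned Tensor

    // Simplified class-filtered scope: while open, it collects resources of the
    // listed classes created on this thread and closes them when it is closed.
    static final class Scope implements AutoCloseable {
        private static final ThreadLocal<Deque<Scope>> STACK =
                ThreadLocal.withInitial(ArrayDeque::new);
        private final Class<?>[] forClasses;
        private final List<Resource> attached = new ArrayList<>();

        Scope(Class<?>... forClasses) {
            this.forClasses = forClasses;
            STACK.get().push(this);
        }

        static void attachToCurrent(Resource r) {
            Scope s = STACK.get().peek();
            if (s != null && s.accepts(r)) {
                s.attached.add(r);
            }
        }

        private boolean accepts(Resource r) {
            if (forClasses.length == 0) {
                return true; // no filter: attach everything
            }
            for (Class<?> c : forClasses) {
                if (c.isInstance(r)) {
                    return true;
                }
            }
            return false;
        }

        @Override
        public void close() {
            STACK.get().remove(this);
            for (Resource r : attached) {
                r.close();
            }
            attached.clear();
        }
    }

    // Returns {temporary released?, result released?} after the scope closes.
    static boolean[] demo() {
        Options temp;
        Result result;
        try (Scope scope = new Scope(Options.class)) {
            temp = new Options();
            result = new Result();
        }
        return new boolean[] {temp.released, result.released};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("temp released: " + r[0] + ", result released: " + r[1]);
    }
}
```

`demo()` returns `{true, false}`: the `Options` temporary is released on scope exit, while the `Result`, whose class was not listed, survives, which is what the workaround needs for the `Tensor` returned by `AbstractTensor.create()`.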

These methods should be updated to correctly close TensorOptions and Device before returning.
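A minimal sketch of the requested change, assuming the temporaries can simply be scoped with try-with-resources inside each method (JavaCPP's `Pointer` implements `AutoCloseable`, so `TensorOptions` and `Device` can be used this way). To stay runnable without PyTorch, the JDK-only sketch below uses a hypothetical `Handle` type in place of those classes and a hypothetical `inspectThenBuild()` in place of `createBuffer()`/`createIndexer()`, whose actual bodies are not shown in this issue.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FixSketch {
    // Hypothetical stand-in for a temporary native object like TensorOptions or Device.
    static final class Handle implements AutoCloseable {
        static final AtomicInteger open = new AtomicInteger();
        final String kind;

        Handle(String kind) {
            this.kind = kind;
            open.incrementAndGet();
        }

        @Override
        public void close() {
            open.decrementAndGet();
        }
    }

    // Temporaries are opened in try-with-resources, so both are closed before the
    // method returns, even if an exception is thrown. Only the result escapes.
    static String inspectThenBuild() {
        try (Handle options = new Handle("options");
             Handle device = new Handle("device")) {
            return options.kind + "/" + device.kind;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1_000; i++) {
            inspectThenBuild();
        }
        System.out.println("open handles: " + Handle.open.get());
    }
}
```

Running `main` prints `open handles: 0`. Resources are closed in reverse declaration order, so `device` is closed before `options`; that suits the real case where, judging by the `TensorOptions_device__` entry in the profile, the device is obtained from the options object.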
