Description
In org.bytedeco.pytorch.AbstractTensor, the methods createBuffer(long index) and createIndexer(boolean direct) each create objects of type org.bytedeco.pytorch.TensorOptions and org.bytedeco.pytorch.Device and never close them. These types extend org.bytedeco.javacpp.Pointer, so they hold native memory. Because they are never closed, that memory is not freed in a timely or deterministic fashion, which causes a leak.
Here is a partial heap dump of our production system showing the native memory consumed by the leaked TensorOptions and Device objects:
```
1249940736 23.4% 23.4% 1249940736 23.4% Java_org_bytedeco_pytorch_TensorOptions_device__
1225691676 22.9% 46.3% 1225691676 22.9% Java_org_bytedeco_pytorch_TensorBase_options
1205899200 22.5% 68.8% 1205899200 22.5% Java_org_bytedeco_pytorch_TensorOptions_allocate__Lorg_bytedeco_pytorch_global_torch_00024ScalarType_2
 655977372 12.3% 81.1%  655977372 12.3% os::malloc@ccb2e0
 504454750  9.4% 90.5%  504762888  9.4% os::malloc@ccb520
 155669273  2.9% 93.4%  155669273  2.9% inflate
 111853568  2.1% 95.5%  111853568  2.1% c10::alloc_cpu
  49284576  0.9% 96.4%   49284576  0.9% Java_org_bytedeco_pytorch_TensorBase_dtype
```
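To make the difference between leaving these temporaries unclosed and closing them deterministically concrete, here is a small self-contained model. NativeHandle, leakyCreateBuffer() and fixedCreateBuffer() are hypothetical stand-ins, not JavaCPP or PyTorch APIs: each NativeHandle counts as one live native allocation until close() is called, and the model deliberately leaves out any garbage-collector-driven release.

```java
import java.util.concurrent.atomic.AtomicLong;

public class LeakModel {
    // Hypothetical stand-in for a Pointer subclass such as TensorOptions or Device:
    // constructing one counts as a native allocation, close() frees it exactly once.
    static final class NativeHandle implements AutoCloseable {
        static final AtomicLong live = new AtomicLong();
        private boolean closed;

        NativeHandle() {
            live.incrementAndGet();
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                live.decrementAndGet();
            }
        }
    }

    // Mirrors the reported behaviour: two temporaries are created and never closed.
    static long leakyCreateBuffer() {
        NativeHandle options = new NativeHandle(); // plays the role of the TensorOptions
        NativeHandle device = new NativeHandle();  // plays the role of the Device
        return 42; // stands in for the buffer built from the tensor's data
    }

    // Same work, but try-with-resources closes both temporaries before returning.
    static long fixedCreateBuffer() {
        try (NativeHandle options = new NativeHandle();
             NativeHandle device = new NativeHandle()) {
            return 42;
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) leakyCreateBuffer();
        System.out.println("live after leaky calls: " + NativeHandle.live.get());

        NativeHandle.live.set(0);
        for (int i = 0; i < 1000; i++) fixedCreateBuffer();
        System.out.println("live after fixed calls: " + NativeHandle.live.get());
    }
}
```

Running main reports 2000 live handles after the leaky loop and 0 after the fixed one, which is the growth pattern the heap dump shows for the real types.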
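As a temporary fix, we have wrapped the call sites in org.bytedeco.javacpp.PointerScope so that the intermediate objects are deallocated when the scope closes. Passing TensorOptions.class and Device.class restricts the scope to those two types, so the Tensor returned by create() is not released along with them: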
```java
import java.nio.ByteBuffer;
import org.bytedeco.javacpp.PointerScope;
import org.bytedeco.pytorch.AbstractTensor;
import org.bytedeco.pytorch.Device;
import org.bytedeco.pytorch.TensorOptions;

// Patching the native memory leak from createBuffer()
final org.bytedeco.pytorch.Tensor tensor = ...;
final ByteBuffer dst = ByteBuffer.allocate((int) tensor.numel());
try (PointerScope scope = new PointerScope(TensorOptions.class, Device.class)) {
    dst.put(tensor.createBuffer()); // <-- Leaks TensorOptions and Device pointers
}
```

```java
// Patching the native memory leak from create() (which calls createIndexer())
int[] data = ...;
long[] dimensions = ...;
final org.bytedeco.pytorch.Tensor tensor;
try (PointerScope scope = new PointerScope(TensorOptions.class, Device.class)) {
    tensor = AbstractTensor.create(data, dimensions); // <-- Leaks TensorOptions and Device pointers
}
```

These methods should be updated to correctly close TensorOptions and Device before returning.
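The PointerScope workaround above depends on the scope releasing only the classes passed to its constructor, so the returned Tensor outlives the scope. The toy model below sketches that idea under stated assumptions: ToyScope, Resource, Options, Device and Tensor are hypothetical classes, and this is not how JavaCPP's PointerScope is implemented internally.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class ScopeModel {
    // Hypothetical, simplified class-filtered scope (not JavaCPP's PointerScope code).
    static final class ToyScope implements AutoCloseable {
        private static final ThreadLocal<Deque<ToyScope>> STACK =
                ThreadLocal.withInitial(ArrayDeque::new);

        private final List<Class<?>> forClasses;
        private final List<Resource> attached = new ArrayList<>();

        ToyScope(Class<?>... forClasses) {
            this.forClasses = Arrays.asList(forClasses);
            STACK.get().push(this);
        }

        // Called by every new Resource. In this model an empty filter attaches everything;
        // otherwise only resources whose exact class was listed are attached.
        static void register(Resource r) {
            ToyScope scope = STACK.get().peek();
            if (scope != null
                    && (scope.forClasses.isEmpty() || scope.forClasses.contains(r.getClass()))) {
                scope.attached.add(r);
            }
        }

        // Releases attached resources in reverse creation order, then leaves the stack.
        @Override
        public void close() {
            for (int i = attached.size() - 1; i >= 0; i--) attached.get(i).release();
            attached.clear();
            STACK.get().remove(this);
        }
    }

    static class Resource {
        static int live = 0;
        boolean released;

        Resource() {
            live++;
            ToyScope.register(this);
        }

        void release() {
            if (!released) {
                released = true;
                live--;
            }
        }
    }

    static final class Options extends Resource {}
    static final class Device extends Resource {}
    static final class Tensor extends Resource {}

    // Stand-in for a factory that creates two temporaries and returns a result.
    static Tensor create() {
        new Options();
        new Device();
        return new Tensor();
    }

    public static void main(String[] args) {
        Tensor t;
        try (ToyScope scope = new ToyScope(Options.class, Device.class)) {
            t = create();
        }
        System.out.println("live: " + Resource.live + ", tensor released: " + t.released);
    }
}
```

Running main prints `live: 1, tensor released: false`: the two temporaries are released when the scope closes, while the returned object survives because its class was not listed.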