[BUG] Usage of Dictionaries will cause a "Memory Leak" in Multi-Node Environment #2448

Open
@MathiasBaumgartinger

Description

I am working on a SLURM multi-node architecture where each node has two GPUs. In such a multi-process environment, the use of dictionaries (i.e. the samples provided by torchgeo datamodules) triggers the, quote, "copy-on-access problem of forked python processes". See pytorch/pytorch#13246 (comment) for more information. This refcounting problem leads to ever-increasing memory usage, which sooner or later causes the process to crash.
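For context, the effect arises because merely reading a Python object updates its reference count, which writes to the memory page holding it, so pages shared with a forked worker get copied even though the data itself never changes. A common mitigation for this class of problem is to keep per-sample metadata in a few large buffers rather than in many small Python objects. The following is only a minimal sketch of that idea using the standard library; `PackedSamples` is a hypothetical helper, not part of torchgeo or PyTorch, and whether it addresses the behaviour reported here is an assumption.

```python
import pickle
from array import array


class PackedSamples:
    """Hypothetical helper: keeps picklable records in one contiguous buffer.

    Instead of a list holding many small Python objects (each with its own
    refcount), only two objects live for the lifetime of the dataset: a bytes
    buffer and an array of end offsets.
    """

    def __init__(self, records):
        chunks = [pickle.dumps(r, protocol=pickle.HIGHEST_PROTOCOL) for r in records]
        # End offset of each serialized record within the joined buffer.
        self._ends = array("Q")
        total = 0
        for chunk in chunks:
            total += len(chunk)
            self._ends.append(total)
        self._buffer = b"".join(chunks)

    def __len__(self):
        return len(self._ends)

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = self._ends[index - 1] if index > 0 else 0
        end = self._ends[index]
        # Deserializing creates a fresh, short-lived object in the worker;
        # the shared buffer itself is only read.
        return pickle.loads(self._buffer[start:end])
```

A dataset could hold a `PackedSamples` instance in place of a plain list of dictionaries and index it in the same way. The trade-off is a small deserialization cost on every access.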

Steps to reproduce

Use any datamodule that provides dictionaries as samples in a multi-node environment.

Version

0.6.0

Labels

datasets (Geospatial or benchmark datasets)
