Network glitch during scatter() may result in memory leak #6412

Open
@crusaderky

If scatter fails either between these two statements (direct=True)

_, who_has, nbytes = await scatter_to_workers(
    nthreads, data2, report=False, rpc=self.rpc
)
await self.scheduler.update_data(
    who_has=who_has, nbytes=nbytes, client=self.id
)

or these (direct=False)

keys, who_has, nbytes = await scatter_to_workers(
    nthreads, data, rpc=self.rpc, report=False
)
self.update_data(who_has=who_has, nbytes=nbytes, client=client)

e.g. because the network falls over after the data has reached the worker but before the worker can respond OK on the RPC channel, then the scheduler will never learn about the data, and the worker will not inform it.
The data will just sit there consuming memory, unknown to the scheduler, and only a new scatter/compute of the same key on the same worker will fix the issue.
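
To make the failure mode concrete, here is a minimal toy model of the race. It does not use distributed at all: ToyWorker, ToyScheduler, toy_scatter, orphaned_keys and the fail_reply flag are hypothetical names made up for this sketch, and the real code paths are more involved. It only assumes what is described above: the worker stores the payload first, and the scheduler is told about the keys only after the worker's reply comes back.

```python
import asyncio


class ToyWorker:
    """Hypothetical stand-in for a worker; not distributed.Worker."""

    def __init__(self, address):
        self.address = address
        self.data = {}

    async def update_data(self, data, fail_reply=False):
        # The payload is stored on the worker first ...
        self.data.update(data)
        if fail_reply:
            # ... and the connection drops before the OK reply is sent.
            raise OSError("connection lost before reply")
        return {key: len(value) for key, value in data.items()}


class ToyScheduler:
    """Hypothetical stand-in that only tracks who_has."""

    def __init__(self):
        self.who_has = {}

    def update_data(self, who_has):
        for key, addresses in who_has.items():
            self.who_has.setdefault(key, set()).update(addresses)


async def toy_scatter(scheduler, worker, data, fail_reply=False):
    # Step 1: push the data to the worker and wait for its reply.
    nbytes = await worker.update_data(data, fail_reply=fail_reply)
    # Step 2: register the keys. Never reached if step 1 raised.
    scheduler.update_data({key: [worker.address] for key in data})
    return nbytes


def orphaned_keys(scheduler, worker):
    # Keys held in worker memory that the scheduler does not know about.
    return sorted(
        key
        for key in worker.data
        if worker.address not in scheduler.who_has.get(key, set())
    )


async def demo():
    scheduler = ToyScheduler()
    worker = ToyWorker("tcp://worker-1")
    await toy_scatter(scheduler, worker, {"x": b"abc"})
    try:
        await toy_scatter(scheduler, worker, {"y": b"defgh"}, fail_reply=True)
    except OSError:
        pass  # the client sees a failed scatter
    return orphaned_keys(scheduler, worker)


leaked = asyncio.run(demo())
print(leaked)  # ['y']
```

In this toy model, "y" stays in the worker's memory while the scheduler has no record of it. Calling toy_scatter again with the same key on the same worker registers it, which mirrors the only recovery path mentioned above.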
