Description
Apologies if this has already been requested, or is clearly impossible for some reason. My Dask knowledge isn't super deep.
I know that OSErrors, which can occur when a disk fills up, are handled fairly gracefully when spilling:
distributed/distributed/spill.py
Lines 134 to 137 in 81774d4
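Roughly, the pattern there is: catch the OSError from the disk write, log it, and keep the data in memory rather than failing anything. A minimal standalone sketch of that pattern (names and signature are mine for illustration, not the actual `spill.py` code -- see the permalink above for the real lines):

```python
import logging

logger = logging.getLogger(__name__)


def spill_to_disk(storage: dict, key: str, value: bytes, path: str) -> None:
    """Hypothetical spill helper illustrating the graceful-degradation
    pattern: an OSError (e.g. ENOSPC from a full disk) is logged and
    swallowed, and the value simply stays in memory."""
    try:
        with open(path, "wb") as fh:
            fh.write(value)
    except OSError:
        # Disk full or a similar worker-local problem: log it, keep the
        # data in memory, and carry on. No task ever sees the error.
        logger.error("Spill to disk failed; keeping data in memory", exc_info=True)
        storage[key] = value
```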
However, I am frequently running into OSErrors during the shuffle operation, here:
distributed/distributed/shuffle/_disk.py
Lines 179 to 180 in 81774d4
These do not appear to be handled well -- they are treated as if they were an error in the task itself and surfaced to me, when really I would like the task to be rerun elsewhere, since the problem is local to one worker. Even killing the worker in question and letting Dask recompute the necessary data would be more graceful.
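What I have in mind is something like the sketch below (all names are hypothetical -- I don't know the P2P shuffle internals well enough to say where this would actually go): classify ENOSPC-style OSErrors as worker-local and let them trigger a retry elsewhere, instead of surfacing them as task errors.

```python
import errno
import logging
import pathlib

logger = logging.getLogger(__name__)


class WorkerLocalDiskError(OSError):
    """Hypothetical marker exception: the failure is specific to this
    worker's disk, so the work should be retried on another worker
    rather than surfaced to the user as a task error."""


def write_shuffle_frames(path: pathlib.Path, frames: list[bytes]) -> None:
    # Hypothetical wrapper around the raw write in shuffle/_disk.py.
    try:
        with path.open("ab") as fh:
            for frame in frames:
                fh.write(frame)
    except OSError as e:
        if e.errno == errno.ENOSPC:
            # Disk full: a worker-local condition, not a bug in the task.
            logger.error("Shuffle write failed on this worker", exc_info=True)
            raise WorkerLocalDiskError(*e.args) from e
        raise
```

I'm aware `distributed.Reschedule` exists for tasks that want to be rerun elsewhere, but I don't know whether it applies here, since the shuffle's disk writes may not happen on the task thread.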
This is a frequent annoyance when I run large dataframe operations (dataframes with a few hundred million rows and ~15 string columns) on a cluster with unpredictable disk capacity constraints (which is a separate issue, but not one I would expect to bubble up like this).
I can provide more details, such as a stack trace, if this is unexpected or should already work -- but I don't see any sign in the code that this is a bug; it looks more like a missing feature.