[BUG] temp_dask folder is sometimes not found and raises an error #8902

Open
@gargantuadev

Description

I cannot provide code because it is against my company's policies.

I have a very large number of small .parquet files (about 60 KB each) that I read with Dask. When I call ".compute()" on a Dask dataframe, it sometimes raises this error:

Traceback (most recent call last):
    df = df[[Key, "Index"]].reset_index(drop=True).compute() 
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 286, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 568, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 560, in get_sync
    return get_async(
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 503, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "D:\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "D:\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 545, in submit
    fut.set_result(fn(*args, **kwargs))
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in <listcomp>
    return [execute_task(*a) for a in it]
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 228, in execute_task
    result = pack_exception(e, dumps)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\dataframe\shuffle.py", line 448, in __call__
    path = tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)
  File "D:\Python38\lib\tempfile.py", line 358, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'E:/temp_dask/1729501515.144889\\tmpol0vhvzl.partd'

Anything else we need to know?: It happens when I have a lot of small files and compute is called once for each of them.
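
The last frames of the traceback show `tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)` failing inside `os.mkdir`. The following standard-library sketch reproduces that kind of failure without Dask, on the assumption (not confirmed in this report) that the timestamped parent folder under `E:/temp_dask` does not exist at the moment the call runs. The helper `make_partd_dir` and the scratch `root` directory are hypothetical stand-ins, not part of Dask.

```python
import os
import shutil
import tempfile


def make_partd_dir(tempdir):
    """Mimic the call from the traceback: create a '.partd' dir under tempdir."""
    return tempfile.mkdtemp(suffix=".partd", dir=tempdir)


# Scratch root standing in for E:/temp_dask (hypothetical location).
root = tempfile.mkdtemp()

# Timestamp-like subfolder that has not been created (or was removed).
missing_parent = os.path.join(root, "1729501515.144889")

# mkdtemp only creates the final directory, so a missing parent makes it fail.
try:
    make_partd_dir(missing_parent)
    failed = False
except FileNotFoundError:
    failed = True

# Possible workaround: make sure the parent exists right before it is used.
os.makedirs(missing_parent, exist_ok=True)
created = make_partd_dir(missing_parent)

# Clean up the scratch directories.
shutil.rmtree(root)
```

If this assumption holds, one thing to try in the real script would be calling `os.makedirs(path, exist_ok=True)` on the configured temporary directory immediately before each compute, and checking whether anything else deletes that folder between computes.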

Environment:

  • Dask version: 2021.7.0
  • Python version: 3.8.0
  • Operating System: Windows
  • Install method (conda, pip, source): pip
