Description
I cannot share the code because it is against my company's policies.
I have a large number of small `.parquet` files (around 60 KB each), which I read with Dask. When I call `.compute()` on the resulting Dask DataFrame, it raises this error:
```
Traceback (most recent call last):
    df = df[[Key, "Index"]].reset_index(drop=True).compute()
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 286, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\base.py", line 568, in compute
    results = schedule(dsk, keys, **kwargs)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 560, in get_sync
    return get_async(
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 503, in get_async
    for key, res_info, failed in queue_get(queue).result():
  File "D:\Python38\lib\concurrent\futures\_base.py", line 437, in result
    return self.__get_result()
  File "D:\Python38\lib\concurrent\futures\_base.py", line 389, in __get_result
    raise self._exception
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 545, in submit
    fut.set_result(fn(*args, **kwargs))
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in batch_execute_tasks
    return [execute_task(*a) for a in it]
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 237, in <listcomp>
    return [execute_task(*a) for a in it]
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 228, in execute_task
    result = pack_exception(e, dumps)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\local.py", line 223, in execute_task
    result = _execute_task(task, data)
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\core.py", line 121, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "C:\GENERIC_TOOL_NAME\.venv\lib\site-packages\dask\dataframe\shuffle.py", line 448, in __call__
    path = tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)
  File "D:\Python38\lib\tempfile.py", line 358, in mkdtemp
    _os.mkdir(file, 0o700)
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'E:/temp_dask/1729501515.144889\\tmpol0vhvzl.partd'
```
Anything else we need to know?: It happens when there are many small files and `.compute()` is called once for each of them.
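The last frames show the failure coming from `tempfile.mkdtemp(suffix=".partd", dir=self.tempdir)`. A minimal, stdlib-only sketch of that step follows: `mkdtemp` creates only the final directory, never missing parents, so if the `dir` folder (here the timestamped `E:/temp_dask/1729501515.144889`) does not exist at the moment of the call, it raises `FileNotFoundError`. The helper `make_partd_dir` and the scratch paths are hypothetical and only mimic the failing call; this is not Dask's code.

```python
import os
import shutil
import tempfile


def make_partd_dir(parent: str, ensure_parent: bool) -> str:
    """Hypothetical helper mimicking the failing mkdtemp call in shuffle.py."""
    if ensure_parent:
        # Create the parent folder (and any missing ancestors); no-op if it exists.
        os.makedirs(parent, exist_ok=True)
    # mkdtemp only creates the final "<random>.partd" folder, not its parents.
    return tempfile.mkdtemp(suffix=".partd", dir=parent)


base = tempfile.mkdtemp()  # an existing scratch folder for the demo
missing = os.path.join(base, "1729501515.144889")  # timestamped subfolder, not created yet

try:
    make_partd_dir(missing, ensure_parent=False)
    failed = False
except FileNotFoundError:
    failed = True  # same exception class as in the traceback above

created = make_partd_dir(missing, ensure_parent=True)
ok = os.path.isdir(created) and created.endswith(".partd")

shutil.rmtree(base)  # tidy up the demo folders
print(failed, ok)  # True True
```

With `ensure_parent=True` the same call succeeds, which is consistent with the error depending on whether the timestamped folder exists when the shuffle step runs.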
Environment:
- Dask version: 2021.7.0
- Python version: 3.8.0
- Operating System: Windows
- Install method (conda, pip, source): pip