Fix for test_load_multiple_csv #2193
brownbaerchen wants to merge 5 commits into helmholtz-analytics:main
Hm. The CI on Helmholtz cloud has not been triggered because I merged main locally and then pushed rather than syncing here. That's not ideal.
mtar left a comment
Hi @brownbaerchen, the change may lead to other smaller issues.
tests/core/test_io.py
Outdated
  import pandas as pd

- csv_path = os.path.join(os.getcwd(), "heat/datasets/csv_tests")
+ csv_path = ht.comm.bcast(tempfile.mkdtemp())
This will create a temporary directory on every process in a multi-process run, but only the broadcast one will be deleted. Furthermore, the default temporary directory often resolves to /tmp unless set otherwise. This can cause issues on HPC systems if more than one node is used and no environment variable is set.
Alright, I am now creating the temporary directory only on rank 0, inside the working directory the test is run from.
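For readers following along, the pattern described in this reply could look roughly like the sketch below. It is not the code from this PR. The helper names `make_shared_tmpdir` and `remove_shared_tmpdir` and the `SingleProcessComm` stand-in are hypothetical, and the communicator is assumed to expose an mpi4py-style interface (`rank`, `bcast`, `Barrier`).

```python
import os
import shutil
import tempfile


class SingleProcessComm:
    """Hypothetical stand-in for an MPI communicator so the sketch runs without MPI."""

    rank = 0

    def bcast(self, obj, root=0):
        # With a single process, broadcasting just returns the object.
        return obj

    def Barrier(self):
        # Nothing to synchronize with a single process.
        pass


def make_shared_tmpdir(comm, base_dir=None):
    """Create one temporary directory on rank 0 and return its path on every rank."""
    base_dir = os.getcwd() if base_dir is None else base_dir
    # Only rank 0 touches the file system, so exactly one directory is created.
    path = tempfile.mkdtemp(dir=base_dir) if comm.rank == 0 else None
    # All ranks receive the path chosen by rank 0.
    return comm.bcast(path, root=0)


def remove_shared_tmpdir(comm, path):
    """Delete the shared directory once every rank is done with it."""
    comm.Barrier()
    if comm.rank == 0:
        shutil.rmtree(path, ignore_errors=True)
```

Passing `dir=` to `tempfile.mkdtemp` keeps the directory out of the system default location, which addresses the /tmp concern raised above, provided the working directory is on a file system that all nodes can see.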
The test uses `os.mkdir` on rank 0 to create a directory in persistent storage. This fails when the directory already exists. This issue is currently preventing me from running the tests locally in parallel, since only rank 0 fails and the test results in a deadlock.

This test runs only if the optional dependency `pandas` is installed, which is why I didn't run into this issue sooner and why maybe you are not affected.
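The failure mode described above can be reproduced without MPI. The sketch below shows that a second `os.mkdir` on an existing path raises `FileExistsError`, and that `os.makedirs(..., exist_ok=True)` is one idempotent alternative. This is only an illustration and not necessarily the approach taken in this PR. The helper names `mkdir_twice_fails` and `ensure_dir` are hypothetical.

```python
import os
import tempfile


def mkdir_twice_fails(path):
    """Return True if calling os.mkdir on an existing directory raises FileExistsError."""
    os.mkdir(path)
    try:
        os.mkdir(path)  # the directory now exists, as in a repeated test run
    except FileExistsError:
        return True
    return False


def ensure_dir(path):
    """Create the directory if needed; do nothing if it already exists."""
    os.makedirs(path, exist_ok=True)
    return path
```

In a parallel run, an exception raised on rank 0 alone means that rank never reaches the next collective operation, while the other ranks keep waiting for it, which matches the deadlock reported here.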