After the Infrastructure meeting on 2025-08-01, my impression is that our filesystem does not gracefully handle creating and writing to many small files.
What's a good recommendation for someone running code that generates many small files? Should they just write to ${SLURM_TMPDIR}, then tar and rsync the files afterwards?
What's a simple solution that doesn't make the code fragile when facing preemption, given that anything still sitting in ${SLURM_TMPDIR} is lost when the job ends?
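
To make the question concrete, here is a minimal sketch of one possible approach, not a vetted recommendation. It assumes the job writes its small files to node-local scratch and periodically packs them into a single tar file on shared storage, so that only the unflushed files are at risk when the job is preempted. It uses Python's tarfile module in place of the tar and rsync commands mentioned above. The function names, the shard_NNNNN.tar naming scheme, and the .partial suffix are all hypothetical and only for illustration.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def scratch_dir() -> Path:
    """Node-local scratch: $SLURM_TMPDIR inside a job, a fresh temp dir otherwise."""
    return Path(os.environ.get("SLURM_TMPDIR") or tempfile.mkdtemp())


def next_shard_index(dest_dir: Path) -> int:
    """After a restart, resume numbering from the complete shards already saved."""
    if not dest_dir.exists():
        return 0
    # The pattern does not match "*.tar.partial", so interrupted shards are ignored.
    return len(list(dest_dir.glob("shard_*.tar")))


def flush_shard(local_dir: Path, dest_dir: Path, shard_index: int) -> Path:
    """Pack every file in local_dir into one tar in dest_dir, then clear local_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final_path = dest_dir / f"shard_{shard_index:05d}.tar"
    partial_path = dest_dir / (final_path.name + ".partial")  # hypothetical suffix

    files = sorted(p for p in local_dir.iterdir() if p.is_file())
    with tarfile.open(partial_path, "w") as tar:
        for path in files:
            tar.add(path, arcname=path.name)

    # Rename only once the archive is fully written, so a preempted job never
    # leaves a truncated file under the final name (both paths must be on the
    # same filesystem for the rename to be atomic).
    os.replace(partial_path, final_path)

    # The small files are now safe inside the shard; free the local scratch.
    for path in files:
        path.unlink()
    return final_path
```

With something like this, the job would call flush_shard every N files or every few minutes, and after a preemption it would start again from next_shard_index(dest_dir). The shared filesystem then sees one large file per flush instead of thousands of small ones. Whether this pattern suits both the Mila cluster and the DRAC clusters is exactly what I'd like us to settle.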
I'm opening this issue because I don't see any tips about this in our documentation. Whatever we recommend should apply to both the Mila cluster and the DRAC clusters (the two could call for different solutions).