propose decent solution when code generates lots of small files #285

@gyom

After the Infrastructure meeting on 2025-08-01, I'm under the impression that our filesystem won't gracefully handle creating and writing to many small files.

What's a good recommendation for someone running code that generates many small files? Should they just write to ${SLURM_TMPDIR}, then tar and rsync the files afterwards?
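
As a starting point for discussion, here is a minimal Python sketch of the "write locally, then bundle" idea. It assumes the job writes its small files under `${SLURM_TMPDIR}` and that a single tar archive is acceptable on the shared filesystem. The function name `pack_small_files`, the archive name `outputs.tar`, and the destination directory are hypothetical, and the final copy could just as well be done with `rsync`.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def pack_small_files(src_dir: Path, dest_dir: Path, name: str = "outputs.tar") -> Path:
    """Bundle everything under src_dir into one tar archive inside dest_dir."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final_path = dest_dir / name
    partial_path = dest_dir / (name + ".partial")
    # Write under a temporary name first, so a half-written archive
    # is never mistaken for a complete one.
    with tarfile.open(partial_path, "w") as archive:
        archive.add(src_dir, arcname=src_dir.name)
    os.replace(partial_path, final_path)
    return final_path


if __name__ == "__main__":
    # SLURM_TMPDIR is set inside Slurm jobs; fall back to the system
    # temp directory so the sketch also runs outside a job.
    local_root = Path(os.environ.get("SLURM_TMPDIR", tempfile.gettempdir()))
    work_dir = Path(tempfile.mkdtemp(prefix="small_files_", dir=local_root))
    for i in range(100):
        (work_dir / f"sample_{i:04d}.txt").write_text(f"result {i}\n")

    # Hypothetical destination; in practice this would be a directory
    # on the shared filesystem.
    dest = Path(tempfile.mkdtemp(prefix="shared_dest_"))
    print(pack_small_files(work_dir, dest))
```

With this approach the shared filesystem sees one large sequential write instead of many small file creations.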

What's a simple solution that doesn't make the code fragile when facing preemption?
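
One possible answer, sketched in Python under stated assumptions: split the output into fixed-size shards, publish each shard as one tar archive using a rename, and skip shards that already exist when the job restarts. The names `flush_shard`, `run`, and `shard_name` are hypothetical, the `write_text` call stands in for the real computation, and the sketch assumes that a rename within one directory is atomic on the shared filesystem, which should be verified for each cluster.

```python
import os
import shutil
import tarfile
from pathlib import Path
from typing import Sequence


def shard_name(index: int) -> str:
    return f"shard_{index:05d}.tar"


def flush_shard(files: Sequence[Path], index: int, dest_dir: Path) -> Path:
    """Pack files into one shard and publish it to dest_dir via rename."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    final_path = dest_dir / shard_name(index)
    partial_path = dest_dir / (shard_name(index) + ".partial")
    with tarfile.open(partial_path, "w") as archive:
        for path in files:
            archive.add(path, arcname=path.name)
    # A preempted job leaves at most a ".partial" file, never a
    # truncated archive under the final name.
    os.replace(partial_path, final_path)
    return final_path


def run(num_items: int, shard_size: int, scratch: Path, dest_dir: Path) -> int:
    """Generate num_items small files in shards; return shards written in this run."""
    written = 0
    for index, start in enumerate(range(0, num_items, shard_size)):
        if (dest_dir / shard_name(index)).exists():
            continue  # already published before an earlier preemption
        shard_dir = scratch / f"shard_{index:05d}"
        shard_dir.mkdir(parents=True, exist_ok=True)
        files = []
        for item in range(start, min(start + shard_size, num_items)):
            path = shard_dir / f"item_{item:06d}.txt"
            path.write_text(f"result {item}\n")  # stand-in for real work
            files.append(path)
        flush_shard(files, index, dest_dir)
        shutil.rmtree(shard_dir)  # free local scratch space
        written += 1
    return written
```

A preempted job then loses at most the shard it was working on, and rerunning the same command resumes from the first missing shard.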

I'm opening this issue because I don't see any tips about this in our documentation. Whatever we recommend should be applicable to both the Mila cluster and the DRAC clusters (different solutions could apply to each).
