Organize the storage on the cluster #77

@gkaf89

Description

Storage tiers

The storage is organized across multiple tiers. The distinguishing characteristics for the tiers are:

  • speed (throughput and latency),
  • size,
  • accessibility (temporal and locational persistency), and
  • robustness (redundancy and back-ups).

Usually

  • speed is inversely proportional to size, robustness, and accessibility, and
  • size, robustness, and accessibility are proportional to each other.

In the future, only low speed storage (i.e. the Isilon NFS mount) will be accessible to all clusters. Thus, Isilon will become crucial for maintaining uniform data access across all clusters.

File systems accessible through the HPC Infiniband network

The HPC file systems are meant to store working data, and are not meant for long term storage.

  • The scratch file system and the project directories store large temporary input/output files.
  • The home directory is meant for working storage.
  • The node-local file systems accessible through /tmp (local persistent storage) and /dev/shm (virtual memory) are fast, available inside jobs, and wiped when the job finishes.
  • Finally, project storage is also meant to store finalized input and output files.
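
As an illustration of this intended usage, a job can stage its data in /dev/shm and copy only the final results back to persistent storage. The following is a minimal sketch assuming a Slurm batch job; the program name and file names are placeholders, not actual paths on the clusters:

#!/bin/bash -l
#SBATCH --time=01:00:00
# stage the input on node-local virtual memory for fast I/O
cp /work/projects/<project name>/input.dat /dev/shm/input.dat
# run the computation against the fast local copy (my_program is a placeholder)
./my_program /dev/shm/input.dat /dev/shm/output.dat
# copy only the final result back; /dev/shm is wiped when the job finishes
cp /dev/shm/output.dat /work/projects/<project name>/output.dat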

However, there are file systems that are accessible through slower network connections and offer different kinds of features.

File systems not accessible through Infiniband

The central university storage is slower, but it is snapshotted and backed up much more regularly. Therefore, users should transfer their data to the central systems for long term storage.

However, there are multiple options for accessing the central university storage: the systems Atlas, Ebenezer, Isilon-DMZi, and Isilon-DMZe.

  • What is the difference between Atlas, Ebenezer, and Isilon?
  • What is the difference between Isilon-DMZi and Isilon-DMZe?
  • How are user quotas managed in the central storage systems, and how can users see their usage and limits?

The Isilon file system

Isilon is actually the name of the technical solution: https://www.dell.com/fr-fr/dt/storage/isilon/isilon-h5600-hybrid-nas-storage.htm#scroll=off

To Hyacithe's knowledge, there are 2 central storage servers operated by the SIU: "isilon-prod" and "isilon-drs" (an off-site replica of "isilon-prod", used in case of a disaster on "isilon-prod").

The isilon-prod is split into (at least) two zones:

  • the SIU zone, which is accessed using SMB via atlas.uni.lux, and
  • the HPC zone, which is mounted on the clusters via NFS and can be accessed at /mnt/isilon.
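
For example, the presence and overall size of the NFS mount can be checked on a login node with standard tools (the reported export path and sizes will vary):

findmnt /mnt/isilon
df -h /mnt/isilon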

On the HPC side, we are only interested in the NFS-mounted file system. Documentation about Isilon: https://hpc-git.uni.lu/ulhpc/sysadmins/-/wikis/storage/isilon

  • The processes for the HPC zone are not well defined or documented. We can set up quotas per project directory, but there is currently no way to show this information to the users. We are working on providing users with access to this information and on setting up a policy for assigning quotas (a sketch for estimating usage by hand follows this list).

  • We share the Isilon system with the SIU. There is a "fair use agreement" in place which allocates 2 PB to the HPC zone, currently used at 88% of its capacity. Maintaining access to the Isilon system is important moving forward, as the Isilon file system will be the only system unifying data access across our future clusters. We should participate in any future calls and coordinate with the SIU.

  • In terms of performance, the Isilon NFS mount is abysmal with small random I/O, for instance many small files, metadata-heavy operations, etc. It works well for administrative needs, like archiving and occasional data transfers, and even for big-file I/O. But do not try to perform any compute-driven operation on the NFS-mounted Isilon, like compiling software on it, or anything similar.
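
As a sketch of the kind of usage that works well on the NFS mount (the directory layout under /mnt/isilon is an assumption, not the actual export structure), estimating a project's footprint and archiving finished results as a few large sequential transfers:

# estimate how much space a project occupies (slow, since Isilon handles metadata-heavy operations poorly)
du -sh /mnt/isilon/projects/<project name>
# archive a finished results directory as a single large file on Isilon
tar -czf /mnt/isilon/projects/<project name>/results.tar.gz -C /work/projects/<project name> results
# or synchronize large files incrementally
rsync -a --progress /work/projects/<project name>/results/ /mnt/isilon/projects/<project name>/results/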

The Atlas file system

The SMB protocol allows for easy mounting of file systems on personal computers, including Windows machines.

The HPC team does not manage the file system exported through SMB from Atlas (atlas.uni.lux). However, the HPC team maintains the smb-storage script (under active development), which allows mounting SMB shares on the login nodes of our clusters.
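
For reference, an Atlas share can typically also be mounted on a personal Linux machine with the standard CIFS tools. This is a minimal sketch, where the share name, mount point, and Active Directory domain are placeholders, not the actual values:

sudo mkdir -p /mnt/atlas
sudo mount -t cifs //atlas.uni.lux/<share> /mnt/atlas -o username=<AD user>,domain=<AD domain>,vers=3.0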

Fun fact: you can access the HPC zone via Samba on your workstation using your Active Directory credentials. This works via a fragile script that maps Windows/POSIX permissions and user accounts from the HPC-IPA to the SIU Active Directory. This was requested by the LCSB Bio-core in 2014. The system still works, but it is no longer supported. Honestly, if you are using Linux you can get the performance of SMB with SSHFS: https://blog.ja-ke.tech/2019/08/27/nas-performance-sshfs-nfs-smb.html
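
A minimal SSHFS sketch for mounting a cluster directory on a Linux workstation (the login host name and remote path are assumptions, to be replaced with the actual values):

mkdir -p ~/isilon
sshfs <user>@<cluster login host>:/mnt/isilon/projects/<project name> ~/isilon -o reconnect,ServerAliveInterval=15
# unmount when done
fusermount -u ~/isilon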

Add some instructions on how to fix errors in access permissions

The discussion of data management is a bit disorganized. We should probably reorganize the sections and add some information on how users can fix their projects when errors occur.

To fix access permissions in a project directory,

  1. change the group ownership,
chown -R :<project name> /work/projects/<project name>
  2. and then set the access rights (group read/execute and the setgid bit) on all directories:
find /work/projects/<project name> -type d -print0 | xargs -0 chmod g=rxs
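
To verify the result, inspect a directory in the project tree, for example:

ls -ld /work/projects/<project name>

With the commands above, the group should be <project name> and the permission string should show an "s" in the group execute position (e.g. drwxr-sr-x).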

Also, add a link with more resources: https://www.redhat.com/sysadmin/suid-sgid-sticky-bit
