diff --git a/.gitignore b/.gitignore
index 89a9d0db..798ffe9c 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,2 +1,4 @@
# path that contains html generated by `mkdocs build`
site
+
+*.sw[nopq]
diff --git a/docs/alps/platforms.md b/docs/alps/platforms.md
index e62ce969..c2c7611c 100644
--- a/docs/alps/platforms.md
+++ b/docs/alps/platforms.md
@@ -7,17 +7,17 @@ A platform can consist of one or multiple [clusters][ref-alps-clusters], and its
-- :fontawesome-solid-mountain: __Machine Learning Platform__
+- :fontawesome-solid-mountain: __HPC Platform__
- The Machine Learning Platform (MLP) hosts ML and AI researchers.
+ The HPC Platform (HPCP) provides services for the HPC community in Switzerland and abroad. The majority of compute cycles are provided to the [User Lab](https://www.cscs.ch/user-lab/overview) via peer-reviewed allocation schemes.
- [:octicons-arrow-right-24: MLP][ref-platform-mlp]
+ [:octicons-arrow-right-24: HPCP][ref-platform-hpcp]
-- :fontawesome-solid-mountain: __HPC Platform__
+- :fontawesome-solid-mountain: __Machine Learning Platform__
- !!! todo
+ The Machine Learning Platform (MLP) hosts ML and AI researchers, particularly the SwissAI initiative.
- [:octicons-arrow-right-24: HPCP][ref-platform-hpcp]
+ [:octicons-arrow-right-24: MLP][ref-platform-mlp]
- :fontawesome-solid-mountain: __Climate and Weather Platform__
diff --git a/docs/clusters/daint.md b/docs/clusters/daint.md
index b61ab7a5..fce57233 100644
--- a/docs/clusters/daint.md
+++ b/docs/clusters/daint.md
@@ -1,2 +1,191 @@
[](){#ref-cluster-daint}
# Daint
+
+Daint is the main [HPC Platform][ref-platform-hpcp] cluster that provides compute nodes and file systems for GPU-enabled workloads.
+
+## Cluster specification
+
+### Compute nodes
+
+Daint consists of around 1,000 [Grace-Hopper nodes][ref-alps-gh200-node].
+
+The number of nodes can vary as nodes are added or removed from other clusters on Alps.
+
+There are four login nodes, `daint-ln00[1-4]`.
+You will be assigned to one of them when you connect via SSH; from there you can edit files, compile applications, and launch batch jobs.
+
+| node type | number of nodes | total CPU sockets | total GPUs |
+|-----------|-----------------| ----------------- | ---------- |
+| [gh200][ref-alps-gh200-node] | 1,022 | 4,088 | 4,088 |
+
+### Storage and file systems
+
+Daint uses the [HPCP filesystems and storage policies][ref-hpcp-storage].
+
+## Getting started
+
+### Logging into Daint
+
+To connect to Daint via SSH, first refer to the [ssh guide][ref-ssh].
+
+!!! example "`~/.ssh/config`"
+ Add the following to your [SSH configuration][ref-ssh-config] to enable you to directly connect to Daint using `ssh daint`.
+ ```
+ Host daint
+ HostName daint.alps.cscs.ch
+ ProxyJump ela
+ User cscsusername
+ IdentityFile ~/.ssh/cscs-key
+ IdentitiesOnly yes
+ ```
+
+### Software
+
+[](){#ref-cluster-daint-uenv}
+#### uenv
+
+Daint provides uenv to deliver programming environments and application software.
+Please refer to the [uenv documentation][ref-uenv] for detailed information on how to use the uenv tools on the system.
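+
+For example, assuming an illustrative image name and version, you can list the uenv images available on Daint, pull one into your local repository, and start a shell with it mounted:
+
+```console
+$ uenv image find                                # list uenv images available on Daint
+$ uenv image pull prgenv-gnu/24.11:v1            # pull an image (name and version are illustrative)
+$ uenv start prgenv-gnu/24.11:v1 --view=default  # start a shell with the uenv mounted (view names depend on the image)
+```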
+
+
+
+- :fontawesome-solid-layer-group: __Scientific Applications__
+
+ Provide the latest versions of scientific applications, tuned for Daint, and the tools required to build your own versions of the applications.
+
+ * [CP2K][ref-uenv-cp2k]
+ * [GROMACS][ref-uenv-gromacs]
+ * [LAMMPS][ref-uenv-lammps]
+ * [NAMD][ref-uenv-namd]
+    * [Quantum ESPRESSO][ref-uenv-quantumespresso]
+ * [VASP][ref-uenv-vasp]
+
+
+
+
+
+- :fontawesome-solid-layer-group: __Programming Environments__
+
+ Provide compilers, MPI, Python, common libraries and tools used to build your own applications.
+
+ * [prgenv-gnu][ref-uenv-prgenv-gnu]
+ * [prgenv-nvfortran][ref-uenv-prgenv-nvfortran]
+ * [linalg][ref-uenv-linalg]
+ * [julia][ref-uenv-julia]
+
+
+
+
+- :fontawesome-solid-layer-group: __Tools__
+
+    Provide tools such as:
+
+ * [Linaro Forge][ref-uenv-linaro]
+
+
+[](){#ref-cluster-daint-containers}
+#### Containers
+
+Daint supports container workloads using the [container engine][ref-container-engine].
+
+To build images, see the [guide to building container images on Alps][ref-build-containers].
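+
+As a minimal sketch, a container can be described with an Environment Definition File (EDF) and then used through Slurm. The image reference, mount, and file name below are illustrative:
+
+```toml
+# ~/.edf/ubuntu.toml -- an illustrative environment definition file (EDF)
+image = "library/ubuntu:24.04"                       # container image pulled from a registry
+mounts = ["/capstor/scratch/cscs/${USER}:/scratch"]  # bind-mount scratch inside the container
+workdir = "/scratch"                                 # working directory inside the container
+```
+
+The environment can then be selected at job launch, for example with `srun --environment=ubuntu grep PRETTY_NAME /etc/os-release`; see the [container engine][ref-container-engine] documentation for the full EDF specification.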
+
+#### Cray Modules
+
+!!! warning
+ The Cray Programming Environment (CPE), loaded using `module load cray`, is no longer supported by CSCS.
+
+    CSCS will continue to support and update uenv and the container engine, and users are encouraged to update their workflows to use these methods at the first opportunity.
+
+    The CPE is still installed on Daint; however, it will receive no support or updates, and it will be replaced with a container in a future update.
+
+## Running jobs on Daint
+
+### Slurm
+
+Daint uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor compute-intensive workloads.
+
+There are four [Slurm partitions][ref-slurm-partitions] on the system:
+
+* the `normal` partition is for all production workloads.
+* the `debug` partition can be used to access a small allocation for up to 30 minutes for debugging and testing purposes.
+* the `xfer` partition is for [internal data transfer][ref-data-xfer-internal].
+* the `low` partition is a low-priority partition, which may be enabled for specific projects at specific times.
+
+
+
+| name | nodes | max nodes per job | time limit |
+| -- | -- | -- | -- |
+| `normal` | unlim | - | 24 hours |
+| `debug` | 24 | 2 | 30 minutes |
+| `xfer` | 2 | 1 | 24 hours |
+| `low` | unlim | - | 24 hours |
+
+* nodes in the `normal` and `debug` (and `low`) partitions are not shared
+* nodes in the `xfer` partition can be shared
+
+See the Slurm documentation for instructions on how to run jobs on the [Grace-Hopper nodes][ref-slurm-gh200].
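+
+As an illustrative sketch (the account name and resource counts are placeholders; see the linked page for the recommended GPU binding options), a batch script for the Grace-Hopper nodes might look like:
+
+```bash
+#!/bin/bash
+#SBATCH --job-name=example
+#SBATCH --account=<project>       # placeholder: your project account
+#SBATCH --partition=normal
+#SBATCH --nodes=2
+#SBATCH --ntasks-per-node=4       # one rank per Grace-Hopper module (4 per node)
+#SBATCH --gpus-per-node=4
+#SBATCH --time=01:00:00
+
+srun ./my_application
+```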
+
+### FirecREST
+
+Daint can also be accessed using [FirecREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v2` API endpoint.
+
+!!! warning "The FirecREST v1 API is still available, but deprecated"
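+
+As a rough sketch, requests are authenticated with an OIDC access token; the resource path shown here is an assumption, so consult the [FirecREST][ref-firecrest] documentation for the exact v2 routes:
+
+```console
+$ export TOKEN=<access token>     # placeholder: obtain a token from the CSCS identity provider
+$ curl -H "Authorization: Bearer $TOKEN" "https://api.cscs.ch/ml/firecrest/v2/status/systems"
+```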
+
+## Maintenance and status
+
+### Scheduled maintenance
+
+!!! todo "move this to HPCP top level docs"
+    Wednesday mornings 8:00-12:00 CET are reserved for periodic updates, with services potentially unavailable during this time frame. If the batch queues must be drained (for redeployment of node images, rebooting of compute nodes, etc.), then a Slurm reservation will be in place that will prevent jobs from running into the maintenance window.
+
+ Exceptional and non-disruptive updates may happen outside this time frame and will be announced to the users mailing list, and on the [CSCS status page](https://status.cscs.ch).
+
+### Change log
+
+!!! change "2025-05-21"
+ Minor enhancements to system configuration have been applied.
+    These changes should reduce the frequency of compute nodes being marked as `NOT_RESPONDING` by the workload manager, while we continue to investigate the issue.
+
+!!! change "2025-05-14"
+ ??? note "Performance hotfix"
+        The [access-counter-based memory migration feature](https://developer.nvidia.com/blog/cuda-toolkit-12-4-enhances-support-for-nvidia-grace-hopper-and-confidential-computing/#access-counter-based_migration_for_nvidia_grace_hopper_memory) in the NVIDIA driver for Grace Hopper is disabled to address performance issues affecting NCCL-based workloads (e.g. LLM training).
+
+ ??? note "NVIDIA boost slider"
+ Added an option to enable the NVIDIA boost slider (vboost) via Slurm using the `-C nvidia_vboost_enabled` flag.
+        This feature, disabled by default, may increase GPU frequency and performance while staying within the power budget.
+
+ ??? note "Enroot update"
+        The container runtime is upgraded from version 2.12.0 to 2.13.0. This update includes libfabric version 1.22.0 (previously 1.15.2.0), which has demonstrated improved performance during LLM checkpointing.
+
+!!! change "2025-04-30"
+ ??? note "uenv is updated from v7.0.1 to v8.1.0"
+ * improved uenv view management
+ * automatic generation of default uenv repository the first time uenv is called
+ * configuration files
+ * bash completion
+ * relative paths can be used for referring to squashfs images
+ * support for `SLURM_UENV` and `SLURM_UENV_VIEW` environment variables (useful for using inside CI/CD pipelines)
+ * better error messages and small bug fixes
+
+ ??? note "Pyxis is upgraded from v24.5.0 to v24.5.3"
+ * Added image caching for Enroot
+ * Added support for environment variable expansion in EDFs
+ * Added support for relative paths expansion in EDFs
+ * Print a message about the experimental status of the --environment option when used outside of the srun command
+ * Merged small features and bug fixes from upstream Pyxis releases v0.16.0 to v0.20.0
+ * Internal changes: various bug fixes and refactoring
+
+??? change "2025-03-12"
+    1. The number of compute nodes has been increased to 1018.
+    1. The restriction on the number of running jobs per project has been lifted.
+    1. A "low" priority partition has been added, which allows some project types to consume up to 130% of the project's quarterly allocation.
+    1. We have increased the power cap for the GH module from 624 W to 660 W. You might see increased application performance as a consequence.
+    1. Small changes have been made to kernel tuning parameters.
+
+### Known issues
+
+!!! todo
+ Most of these issues (see original [KB docs](https://confluence.cscs.ch/spaces/KB/pages/868811400/Daint.Alps#Daint.Alps-Knownissues)) should be consolidated in a location where they can be linked to by all clusters.
+
+ We have some "know issues" documented under [communication libraries][ref-communication-cray-mpich], however these might be a bit too disperse for centralised linking.
diff --git a/docs/clusters/eiger.md b/docs/clusters/eiger.md
index 05a1fc6e..58eab865 100644
--- a/docs/clusters/eiger.md
+++ b/docs/clusters/eiger.md
@@ -1,3 +1,195 @@
[](){#ref-cluster-eiger}
# Eiger
+Eiger is an Alps cluster that provides compute nodes and file systems designed to meet the needs of CPU-only workloads for the [HPC Platform][ref-platform-hpcp].
+
+!!! under-construction
+ This documentation is for `eiger.alps.cscs.ch` - an updated version of Eiger that will replace the existing `eiger.cscs.ch` cluster.
+ For help using the existing Eiger, see the [Eiger User Guide](https://confluence.cscs.ch/spaces/KB/pages/284426490/Alps+Eiger+User+Guide) on the legacy KB documentation site.
+
+ The target date for full deployment of the new Eiger is **July 1, 2025**.
+
+!!! change "Important changes"
+ The redeployment of `eiger.cscs.ch` as `eiger.alps.cscs.ch` introduces changes that may affect some users.
+
+ ### Breaking changes
+
+ !!! warning "Sarus is replaced with the Container Engine"
+ The Sarus container runtime is replaced with the [Container Engine][ref-container-engine].
+
+ If you are using Sarus to run containers on Eiger, you will have to [rebuild][ref-build-containers] and adapt your containers for the Container Engine.
+
+ !!! warning "Cray modules and EasyBuild are no longer supported"
+ The Cray Programming Environment (accessed via the `cray` module) is no longer supported by CSCS, along with software that CSCS provided using EasyBuild.
+
+        The same version of the Cray modules is still available, along with software that was installed using them; however, they will not receive updates or support from CSCS.
+
+ You are strongly encouraged to start using [uenv][ref-cluster-eiger-uenv] to access supported applications and to rebuild your own applications.
+
+ * The versions of compilers, `cray-mpich`, Python and libraries in uenv are up to date.
+        * The scientific application uenvs provide up-to-date versions of the supported applications.
+
+ ### Unimplemented features
+
+ !!! under-construction "FirecREST is not yet available"
+ [FirecREST][ref-firecrest] has not been configured on `eiger.alps` - it is still running on the old Eiger.
+
+ **It will be deployed, and this documentation updated when it is.**
+
+ ### Minor changes
+
+ !!! change "Slurm is updated from version 23.02.6 to 24.05.4"
+
+## Cluster specification
+
+### Compute nodes
+
+!!! under-construction
+ During this Early Access phase, there are 19 compute nodes for you to test and port your workflows to the new Eiger deployment. There is one compute node in the `debug` partition and one in the `xfer` partition for internal data transfer. The remaining compute nodes will be moved from `eiger.cscs.ch` to `eiger.alps.cscs.ch` at a later date (provisionally, 1 July 2025).
+
+Eiger consists of 19 [AMD Epyc Rome][ref-alps-zen2-node] compute nodes.
+
+There is one login node, `eiger-ln010`.
+
+[//]: # (TODO: You will be assigned to one of the four login nodes when you ssh onto the system, from where you can edit files, compile applications and start simulation jobs.)
+
+| node type | number of nodes | total CPU sockets | total GPUs |
+|-----------|-----------------| ----------------- | ---------- |
+| [zen2][ref-alps-zen2-node] | 19 | 38 | - |
+
+### Storage and file systems
+
+Eiger uses the [HPCP filesystems and storage policies][ref-hpcp-storage].
+
+## Getting started
+
+### Logging into Eiger
+
+To connect to Eiger via SSH, first refer to the [ssh guide][ref-ssh].
+
+!!! example "`~/.ssh/config`"
+    Add the following to your [SSH configuration][ref-ssh-config] to enable you to connect directly to Eiger using `ssh eiger.alps`.
+ ```
+ Host eiger.alps
+ HostName eiger.alps.cscs.ch
+ ProxyJump ela
+ User cscsusername
+ IdentityFile ~/.ssh/cscs-key
+ IdentitiesOnly yes
+ ```
+
+### Software
+
+[](){#ref-cluster-eiger-uenv}
+#### uenv
+
+CSCS and the user community provide [uenv][ref-uenv] software environments on Eiger.
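+
+For example, assuming an illustrative image name, you can list the available images and run a single command inside a uenv without starting an interactive session:
+
+```console
+$ uenv image find                                  # list uenv images available on Eiger
+$ uenv run prgenv-gnu --view=default -- make -j    # run a command inside the uenv (names and view are illustrative)
+```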
+
+
+
+
+- :fontawesome-solid-layer-group: __Scientific Applications__
+
+ Provide the latest versions of scientific applications, tuned for Eiger, and the tools required to build your own version of the applications.
+
+ * [CP2K][ref-uenv-cp2k]
+ * [GROMACS][ref-uenv-gromacs]
+ * [LAMMPS][ref-uenv-lammps]
+ * [NAMD][ref-uenv-namd]
+    * [Quantum ESPRESSO][ref-uenv-quantumespresso]
+ * [VASP][ref-uenv-vasp]
+
+
+
+
+
+- :fontawesome-solid-layer-group: __Programming Environments__
+
+ Provide compilers, MPI, Python, common libraries and tools used to build your own applications.
+
+ * [prgenv-gnu][ref-uenv-prgenv-gnu]
+ * [linalg][ref-uenv-linalg]
+ * [julia][ref-uenv-julia]
+
+
+
+
+- :fontawesome-solid-layer-group: __Tools__
+
+    Provide tools such as:
+
+ * [Linaro Forge][ref-uenv-linaro]
+
+
+[](){#ref-cluster-eiger-containers}
+#### Containers
+
+Eiger supports container workloads using the [Container Engine][ref-container-engine].
+
+To build images, see the [guide to building container images on Alps][ref-build-containers].
+
+!!! warning "Sarus is not available"
+ A key change with the new Eiger deployment is that the Sarus container runtime is replaced with the [Container Engine][ref-container-engine].
+
+ If you are using Sarus to run containers on Eiger, you will have to rebuild and adapt your containers for the Container Engine.
+
+#### Cray Modules
+
+!!! warning
+ The Cray Programming Environment (CPE), loaded using `module load cray`, is no longer supported by CSCS.
+
+ CSCS will continue to support and update uenv and the Container Engine, and users are encouraged to update their workflows to use these methods at the first opportunity.
+
+ The CPE is deprecated and will be removed completely at a future date.
+
+## Running jobs on Eiger
+
+### Slurm
+
+Eiger uses [Slurm][ref-slurm] as the workload manager, which is used to launch and monitor workloads on compute nodes.
+
+There are four [Slurm partitions][ref-slurm-partitions] on the system:
+
+* the `normal` partition is for all production workloads.
+* the `debug` partition can be used to access a small allocation for up to 30 minutes for debugging and testing purposes.
+* the `xfer` partition is for [internal data transfer][ref-data-xfer-internal].
+* the `low` partition is a low-priority partition, which may be enabled for specific projects at specific times.
+
+| name | nodes | max nodes per job | time limit |
+| -- | -- | -- | -- |
+| `normal` | unlim | - | 24 hours |
+| `debug` | 32 | 1 | 30 minutes |
+| `xfer` | 2 | 1 | 24 hours |
+| `low` | unlim | - | 24 hours |
+
+* nodes in the `normal` and `debug` partitions are not shared
+* nodes in the `xfer` partition can be shared
+
+See the Slurm documentation for instructions on how to run jobs on the [AMD CPU nodes][ref-slurm-amdcpu].
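+
+As an illustrative sketch (the account name and task layout are placeholders; each node has two 64-core sockets), a batch script for the AMD CPU nodes might look like:
+
+```bash
+#!/bin/bash
+#SBATCH --job-name=example
+#SBATCH --account=<project>       # placeholder: your project account
+#SBATCH --partition=normal
+#SBATCH --nodes=2
+#SBATCH --ntasks-per-node=16      # illustrative layout: 16 ranks per node
+#SBATCH --cpus-per-task=8         # 16 x 8 = 128 cores per node
+#SBATCH --time=01:00:00
+
+srun --cpu-bind=cores ./my_application
+```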
+
+### FirecREST
+
+!!! under-construction "FirecREST is not yet available"
+ [FirecREST][ref-firecrest] has not been configured on `eiger.alps` - it is still running on the old Eiger.
+
+ **It will be deployed, and this documentation updated when it is.**
+
+## Maintenance and status
+
+### Scheduled maintenance
+
+Wednesday mornings 8:00-12:00 CET are reserved for periodic updates, with services potentially unavailable during this time frame. If the batch queues must be drained (for redeployment of node images, rebooting of compute nodes, etc.), then a Slurm reservation will be in place that will prevent jobs from running into the maintenance window.
+
+Exceptional and non-disruptive updates may happen outside this time frame and will be announced to the users mailing list, and on the [CSCS status page](https://status.cscs.ch).
+
+### Change log
+
+!!! change "2025-06-02 Early access phase"
+    The early access phase is open.
+
+??? change "2025-05-23 Creation of Eiger on Alps"
+    Eiger is deployed as a vServices-enabled cluster.
+
+### Known issues
+
+
diff --git a/docs/clusters/santis.md b/docs/clusters/santis.md
index b0366f0d..afbc8497 100644
--- a/docs/clusters/santis.md
+++ b/docs/clusters/santis.md
@@ -76,7 +76,7 @@ It is also possible to use HPC containers on Santis:
Santis uses [SLURM][ref-slurm] as the workload manager, which is used to launch and monitor distributed workloads, such as training runs.
-There are two slurm partitions on the system:
+There are two [SLURM partitions][ref-slurm-partitions] on the system:
* the `normal` partition is for all production workloads.
* the `debug` partition can be used to access a small allocation for up to 30 minutes for debugging and testing purposes.
@@ -93,20 +93,11 @@ There are two slurm partitions on the system:
See the SLURM documentation for instructions on how to run jobs on the [Grace-Hopper nodes][ref-slurm-gh200].
-??? example "how to check the number of nodes on the system"
- You can check the size of the system by running the following command in the terminal:
- ```console
- $ sinfo --format "| %20R | %10D | %10s | %10l | %10A |"
- | PARTITION | NODES | JOB_SIZE | TIMELIMIT | NODES(A/I) |
- | debug | 32 | 1-2 | 30:00 | 3/29 |
- | normal | 1266 | 1-infinite | 1-00:00:00 | 812/371 |
- | xfer | 2 | 1 | 1-00:00:00 | 1/1 |
- ```
- The last column shows the number of nodes that have been allocated in currently running jobs (`A`) and the number of jobs that are idle (`I`).
-
### FirecREST
-Santis can also be accessed using [FirecREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v1` API endpoint.
+Santis can also be accessed using [FirecREST][ref-firecrest] at the `https://api.cscs.ch/ml/firecrest/v2` API endpoint.
+
+!!! warning "The FirecREST v1 API is still available, but deprecated"
## Maintenance and status
diff --git a/docs/platforms/cwp/index.md b/docs/platforms/cwp/index.md
index 64ea33b7..4db40e47 100644
--- a/docs/platforms/cwp/index.md
+++ b/docs/platforms/cwp/index.md
@@ -1,5 +1,5 @@
[](){#ref-platform-cwp}
-# Climate and weather platform
+# Climate and Weather Platform
The Climate and Weather Platform (CWP) provides compute, storage and support to the climate and weather modeling community in Switzerland.
@@ -9,7 +9,7 @@ The Climate and Weather Platform (CWP) provides compute, storage and support to
Project administrators (PIs and deputy PIs) of projects on the CWP can invite users to join their project, before they can use the project's resources on Alps.
-This is currently performed using the [project management tool][ref-account-ump].
+This is currently performed using the [account and resource management tool][ref-account-ump].
Once invited to a project, you will receive an email, which you will need in order to create an account and configure [multi-factor authentication][ref-mfa] (MFA).
@@ -37,35 +37,30 @@ There are three main file systems mounted on the CWP system Santis.
### Home
-Every user has a home path (`$HOME`) mounted at `/users/$USER` on the [VAST][ref-alps-vast] filesystem.
+Every user has a [home][ref-storage-home] path (`$HOME`) mounted at `/users/$USER` on the [VAST][ref-alps-vast] filesystem.
The home directory has 50 GB of capacity, and is intended for configuration, small software packages and scripts.
### Scratch
The Scratch filesystem provides temporary storage for high-performance I/O for executing jobs.
-Use scratch to store datasets that will be accessed by jobs, and for job output.
-Scratch is per user - each user gets separate scratch path and quota.
-!!! info
- A quota of 150 TB and 1 million inodes (files and folders) is applied to your scratch path.
+See the [Scratch][ref-storage-scratch] documentation for more information.
- These are implemented as soft quotas: upon reaching either limit there is a grace period of 1 week before write access to `$SCRATCH` is blocked.
-
- You can check your quota at any time from Ela or one of the login nodes, using the [`quota` command][ref-storage-quota].
-
-!!! info
- The environment variable `SCRATCH=/capstor/scratch/cscs/$USER` is set automatically when you log into the system, and can be used as a shortcut to access scratch.
+The environment variable `SCRATCH=/capstor/scratch/cscs/$USER` is set automatically when you log into the system, and can be used as a shortcut to access scratch.
!!! warning "scratch cleanup policy"
Files that have not been accessed in 30 days are automatically deleted.
- **Scratch is not intended for permanent storage**: transfer files back to the capstor project storage after job runs.
+ **Scratch is not intended for permanent storage**: transfer files back to the [Store][ref-storage-store] after job runs.
+
+### Project Store
-### Project
+Project storage is backed up and has no cleaning policy; it provides intermediate storage space for datasets, shared code, or configuration scripts that need to be accessed from different vClusters.
-Project storage is backed up, with no cleaning policy: it provides intermediate storage space for datasets, shared code or configuration scripts that need to be accessed from different vClusters.
-Project is per project - each project gets a project folder with project-specific quota.
+The environment variable `PROJECT` is set automatically when you log into the system, and can be used as a shortcut to access the Store path for your primary project.
-* hard limits on capacity and inodes prevent users from writing to project if the quota is reached - you can check quota and available space by running the [`quota`][ref-storage-quota] command on a login node or ela.
-* it is not recommended to write directly to the project path from jobs.
+Hard limits on capacity and inodes prevent users from writing to [Store][ref-storage-store] if the quota is reached.
+You can check quota and available space by running the [`quota`][ref-storage-quota] command on a login node or on Ela.
+!!! warning
+ It is not recommended to write directly to the `$PROJECT` path from jobs.
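+
+For example, after a job completes you might stage results from scratch to the project store from a login node (paths are illustrative):
+
+```console
+$ rsync -av $SCRATCH/my_run/results/ $PROJECT/my_run/results/
+```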
diff --git a/docs/platforms/hpcp/index.md b/docs/platforms/hpcp/index.md
index 98a9b733..5c1c445c 100644
--- a/docs/platforms/hpcp/index.md
+++ b/docs/platforms/hpcp/index.md
@@ -1,5 +1,68 @@
[](){#ref-platform-hpcp}
# HPC Platform
-!!! todo
- follow the template of the [MLp][ref-platform-mlp]
+The HPC Platform (HPCP) provides compute, storage, and related services for the HPC community in Switzerland and abroad. The majority of compute cycles are provided to the [User Lab](https://www.cscs.ch/user-lab/overview) via peer-reviewed allocation schemes.
+
+## Getting Started
+
+### Getting access
+
+Principal Investigators (PIs) and Deputy PIs can invite users to join their projects using the [account and resource management tool][ref-account-ump].
+
+Once invited to a project you will receive an email with information on how to create an account and configure [multi-factor authentication][ref-mfa] (MFA).
+
+## Systems
+
+
+- :fontawesome-solid-mountain: [__Daint__][ref-cluster-daint]
+
+ Daint is a large [Grace-Hopper][ref-alps-gh200-node] cluster for GPU-enabled workloads.
+
+
+
+- :fontawesome-solid-mountain: [__Eiger__][ref-cluster-eiger]
+
+ Eiger is an [AMD Epyc][ref-alps-zen2-node] cluster for CPU-only workloads.
+
+
+[](){#ref-hpcp-storage}
+## File systems and storage
+
+There are three main file systems mounted on the HPCP clusters.
+
+| type |mount | file system |
+| -- | -- | -- |
+| [Home][ref-storage-home] | /users/$USER | [VAST][ref-alps-vast] |
+| [Scratch][ref-storage-scratch] | `/capstor/scratch/cscs/$USER` | [Capstor][ref-alps-capstor] |
+| [Store][ref-storage-store] | `/capstor/store/cscs/...` | [Capstor][ref-alps-capstor] |
+
+### Home
+
+Every user has a [home][ref-storage-home] path (`$HOME`) mounted at `/users/$USER` on the [VAST][ref-alps-vast] file system.
+Home directories have 50 GB of capacity and are intended for keeping configuration files, small software packages, and scripts.
+
+### Scratch
+
+The Scratch file system is a large, temporary storage system designed for high-performance I/O. It is not backed up.
+
+See the [Scratch][ref-storage-scratch] documentation for more information.
+
+The environment variable `$SCRATCH` points to `/capstor/scratch/cscs/$USER`, and can be used as a shortcut to access your scratch folder.
+
+!!! warning "scratch cleanup policy"
+ Files that have not been accessed in 30 days are automatically deleted.
+
+ **Scratch is not intended for permanent storage**: transfer files back to the [Store][ref-storage-store] after batch job completion.
+
+### Store
+
+The Store (or Project) file system is provided as a space to store datasets, code, or configuration scripts that can be accessed from different clusters. The file system is backed up and there is no automated deletion policy.
+
+The environment variable `$STORE` can be used as a shortcut to access the Store folder of your primary project.
+
+Hard limits on the amount of data and number of files (inodes) will prevent you from writing to [Store][ref-storage-store] if your quotas are exceeded.
+You can check how much data and how many inodes you are consuming -- and their respective quotas -- by running the [`quota`][ref-storage-quota] command on a login node.
+
+!!! warning
+ It is not recommended to write directly to the `$STORE` path from batch jobs.
+
diff --git a/docs/running/slurm.md b/docs/running/slurm.md
index 3245fdad..07841c81 100644
--- a/docs/running/slurm.md
+++ b/docs/running/slurm.md
@@ -16,6 +16,18 @@ At CSCS, SLURM is configured to accommodate the diverse range of node types avai
Each type of node has different resource constraints and capabilities, which SLURM takes into account when scheduling jobs. For example, CPU-only nodes may have configurations optimized for multi-threaded CPU workloads, while GPU nodes require additional parameters to allocate GPU resources efficiently. SLURM ensures that user jobs request and receive the appropriate resources while preventing conflicts or inefficient utilization.
+!!! example "How to check the partitions and number of nodes therein?"
+ You can check the size of the system by running the following command in the terminal:
+ ```console
+ $ sinfo --format "| %20R | %10D | %10s | %10l | %10A |"
+ | PARTITION | NODES | JOB_SIZE | TIMELIMIT | NODES(A/I) |
+ | debug | 32 | 1-2 | 30:00 | 3/29 |
+ | normal | 1266 | 1-infinite | 1-00:00:00 | 812/371 |
+ | xfer | 2 | 1 | 1-00:00:00 | 1/1 |
+ ```
+    The last column shows the number of nodes that have been allocated to currently running jobs (`A`) and the number of nodes that are idle (`I`).
+
+
[](){#ref-slurm-partition-debug}
### Debug partition
The SLURM `debug` partition is useful for quick turnaround workflows. The partition has a short maximum time (timelimit can be seen with `sinfo -p debug`), and a low number of maximum nodes (the `MaxNodes` can be seen with `scontrol show partition=debug`).
diff --git a/docs/storage/filesystems.md b/docs/storage/filesystems.md
index a61b3c7e..9768a940 100644
--- a/docs/storage/filesystems.md
+++ b/docs/storage/filesystems.md
@@ -57,7 +57,7 @@ The command reports both disk space and the number of files for each filesystem/
## Cleaning Policy and Data Retention
-
+[](){#ref-storage-scratch}
## Scratch
The scratch file system is designed for performance rather than reliability, as a fast workspace for temporary storage.
@@ -85,6 +85,7 @@ Keep also in mind that data on scratch are not backed up, therefore users are ad
!!! note
Do not use the `touch` command to prevent the cleaning policy from removing files, because this behaviour would deprive the community of a shared resource.
+[](){#ref-storage-home}
## Users
Users are not supposed to run jobs from this filesystem because of the low performance. In fact the emphasis on the `/users` filesystem is reliability over performance: all home directories are backed up with GPFS snapshots and no cleaning policy is applied.
@@ -97,6 +98,7 @@ Expiration
!!! warning
All data will be deleted 3 months after the closure of the user account without further warning.
+[](){#ref-storage-store}
## Store on Capstor
The `/capstor/store` mount point of the Lustre file system `capstor` is intended for high-performance per-project storage on Alps. The mount point is accessible from the User Access Nodes (UANs) of Alps vClusters.
diff --git a/mkdocs.yml b/mkdocs.yml
index 04503088..61a9262e 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -23,15 +23,15 @@ nav:
- 'Storage': alps/storage.md
- 'Machine Learning Platform':
- platforms/mlp/index.md
- - 'clariden': clusters/clariden.md
- - 'bristen': clusters/bristen.md
+ - 'Clariden': clusters/clariden.md
+ - 'Bristen': clusters/bristen.md
- 'HPC Platform':
- platforms/hpcp/index.md
- - 'daint': clusters/daint.md
- - 'eiger': clusters/eiger.md
+ - 'Daint': clusters/daint.md
+ - 'Eiger': clusters/eiger.md
- 'Climate and Weather Platform':
- platforms/cwp/index.md
- - 'santis': clusters/santis.md
+ - 'Santis': clusters/santis.md
- 'Accounts and Projects':
- accounts/index.md
- 'Account and Resources Management Tool': accounts/ump.md