Storage docs #121

Merged (27 commits) on Jun 2, 2025
Commits
c68ae7e
add notes from call with pasmarco
bcumming Apr 23, 2025
ea85084
Merge branch 'main' into storage-refactor
bcumming Apr 25, 2025
5f97cd3
drafting the storage docs
bcumming Apr 28, 2025
07b1199
wip
bcumming May 22, 2025
e6dcf0d
Merge branch 'main' into storage-refactor
bcumming May 22, 2025
0792aa6
Merge branch 'main' into storage-refactor
bcumming May 23, 2025
a3445eb
wip
bcumming May 23, 2025
212967e
Merge branch 'main' into storage-refactor
bcumming May 23, 2025
d4790bc
sweep and mark remaining todo and under-construction sections in stor…
bcumming May 26, 2025
62e4681
wip
bcumming May 26, 2025
994bf38
spell check; add placeholders for FAQ docs
bcumming May 26, 2025
eab4402
Merge branch 'main' into storage-refactor
bcumming May 26, 2025
1680b23
fix broken link
bcumming May 26, 2025
2451b71
add marco p to codeowners
bcumming May 26, 2025
944a640
Merge branch 'main' into storage-refactor
bcumming May 26, 2025
f83fe00
document store layout
bcumming May 26, 2025
bb53f77
Merge branch 'storage-refactor' of github.com:bcumming/cscs-docs into…
bcumming May 26, 2025
0c60849
Update docs/alps/storage.md
twrobinson May 27, 2025
02a8c1c
Update docs/alps/storage.md
bcumming May 27, 2025
405f8c6
Update docs/alps/storage.md
bcumming May 27, 2025
8e8eff0
Update docs/storage/filesystems.md
bcumming May 27, 2025
bf02e58
@msimber review suggestions
bcumming May 27, 2025
8cebb2b
@RMeli's review
bcumming May 28, 2025
dc9ad3d
@afink review comments
bcumming May 28, 2025
0cd0941
wip
bcumming May 28, 2025
2e56dca
warn against touching files to avoid clean up
bcumming May 28, 2025
c25a4b2
merge main
bcumming May 28, 2025
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -7,3 +7,5 @@ docs/software/prgenv/linalg.md @finkandreas @msimberg
docs/software/sciapps/cp2k.md @abussy @RMeli
docs/software/sciapps/gromacs.md @kanduri
docs/software/ml @boeschf
docs/storage @mpasserini
docs/alps/storage.md @mpasserini
47 changes: 36 additions & 11 deletions docs/alps/storage.md
@@ -1,13 +1,15 @@
[](){#ref-alps-storage}
# Alps Storage

!!! under-construction

Alps has different storage attached, each with characteristics suited to different workloads and use cases.
HPC storage is managed in a separate cluster of nodes that host servers that manage the storage and the physical storage drives.
These separate clusters are on the same Slingshot 11 network as the Alps.
These separate storage clusters are on the same Slingshot 11 network as Alps.

| | Capstor | Iopsstor | Vast |
| | Capstor | Iopsstor | VAST |
|--------------|------------------------|------------------------|---------------------|
| Model | HPE ClusterStor E1000D | HPE ClusterStor E1000F | Vast |
| Model | HPE ClusterStor E1000D | HPE ClusterStor E1000F | VAST |
| Type | Lustre | Lustre | NFS |
| Capacity | 129 PB raw GridRAID | 7.2 PB raw RAID 10 | 1 PB |
| Number of Drives | 8,480 16 TB HDD | 240 * 30 TB NVMe SSD | N/A |
@@ -16,25 +18,48 @@ These separate clusters are on the same Slingshot 11 network as the Alps.
| IOPs | 1.5M | 8.6M read, 24M write | 200k read, 768k write |
| file create/s| 374k | 214k | 97k |


!!! todo
Information about Lustre. Meta data servers, etc.

* how many meta data servers on Capstor and Iopsstor
* how these are distributed between store/scratch

Also discuss how Capstor and iopstor are used to provide both scratch / store / other file systems

The mounts, and how they are used for Scratch, Store, and Home file systems that are mounted on clusters are documented in the [file system docs][ref-storage-fs].

[](){#ref-alps-capstor}
## capstor
## Capstor

Capstor is the largest file system, for storing large amounts of input and output data.
It is used to provide SCRATCH and STORE for different clusters - the precise details are platform-specific.
It is used to provide [scratch][ref-storage-scratch] and [store][ref-storage-store].

!!! todo "add information about meta data services, and their distribution over scratch and store"

[](){#ref-alps-capstor-scratch}
### Scratch

All users on Alps get their own scratch path on Alps, `/capstor/scratch/cscs/$USER`.

[](){#ref-alps-capstor-store}
### Store

The [Store][ref-storage-store] mount point on Capstor provides stable storage with [backups][ref-storage-backups] and no [cleaning policy][ref-storage-cleanup].
It is mounted on clusters at the `/capstor/store` mount point, with folders created for each project.

[](){#ref-alps-iopsstor}
## iopsstor
## Iopsstor

!!! todo
small text explaining what iopsstor is designed to be used for.
small text explaining what Iopsstor is designed to be used for.

[](){#ref-alps-vast}
## vast
## VAST

The Vast storage is smaller capacity system that is designed for use as home folders.
The VAST storage is smaller capacity system that is designed for use as [Home][ref-storage-home] folders.

!!! todo
small text explaining what iopsstor is designed to be used for.
small text explaining what Iopsstor is designed to be used for.

The mounts, and how they are used for SCRATCH, STORE, PROJECT, HOME would be in the [storage docs][ref-storage-fs]
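
As a rough sketch of the layout documented above, a user can inspect their personal scratch directory and the project folders under the store mount point from a login node. The exact per-project layout below `/capstor/store` is platform-specific, so the listing below is illustrative only.

```bash
# Per-user scratch on Capstor, as documented above.
ls -ld "/capstor/scratch/cscs/$USER"

# Store mount point on Capstor, with one folder per project.
# The per-project directory layout is platform-specific; adjust as needed.
ls /capstor/store
```

Data under scratch is subject to the cleanup policy referenced above, while store provides stable, backed-up storage.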

2 changes: 1 addition & 1 deletion docs/services/jupyterlab.md
@@ -9,7 +9,7 @@ The service is accessed at [jupyter-daint.cscs.ch](https://jupyter-daint.cscs.c

Once logged in, you will be redirected to the JupyterHub Spawner Options form, where typical job configuration options can be selected in order to allocate resources. These options might include the type and number of compute nodes, the wall time limit, and your project account.

Single-node notebooks are launched in a dedicated queue, minimizing queueing time. For these notebooks, servers should be up and running within a few minutes. The maximum waiting time for a server to be running is 5 minutes, after which the job will be cancelled and you will be redirected back to the spawner options page. If your single-node server is not spawned within 5 minutes we encourage you to [contact us](ref-get-in-touch).
Single-node notebooks are launched in a dedicated queue, minimizing queueing time. For these notebooks, servers should be up and running within a few minutes. The maximum waiting time for a server to be running is 5 minutes, after which the job will be cancelled and you will be redirected back to the spawner options page. If your single-node server is not spawned within 5 minutes we encourage you to [contact us][ref-get-in-touch].

When resources are granted the page redirects to the JupyterLab session, where you can browse, open and execute notebooks on the compute nodes. A new notebook with a Python 3 kernel can be created with the menu `new` and then `Python 3` . Under `new` it is also possible to create new text files and folders, as well as to open a terminal session on the allocated compute node.
