## Main goal

### Project

This partnership project involving CSC and Hewlett Packard Enterprise aims to enable HPC users to run secured jobs. It provides tools that enable anyone to run secured jobs with encrypted data and specific confidential containers on a supercomputing site, leveraging (non-exhaustively):

- [SPIFFE/SPIRE](https://github.com/spiffe/spire)
- [Hashicorp Vault](https://github.com/hashicorp/vault)
- [Singularity / Apptainer encryption](https://github.com/apptainer/apptainer)
- [age encryption](https://github.com/FiloSottile/age)

### Architecture

In-depth architecture documentation is available [here](https://github.com/CSCfi/HPCS/docs/architecture).

## Demonstration

[Demonstration asciicast](https://asciinema.org/a/PWDzxlaVQmfGjbh30KKNTRO1f)

## Quickstart

### Client

These steps assume that the HPCS server is already set up (see [Server](#server)).

To start using the HPCS client, the supported method requires Docker and Docker Compose.
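
You can quickly check that both prerequisites are available (a minimal sanity check, assuming Docker Compose v2 with the `docker compose` plugin used later in this guide):

```bash
docker --version
docker compose version
```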

#### Pulling images

Start by pulling the three images. The image references follow this pattern:

```bash
docker pull ghcr.io/cscfi/hpcs/[container/data/job]-prep:['branch-name'/latest]
```

For example, to pull the latest tags:

```bash
docker pull ghcr.io/cscfi/hpcs/container-prep:latest
docker pull ghcr.io/cscfi/hpcs/data-prep:latest
docker pull ghcr.io/cscfi/hpcs/job-prep:latest
```

#### Configuring the client

The details of the SPIRE server, the HPCS server and Vault depend on the server installation and the setup choices made on that side. If you do not know these details, please contact your HPCS server provider.

The client configuration consists of four main sections, in INI format. An in-depth description of the configuration files is available [here](https://github.com/CSCfi/HPCS/docs/configuration).

Example client configuration:

```ini
[spire-server]
address = spire.lumi.csc.fi
port = port
trust-domain = spire-trust-domain

[hpcs-server]
url = https://hpcs-server:port

[vault]
url = https://vault-provider:port

[supercomputer]
address = lumi.csc.fi
username = etellier
```

Please replace the `spire-server` section with the details of your SPIRE server.
You will also need to replace `hpcs-server` with the address of your HPCS server and, if applicable, `port` with the port on which the HPCS server is exposed.
The `vault` section works like the `hpcs-server` section; complete it with your Vault details.
Finally, configure the supercomputer to use in the `supercomputer` section, specifying its address under `address` and your `username` on the system. Your SSH key needs to be set up for that system.
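
As an illustration, key-based SSH access to the example supercomputer above could be set up as follows (a sketch using the example `etellier` user and `lumi.csc.fi` address; adapt it to your own account and your site's key-management policy):

```bash
# Generate an SSH key pair if you do not already have one
ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

# Install the public key on the supercomputer configured in the `supercomputer` section
ssh-copy-id -i ~/.ssh/id_ed25519.pub etellier@lumi.csc.fi

# Verify that key-based login works
ssh etellier@lumi.csc.fi hostname
```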

#### Prepare the runtime

We recommend using Docker Compose to run the process. Below is an example Compose file. It is composed of three services, covering the three main steps of the process:

- Prepare, encrypt and ship the `talinx/jp2a` image for the user `etellier` on node `nid003044`
- Prepare, encrypt and ship the input data under `./workdir/jp2a_input` for the user `etellier` on node `nid003044`
- Run the prepared `jp2a` image on the supercomputer node `nid003044`, passing the application arguments `/sd-container/encrypted/jp2a_input/image.png --width=100 --output=/sd-container/output/result` (the prepared `jp2a_input` dataset will be available under `/sd-container/encrypted/jp2a_input` at runtime). You can also specify verification scripts to run before and after the application, here `/pfs/lustrep4/scratch/project_462000031/etellier/verification_scripts`

In-depth documentation of the CLI of the three jobs is available [here](https://github.com/CSCfi/HPCS/docs/cli).

```yaml
version: '2.4'

services:
  # Prepare, encrypt and ship the application container image
  container-prep:
    image: ghcr.io/cscfi/hpcs/container-prep:latest
    command:
      - "--config"
      - "/tmp/hpcs-client.conf"
      - "-b"
      - "talinx/jp2a"
      - "-s"
      - "/Users/telliere/secure_job/workdir"
      - "-e"
      - "-u"
      - "etellier"
      - "-c"
      - "nid003044"
      - "--data-path"
      - "/tmp/encrypted_prepared_jp2a.sif"
      - "--data-path-at-rest"
      - "/scratch/project_462000031/etellier/"
      - "--username"
      - "etellier"
    volumes:
      - /etc/group:/etc/group
      - /etc/passwd:/etc/passwd
      - /var/run/docker.sock:/var/run/docker.sock
      - $PWD/workdir:/tmp
      - $HOME/.ssh:/tmp/.ssh
    environment:
      - PWD=${PWD}
      - HOME=${HOME}
    user: "1001" # On Linux: your uid (gathered using `id`). Remove on macOS
    group_add:
      - "120" # The docker daemon group. Remove on macOS

  # Prepare, encrypt and ship the input dataset
  data-prep:
    image: ghcr.io/cscfi/hpcs/data-prep:latest
    command:
      - "--config"
      - "/tmp/hpcs-client.conf"
      - "-i"
      - "/tmp/jp2a_input"
      - "-o"
      - "/tmp/"
      - "-u"
      - "etellier"
      - "-c"
      - "nid003044"
      - "--data-path"
      - "/tmp/encrypted_jp2a_input.tgz"
      - "--data-path-at-rest"
      - "/scratch/project_462000031/etellier/"
      - "--username"
      - "etellier"
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      - $PWD/workdir:/tmp
      - $HOME/.ssh:/tmp/.ssh
    environment:
      - PWD=${PWD}
      - HOME=${HOME}

  # Submit the secured job on the supercomputer
  job-prep:
    image: ghcr.io/cscfi/hpcs/job-prep:latest
    command:
      - "--config"
      - "/tmp/hpcs-client.conf"
      - "-N"
      - "1"
      - "-p"
      - "standard"
      - "-t"
      - "\"00:60:00\""
      - "-A"
      - "project_462000031"
      - "--nodelist"
      - "nid003044"
      - "--workdir"
      - "/scratch/project_462000031/etellier"
      - "-ai"
      - "/scratch/project_462000031/etellier/encrypted_prepared_jp2a.sif.info.yaml"
      - "-di"
      - "/scratch/project_462000031/etellier/encrypted_jp2a_input.tgz.info.yaml"
      - "-args"
      - "\"\\\" /sd-container/encrypted/jp2a_input/image.png --width=100 --output=/sd-container/output/result \\\"\""
      - "-i"
      - "/pfs/lustrep4/scratch/project_462000031/etellier/verification_scripts"
      - "-o"
      - "/pfs/lustrep4/scratch/project_462000031/etellier/verification_scripts"
      - "-flags"
      - "--env TERM=xterm-256color"
      - "-f"
    volumes:
      - $PWD/workdir:/tmp
      - $HOME/.ssh:/tmp/.ssh
    environment:
      - PWD=${PWD}
      - HOME=${HOME}
```
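
Before running anything, you can optionally let Docker Compose validate and render the file (assuming it is saved as `docker-compose.yml` in the current directory):

```bash
docker compose -f docker-compose.yml config
```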

#### Run the preparations and the job

To run one of the containers:

```bash
docker compose run --rm [data/container/job]-prep
```

If you want to run the whole process yourself:

```bash
docker compose run --rm data-prep
docker compose run --rm container-prep
docker compose run --rm job-prep
```
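
If you prefer a single command that stops at the first failing step, the same three runs can also be chained (purely a convenience, equivalent to the commands above):

```bash
docker compose run --rm data-prep && \
docker compose run --rm container-prep && \
docker compose run --rm job-prep
```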

An example demonstration is available [here](https://asciinema.org/a/PWDzxlaVQmfGjbh30KKNTRO1f).

### Server

## Limitations