Commit 5be9d8b

job preparation sequence diagram + some fixs
1 parent f06f4da commit 5be9d8b

5 files changed, +97 -11 lines changed

.pre-commit-config.yaml

+1 -2
@@ -3,7 +3,6 @@ repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
 rev: v4.5.0
 hooks:
-- id: trailing-whitespace
 - id: end-of-file-fixer
 - id: check-yaml
 - id: check-added-large-files
@@ -31,4 +30,4 @@ repos:
 rev: v4.0.0-alpha.8
 hooks:
 - id: prettier
-files: \.(js|ts|jsx|tsx|css|less|html|json|markdown|md|yaml|yml)$
+files: \.(js|ts|jsx|tsx|css|less|html|json|markdown|yaml|yml)$
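pre-commit matches a hook's `files` pattern against each candidate path with Python's `re.search`, so the effect of dropping `md` from the prettier pattern in the hunk above can be checked directly. The following is a minimal sketch, not part of the commit: the helper names and `README.markdown` are made up for illustration, and the two patterns are copied from the removed and added lines.

```python
import re

# Patterns copied from the removed (-) and added (+) `files:` lines above.
OLD_PATTERN = r"\.(js|ts|jsx|tsx|css|less|html|json|markdown|md|yaml|yml)$"
NEW_PATTERN = r"\.(js|ts|jsx|tsx|css|less|html|json|markdown|yaml|yml)$"

# Two paths touched by this commit plus one hypothetical path.
SAMPLE_PATHS = [
    "docs/architecture/job_preparation.md",
    "README.markdown",  # hypothetical, only here to exercise the pattern
    ".pre-commit-config.yaml",
]


def is_selected(pattern: str, path: str) -> bool:
    """Return True if a hook with this `files` pattern would consider `path`."""
    return re.search(pattern, path) is not None


def newly_excluded(paths):
    """Paths the old pattern selected but the new pattern no longer selects."""
    return [
        p for p in paths
        if is_selected(OLD_PATTERN, p) and not is_selected(NEW_PATTERN, p)
    ]


if __name__ == "__main__":
    for path in newly_excluded(SAMPLE_PATHS):
        print(f"no longer formatted by prettier: {path}")
```

Under these assumptions, only paths ending in `.md` drop out of the prettier hook; `.markdown`, `.yaml` and `.yml` files are still selected.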

docs/architecture.md → docs/architecture/architecture.md

+2 -2
@@ -135,8 +135,8 @@ CN4SBATCH <--"HTTPS"--> Vault
 HPCSCDPB <--"SSH (As user - Data & Info files)"--> LN
 HPCSCCPB <--"SSH (As user - Container image & Info files)"--> LN

-HPCSCJPB --"SSH (As user - SBATCH file & CLI Call to SBATCH)"--> LN
+HPCSCJPB --"SSH (As user - SBATCH file & CLI Call to SBATCH)"--> LN
 LN --"SSH (As user - Info files)"--> HPCSCJPB
 ```

-This diagram doesn't show the HTTPS requests from client/compute node to HPCS Server used to register the agents since this behaviour is a practical workaround. See section "Limitations" in [HPCS/README.md](https://github.com/CSCfi/HPCS/blob/main/README.md#limitations) for more information.
+This diagram doesn't show the HTTPS requests from client/compute node to HPCS Server used to register the agents since this behaviour is a practical workaround.

docs/architecture/container_preparation.md

+10 -5
@@ -6,6 +6,7 @@ This step consist in using an original OCI image to prepare it, encrypt it and s

 ```mermaid
 sequenceDiagram
+actor User
 User -->> Container Preparation container: spawns using docker-compose
 Container Preparation container -->> Spire Agent: spawns using `spawn_agent.py`
 Spire Agent ->> Spire Server: Runs node attestation
@@ -29,22 +30,24 @@ sequenceDiagram
 HPCS Server ->> Container Preparation container: SpiffeID & role to access the container, path to the secret
 Container Preparation container ->> Container Preparation container: Parse info file based on previous steps
 Container Preparation container ->> Supercomputer: Ship encrypted container
-Supercomputer ->> Container Preparation container:
+Supercomputer ->> Container Preparation container: '
 Container Preparation container ->> Supercomputer: Ship info file
 Supercomputer ->> Container Preparation container:
 Container Preparation container -->> Spire Agent: Kills
 Spire Agent -->> Container Preparation container:
-Spire Agent -->> Container Preparation container: Dies
+Spire Agent -->> Container Preparation container: Dies
 Container Preparation container -->> User: Finishes
 ```

-
 ## Sequence diagram of the container's preparation (without shipping)

 ### Image is prepared and then encrypted (Encryption at rest)
+
 This step is currently (3/2024) used to encrypt the container. It does not require changes on LUMI to work.
+
 ```mermaid
 sequenceDiagram
+actor User
 User -->>HPCS Client: spawns using `python3 prepare_container.py [OPTIONS]`
 HPCS Client -->> Docker Client: spawns
 HPCS Client ->> HPCS Client: Create prepared Dockerfile
@@ -59,11 +62,13 @@ sequenceDiagram
 HPCS Client ->> HPCS Client: Encrypt image file
 ```

-
 ### Image is prepared and SIF encrypted
+
 When HPC nodes support encrypted containers, this process can be used.
+
 ```mermaid
 sequenceDiagram
+actor User
 User -->>HPCS Client: spawns using `python3 prepare_container.py [OPTIONS]`
 HPCS Client -->> Docker Client: spawns
 HPCS Client ->> HPCS Client: Create prepared Dockerfile
@@ -75,4 +80,4 @@ sequenceDiagram
 Docker Client -->> Build-Env: Spawns
 Build-Env ->> Build-Env: Build final prepared and encrypted SIF image
 Build-Env ->> HPCS Client: Returns final prepared and encrypted SIF image
-```
+```

docs/architecture/data_preparation.md

+3 -2
@@ -6,6 +6,7 @@ This step consists in using an input directory, encrypt it and ship it to the su

 ```mermaid
 sequenceDiagram
+actor User
 User -->> Data Preparation container: spawns using docker-compose
 Data Preparation container -->> Spire Agent: spawns using `spawn_agent.py`
 Spire Agent ->> Spire Server: Runs node attestation
@@ -34,6 +35,6 @@ sequenceDiagram
 Supercomputer ->> Data Preparation container:
 Data Preparation container -->> Spire Agent: Kills
 Spire Agent -->> Data Preparation container:
-Spire Agent -->> Data Preparation container: Dies
+Spire Agent -->> Data Preparation container: Dies
 Data Preparation container -->> User: Finishes
-```
+```

docs/architecture/job_preparation.md

+81 -0
@@ -0,0 +1,81 @@
+# Job preparation
+
+This step consists in the preparation of the secure job, followed by its execution. It requires two info files (one for the data, one for the secured container) and more settings about the runtime (arguments, parameters for the singularity container ...).
+
+## Sequence diagram of this step
+
+```mermaid
+sequenceDiagram
+actor User
+participant Job Preparation container
+participant Login Node
+participant Scheduler
+
+User -->> Job Preparation container: spawns using docker-compose
+Job Preparation container ->> Login Node: Initiate SSH Connection
+rect rgb(191, 223, 255)
+note right of User: Job preparation
+Job Preparation container ->> Login Node: SCP Data's info file
+Login Node ->> Job Preparation container: Info file
+Job Preparation container ->> Job Preparation container: Parse info from info file
+Job Preparation container ->> Login Node: SCP Container image's info file
+Login Node ->> Job Preparation container: Info file
+Job Preparation container ->> Job Preparation container: Parse info from info file
+Job Preparation container ->> Job Preparation container: Generate SBATCH file from template based on info gathered
+Job Preparation container ->> Login Node: Copy SBATCH File and HPCS Configuration file
+Login Node ->> Job Preparation container:
+Job Preparation container ->> Job Preparation container: Generate keypair for output data
+Job Preparation container ->> Login Node: Copy encryption key
+Login Node ->> Job Preparation container:
+end
+
+rect rgb(191, 223, 255)
+note right of User: Job runtime
+Job Preparation container ->> Login Node: SSH Execute "sbatch SBATCHFILE"
+Login Node ->>+ Scheduler: sbatch SBATCHFILE
+Scheduler ->> Login Node: Job created + Job id
+Login Node ->> Job Preparation container: Job created + Job id
+Job Preparation container ->> Job Preparation container: Follows job output or job status
+activate Job Preparation container
+Scheduler ->> Scheduler: Scheduling job
+activate Scheduler
+deactivate Scheduler
+Scheduler ->> Compute node: Elect node - Execute SBATCHFILE
+Compute node ->> Compute node: Clone HPCS Github / Download age and gocryptfs binaries
+Compute node -->> Spire Agent: spawns using `spawn_agent.py`
+Spire Agent ->> Spire Server: Runs node attestation
+Spire Server ->> Spire Agent: Attests node, provide SVIDs for linked identities
+Compute node ->> Spire Agent: Fetches API to get an SVID
+Spire Agent ->> Compute node: Provides SVID
+Compute node ->> Vault: Log-in using SVID
+Vault ->> Compute node: Returns an authentication token (read only on container key's path)
+Compute node ->> Vault: Read container's key using authentication token
+Vault ->> Compute node: Returns container's key
+Compute node ->> Compute node: Decrypt container image
+Compute node ->> Compute node: Setup secure environment for runtime (Encrypted volumes, gather flags etc)
+Compute node ->> Spire Agent: Fetches API to get an SVID
+Spire Agent ->> Compute node: Provides SVID
+Compute node ->> Compute node: Export SVID and data secret path in a variable
+Compute node -->> Application container: spawns using `singularity run`
+Application container ->> Vault: Log-in using SVID
+Vault ->> Application container: Returns an authentication token (read only on data key's path)
+Application container ->> Vault: Read data's key using authentication token
+Vault ->> Application container: Returns data's key
+Application container ->> Application container: Decrypt data using key
+Application container ->> Application container: Runs input scripts
+Application container ->> Application container: Application runs
+Application container ->> Application container: Runs output scripts
+Application container ->> Application container: Encrypt output directory
+Application container -->> Compute node: Finishes
+Compute node -->> Spire Agent: Kills
+Spire Agent -->> Compute node:
+Spire Agent -->> Compute node: Dies
+Compute node ->> Scheduler: Becomes available
+deactivate Job Preparation container
+end
+Job Preparation container ->> Login Node: Close SSH connection
+Login Node ->> Job Preparation container:
+Login Node ->> Job Preparation container: Close SSH connection
+
+Job Preparation container -->> User: Finishes
+```
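
Two steps of the diagram above, "Generate SBATCH file from template based on info gathered" and "Job created + Job id", can be illustrated with a small sketch. This is not the HPCS implementation, which this diff does not show: the template text, the setting names (`job_name`, `nodes`, `time_limit`, `container_info`, `data_info`) and both helper functions are assumptions made for illustration. The only external facts relied on are the standard `#SBATCH` options `--job-name`, `--nodes` and `--time`, and Slurm's default `Submitted batch job <id>` message printed by `sbatch`.

```python
import re
from string import Template

# Hypothetical template: the real HPCS SBATCH template is not part of this diff.
SBATCH_TEMPLATE = Template("""#!/bin/bash
#SBATCH --job-name=$job_name
#SBATCH --nodes=$nodes
#SBATCH --time=$time_limit

# Placeholder body standing in for the compute-node steps of the diagram.
echo "container info file: $container_info"
echo "data info file: $data_info"
""")


def render_sbatch(settings: dict) -> str:
    """Fill the template; raises KeyError if a placeholder has no value."""
    return SBATCH_TEMPLATE.substitute(settings)


def parse_job_id(sbatch_output: str) -> int:
    """Extract the job id from sbatch's default 'Submitted batch job N' line."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"no job id found in: {sbatch_output!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # All values below are illustrative, including the info file paths.
    script = render_sbatch({
        "job_name": "secure-job",
        "nodes": 1,
        "time_limit": "00:30:00",
        "container_info": "/path/to/container.info",
        "data_info": "/path/to/data.info",
    })
    print(script)
    print(parse_job_id("Submitted batch job 4242\n"))
```

`Template.substitute` is used rather than `safe_substitute` so that a setting missing from the parsed info files fails early instead of producing an SBATCH file with an unresolved placeholder.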
