docs/clusters/rcp/rcp.md
Lines changed: 37 additions, 38 deletions
@@ -2,7 +2,7 @@

## 1. Pre-setup (access to scratch and cluster)

- Please ask [Mark](mailto:mark.wagner@epfl.ch) or [Peter](mailto:peter.ahumuza@epfl.ch) to add you to the corresponding groups (send a message on Slack to get a faster answer). You can check your groups at https://groups.epfl.ch/
+ Please ask [Peter](mailto:peter.ahumuza@epfl.ch) to add you to the corresponding groups (send a message on Slack to get a faster answer). You can check your groups at https://groups.epfl.ch/

## 2. Setting-up credentials

@@ -65,7 +65,7 @@ Finally, we will execute an action that requires our identification on GitHub to
- The RCP is organized into a [3 level hierarchy](https://wiki.rcp.epfl.ch/en/home/CaaS/FAQ/how-to-use-runai#access-hierarchy). The department is the laboratory (e.g. LIGHT or MLO). The projects determine which scratch (aka persistent storage) we have access to. Note that you should choose the SSO option when executing `runai login`.
+ The RCP is organized into a [3 level hierarchy](https://wiki.rcp.epfl.ch/en/home/CaaS/FAQ/how-to-use-runai#access-hierarchy). The department is the laboratory (e.g. LiGHT). The projects determine which scratch (aka persistent storage) we have access to. Note that you should choose the SSO option when executing `runai login`.
- Time to test if we can submit a job! This command will allocate 1 GPU from the cluster and "sleep" to infinity (meaning that it will do essentially nothing)
+ Build your image following the [Docker tutorial](rcp_docker.md). Once you are done, it's time to test whether we can submit a job! This command allocates 1 GPU from the cluster and "sleeps" forever (meaning that it does essentially nothing).
- > Note: If you have issue with the job not being launched (after doing a `describe`), ensure that there is such an image in [the registry](registry.rcp.epfl.ch). You can build your image following the docker tutorial.
+ > Note: If the job is not launched (check with `runai describe job`), make sure that the image exists in [the registry](https://registry.rcp.epfl.ch). You can build your image following the [Docker tutorial](rcp_docker.md).
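To check from your own machine that the image is really in the registry before submitting, a small pre-flight test can help. The script below is a hypothetical helper, not part of the RCP tooling: it assumes Docker is installed locally and that you are already logged in to `registry.rcp.epfl.ch` (`docker login`), and it relies on `docker manifest inspect` exiting with a non-zero status when the image cannot be resolved. Pass your own image path as the argument; none is assumed here.

```bash
#!/usr/bin/env bash
# check_image.sh: hypothetical pre-flight check that an image exists in the registry.
# Assumes Docker is installed and you ran `docker login registry.rcp.epfl.ch` beforehand.
set -euo pipefail

DOCKER="${DOCKER:-docker}"   # command used to query the registry (overridable for testing)

# image_exists IMAGE: exit status 0 if the registry resolves the image, non-zero otherwise.
image_exists() {
  local image="$1"
  "$DOCKER" manifest inspect "$image" > /dev/null 2>&1
}

# report_image IMAGE: print a one-line verdict for the given image reference.
report_image() {
  local image="$1"
  if image_exists "$image"; then
    echo "found: $image"
  else
    echo "missing: $image (build and push it first)"
  fi
}

# Run only when executed directly with an image argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  report_image "$1"
fi
```

For example: `bash check_image.sh <your image path>`.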
+ > Note: We strongly recommend saving this command in a shell script, so that you can easily edit it and launch jobs with, for instance, `bash connect.sh`. You may keep two scripts, one for CPU-only jobs and one for GPU jobs (give them different names); you generally won't need more than those two. **Important: jobs with GPUs are time-limited. They are automatically shut down after 2 hours without any GPU usage, whereas CPU-only jobs have no such limit.**

Explanation:

* `name` is the name of the job.
* `image` is the path to the Docker image that the job runs. **Please note that you may need to change the image path if you pushed your image elsewhere. See [Building Docker image for the RCP](rcp_docker.md).**
- *`pvc` determines which scratch will be mounted to the job. The argument is of the form: `name_of_the_scratch:/mount/path/to/scratch`. Here the we are mounting the scratch named `light-scratch` to the local path `/mloscratch`**This is part may cause an error because of the LIGHT migration**
- *`gpu` is the number of GPU that you want to claim for this job (larger amount of GPU will be harder to get as ressources are limited)
-
- > Note: It is heavily recommended to save this command into a `shell` file, to easily edit it and launch jobs with `bash connect.sh` for instance. You may have two files, one for CPU-only jobs and one for GPU (make sure to give them different names). Generally you don't need more than those two. Important to know: jobs with GPUs are time limited, they are automatically shut down after 2 hours of using no GPUs, whereas there is no such limitation for CPU-only jobs.
+ * `pvc` determines which scratch is mounted in the job. The argument has the form `name_of_the_scratch:/mount/path/to/scratch`. Here we are mounting the scratch named `light-scratch` at the local path `/lightscratch`. **This part may cause an error because of the LIGHT migration.**
+ * `gpu` is the number of GPUs that you want to claim for this job (a larger number of GPUs is harder to get, as resources are limited).
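Following the recommendation to keep the submission command in a script, here is a minimal sketch of a parameterized `connect.sh`. It is an illustration under assumptions, not the command from this guide: the arguments explained above are passed as `--name`, `--image`, `--pvc` and `--gpu`, the container is kept alive with `--command -- sleep infinity`, and a CPU-only job is obtained by omitting `--gpu` (`GPUS=0`). Check these against `runai submit --help` for your CLI version. `IMAGE` is left empty on purpose; use the path you pushed following the Docker tutorial. `DRY_RUN=1` prints the command instead of submitting it.

```bash
#!/usr/bin/env bash
# connect.sh: minimal sketch of a job-submission script (hypothetical, adapt to your setup).
# --name/--image/--pvc/--gpu mirror the arguments explained in the guide;
# `--command -- sleep infinity` is an assumption to verify with `runai submit --help`.
set -euo pipefail

JOB_NAME="${JOB_NAME:-base-job}"
IMAGE="${IMAGE:-}"                          # your image path in the registry (fill in)
PVC="${PVC:-light-scratch:/lightscratch}"   # name_of_the_scratch:/mount/path
GPUS="${GPUS:-1}"                           # GPUS=0 -> CPU-only job (no --gpu flag)

SUBMIT_CMD=()

# Fill SUBMIT_CMD with the runai submit invocation for the current settings.
build_submit_cmd() {
  SUBMIT_CMD=(runai submit --name "$JOB_NAME" --image "$IMAGE" --pvc "$PVC")
  if (( GPUS > 0 )); then
    SUBMIT_CMD+=(--gpu "$GPUS")
  fi
  # Keep the container alive doing nothing, like the example job.
  SUBMIT_CMD+=(--command -- sleep infinity)
}

# Build the command, then either print it (DRY_RUN=1) or run it.
submit_job() {
  if [[ -z "$IMAGE" ]]; then
    echo "Set IMAGE to your image in registry.rcp.epfl.ch first." >&2
    return 1
  fi
  build_submit_cmd
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${SUBMIT_CMD[*]}"
  else
    "${SUBMIT_CMD[@]}"
  fi
}

# Only act when executed directly (e.g. `bash connect.sh`), not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  if [[ -n "$IMAGE" ]]; then
    submit_job
  else
    echo "usage: IMAGE=<registry path> [GPUS=N] [JOB_NAME=name] [DRY_RUN=1] bash connect.sh" >&2
  fi
fi
```

For example, `IMAGE=<your image> DRY_RUN=1 bash connect.sh` shows the GPU command, and `IMAGE=<your image> GPUS=0 JOB_NAME=cpu-job bash connect.sh` submits a CPU-only job under another name. A single script with a `GPUS` variable plays the role of the two separate files suggested in the note; keeping two files works just as well.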

We can check the output of our container and the status of the job with the following commands, respectively:
```bash
# Your terminal

- runai logs meditron-basic
- runai describe job meditron-basic
+ runai logs base-job
+ runai describe job base-job
```

To end a job, run the command:
```bash
# Your terminal

- runai delete job meditron-basic
+ runai delete job base-job
```

You can open a shell inside your job with:
```bash
# Your terminal

- runai bash meditron-basic
+ runai bash base-job
```
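The commands above all take the job name, so a small wrapper saves retyping it. This is a hypothetical convenience script that only calls the `runai logs`, `runai describe job`, `runai bash` and `runai delete job` commands used in this guide; the action names (`logs`, `status`, `shell`, `delete`) and the `RUNAI` override are our own choices, and the job name defaults to `base-job`.

```bash
#!/usr/bin/env bash
# job.sh: hypothetical wrapper around the runai commands used in this guide.
# Usage: bash job.sh {logs|status|shell|delete} [job-name]
set -euo pipefail

RUNAI="${RUNAI:-runai}"   # set RUNAI=echo to print commands instead of running them

# job_cmd ACTION [JOB]: map a short action name to the matching runai command.
job_cmd() {
  local action="$1" job="${2:-base-job}"
  case "$action" in
    logs)   "$RUNAI" logs "$job" ;;
    status) "$RUNAI" describe job "$job" ;;
    shell)  "$RUNAI" bash "$job" ;;
    delete) "$RUNAI" delete job "$job" ;;
    *)
      echo "unknown action: $action (expected logs, status, shell or delete)" >&2
      return 2
      ;;
  esac
}

# Run only when executed directly with at least an action argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  job_cmd "$@"
fi
```

For example, `bash job.sh status` describes `base-job`, and `bash job.sh delete my-gpu-job` deletes another job. With `RUNAI=echo`, the script only prints what it would run.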

- You should see a terminal opening. By default it redirects you to the folder `/workspace`. This folder is not persistent, it is heavily recommended to always start using the job by moving to your personal folder you made earlier, at `/mloscratch/users/$GASPAR_USER`.
+ You should see a terminal open. By default, it places you in the folder `/workspace`. This folder is not persistent, so **we strongly recommend always starting by moving to the personal folder you created earlier, at `/lightscratch/users/$GASPAR_USER`**.
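One way to make this a habit is a small shell function that moves to your personal folder and complains if the scratch is not mounted. It is a sketch under assumptions: `GASPAR_USER` must be set in the job, and `SCRATCH_ROOT` is our own name for the mount path chosen with the `pvc` argument (`/lightscratch` above). Because it changes the current directory, it has to be sourced or pasted into the job terminal rather than run with `bash`.

```bash
# Hypothetical helper: jump to the persistent personal folder instead of /workspace.
SCRATCH_ROOT="${SCRATCH_ROOT:-/lightscratch}"   # mount path chosen with the pvc argument

goto_scratch() {
  local user="${GASPAR_USER:-}"
  if [[ -z "$user" ]]; then
    echo "GASPAR_USER is not set" >&2
    return 1
  fi
  local dir="$SCRATCH_ROOT/users/$user"
  # Fail loudly if the scratch is not mounted or the folder was never created.
  if [[ ! -d "$dir" ]]; then
    echo "personal folder $dir not found (is the scratch mounted?)" >&2
    return 1
  fi
  cd "$dir" || return 1
}
```

After sourcing it, typing `goto_scratch` in the job terminal takes you to `/lightscratch/users/$GASPAR_USER`.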

- Enter the following command in your new terminal to ensure that you have indeed a GPU:
+ You can run the following command in your new terminal to check that you indeed have a GPU:
```bash
# Job terminal

@@ -206,7 +205,7 @@ Once you are done, run the following command to delete the job:
```bash
# Your terminal

- runai delete job meditron-basic
+ runai delete job base-job
```

## 6. VSCode connection
@@ -218,7 +217,7 @@ Once we have the container running on a node of the RCP cluster, we can attach t
- From the Kubernetes menu, we can see the IC and the RCP Cluster. We will enter the menu of the RCP Cluster -> Workloads -> Pods and we will see our container with a green indicator showing that it is running. Right-clicking on it will give us the option to "Attach to Visual Studio". Upon clicking, the editor will open in a new window within the container. We are then invited to open a folder, it should be our personal folder (`/mloscratch/users/$GASPAR`) by default, select it. When opening a new terminal, we should find ourselves directly in our personal folder, if needed we can move there with `cd` in the terminal. We can install new extensions on VS code, and they will be saved for future sessions.
+ From the Kubernetes menu, we can see the IC and the RCP Cluster. Open the menu of the RCP Cluster -> Workloads -> Pods: our container appears with a green indicator showing that it is running. Right-clicking on it gives the option to "Attach to Visual Studio". Upon clicking, the editor opens in a new window within the container. We are then invited to open a folder: by default it should be our personal folder (`/lightscratch/users/$GASPAR_USER`), so select it. When opening a new terminal, we should find ourselves directly in our personal folder; if needed, we can move there with `cd`. We can install new extensions in VS Code, and they will be saved for future sessions.
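If the green indicator is not visible, you can check the pod from a terminal instead. This sketch assumes that `kubectl` uses the same kube configuration and namespace as the VS Code Kubernetes extension, and that the pod names of a job start with the job name followed by `-`; both are assumptions about your setup, not statements from this guide. It prints each matching pod with its status (the `STATUS` column of `kubectl get pods`).

```bash
#!/usr/bin/env bash
# pod_status.sh: hypothetical check that a job's pod is Running before attaching VS Code.
set -euo pipefail

KUBECTL="${KUBECTL:-kubectl}"   # command used to query the cluster (overridable for testing)

# pod_status [JOB]: print "<pod> <status>" for every pod whose name starts with "JOB-".
pod_status() {
  local job="${1:-base-job}"
  # Columns without headers: NAME READY STATUS RESTARTS AGE.
  "$KUBECTL" get pods --no-headers \
    | awk -v job="$job" 'index($1, job "-") == 1 { print $1, $3 }'
}

# Run only when executed directly with a job name.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  pod_status "$1"
fi
```

For example, `bash pod_status.sh base-job` should list the job's pod with `Running` once it is ready to attach.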

### Windows (WSL connection)

@@ -245,16 +244,16 @@ In **WSL**, claim a job and copy the kube configuration file from WSL to Windows
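The WSL instructions themselves are only summarized by the hunk header above: copy the kube configuration file from WSL to Windows. A minimal sketch of that step, assuming kubectl's default locations on both sides (`~/.kube/config` in WSL and `%USERPROFILE%\.kube\config` on Windows) and WSL's default mount of the C: drive at `/mnt/c`. The Windows user name is an argument you supply, and backing up an existing Windows config is our own precaution.

```bash
#!/usr/bin/env bash
# copy_kubeconfig.sh: hypothetical helper to copy the WSL kubeconfig to Windows.
set -euo pipefail

SRC_CONFIG="${SRC_CONFIG:-$HOME/.kube/config}"    # kubectl's default location in WSL
WIN_USERS_ROOT="${WIN_USERS_ROOT:-/mnt/c/Users}"  # C:\Users as seen from WSL

# copy_kubeconfig WINDOWS_USER: copy SRC_CONFIG to C:\Users\WINDOWS_USER\.kube\config.
copy_kubeconfig() {
  local win_user="$1"
  local dest_dir="$WIN_USERS_ROOT/$win_user/.kube"
  if [[ ! -f "$SRC_CONFIG" ]]; then
    echo "no kubeconfig at $SRC_CONFIG" >&2
    return 1
  fi
  mkdir -p "$dest_dir"
  # Back up an existing Windows config rather than silently overwriting it.
  if [[ -f "$dest_dir/config" ]]; then
    cp "$dest_dir/config" "$dest_dir/config.bak"
  fi
  cp "$SRC_CONFIG" "$dest_dir/config"
  echo "copied to $dest_dir/config"
}

# Run only when executed directly with the Windows user name.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  copy_kubeconfig "$1"
fi
```

For example, run `bash copy_kubeconfig.sh <your Windows user name>` in WSL after claiming a job.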