|
5 | 5 | Templates for creating a Docker model submission on Synapse. |
6 | 6 | </h3> |
7 | 7 |
|
8 | | -You can either build off of this repository template or use it as reference |
9 | | -to build your model from scratch. We have provided a sample model template |
10 | | -for both R and Python. |
| 8 | +You can either build off of this repository template or use it as reference |
| 9 | +to build your model from scratch. Sample model templates for both R and |
| 10 | +Python are provided. |
11 | 11 |
|
12 | 12 | ### Requirements |
13 | | -* Python or R |
14 | | -* [Docker](https://docs.docker.com/get-docker/) |
15 | | -* [Synapse account](https://www.synapse.org/#) |
16 | | -* Synapse project for the challenge |
| 13 | + |
| 14 | +- Python or R |
| 15 | +- [Docker](https://docs.docker.com/get-docker/) |
| 16 | +- [Synapse account](https://www.synapse.org/#) |
| 17 | +- Synapse project for the challenge |
17 | 18 |
|
18 | 19 | --- |
19 | 20 |
|
20 | 21 | ### Write your algorithm(s) |
21 | 22 |
|
22 | | -1. Replace the code in the `run_model.*` script with your own algorithm(s). |
23 | | - You can create additional scripts for modularization/better organization |
24 | | - if desired. |
|  | 23 | +1. Replace the placeholder code in the `run_model.*` script with your own
| 24 | + algorithm(s). Create additional functions and scripts for modularity |
| 25 | + and organization as needed. |
| 26 | + |
| 27 | +2. Manage the dependencies: |
| 28 | + |
| 29 | + - **Python:** Update `requirements.txt` with any required Python libraries. |
|  | 30 | +   - **R:** Update `requirements.R` by modifying the `pkg_list` variable to
|  | 31 | +     add or remove R packages as needed.
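For Python specifically, one way to produce a fully pinned `requirements.txt` (a suggestion, not something the template mandates) is to export the versions from the environment where your model already runs:

```
# Export exact package versions from your working Python environment.
# Pinned versions help the Docker build reproduce your local setup.
python3 -m pip freeze > requirements.txt
```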
| 32 | + |
|  | 33 | +3. (optional) Run `run_model.*` locally to verify that it runs successfully.
| 34 | + |
| 35 | + The scripts have been designed to accept input and output directories as |
| 36 | + command-line arguments, allowing you to test with various data locations. |
25 | 37 |
|
26 | | -2. If using Python, update `requirements.txt` with any additional |
27 | | - libraries/packages used by your script(s). |
| 38 | + - By default, the scripts look for input in the `/input` directory |
| 39 | + and write output to the `/output` directory, as expected by the |
| 40 | + Synapse submission system. |
28 | 41 |
|
29 | | - If using R, update `requirements.R` and add/remove any libraries/packages |
30 | | - listed in `pkg_list` that are used by your script(s). |
| 42 | + - To use custom directories, specify them as arguments. For example: |
31 | 43 |
|
32 | | -3. (optional) Locally run `run_model.*` to ensure it can run successfully. |
| 44 | + **Python** |
33 | 45 |
|
34 | | - These scripts have been written so that the input and output files are not |
35 | | - hard-coded in the `/input` and `/output` directories, respectively (though |
36 | | - they are used by default). This way, you can test your changes using any |
37 | | - directories as input and/or output. |
| 46 | + ``` |
| 47 | + python python/run_model.py --input-dir sample_data/ --output-dir . |
| 48 | + ``` |
38 | 49 |
|
39 | | - For example, the following indicates that the input files are in |
40 | | - `sample_data/`, while the output file should be written to the current |
41 | | - working directory (`.`): |
| 50 | + **R** |
42 | 51 |
|
43 | | - **Python** |
44 | | - ``` |
45 | | - python run_model.py --input-dir ../sample_data/ --output-dir . |
46 | | - ``` |
| 52 | + ``` |
| 53 | + Rscript r/run_model.R --input-dir sample_data/ --output-dir . |
| 54 | + ``` |
47 | 55 |
|
48 | | - **R** |
49 | | - ``` |
50 | | - Rscript run_model.R --input-dir ../sample_data/ --output-dir . |
51 | | - ``` |
| 56 | + where: |
| 57 | +
|
| 58 | + - `sample_data/` is used as the input directory |
| 59 | + - `.` (current working directory) is used as the output directory |
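The default-versus-custom behavior described above can be sketched in shell terms (the variable and argument handling here is illustrative, not taken from the template scripts):

```
# Fall back to the Synapse defaults when no directories are supplied.
INPUT_DIR="${1:-/input}"
OUTPUT_DIR="${2:-/output}"
echo "reading input from: $INPUT_DIR"
echo "writing output to: $OUTPUT_DIR"
```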
52 | 60 |
|
53 | 61 | ### Update the Dockerfile |
54 | 62 |
|
55 | | -* Again, make sure that all needed libraries/packages are specified in the |
56 | | -`requirements.*` file. Because all Docker submissions are run without network |
57 | | -access, you will not able to install anything during the container run. If |
58 | | -you do not want to use a `requirements.*` file, you may run replace the RUN |
59 | | -command with the following: |
60 | | -
|
61 | | - **Python** |
62 | | - ``` |
63 | | - RUN pip install pandas |
64 | | - ``` |
65 | | -
|
66 | | - **R** |
67 | | - ``` |
68 | | - RUN R -e "install.packages(c('optparse'), repos = 'http://cran.us.r-project.org')" |
69 | | - ``` |
70 | | -
|
71 | | -* `COPY` over any additional files required by your model. We recommend using |
72 | | -one `COPY` command per file, as this can help speed up build time. |
73 | | -
|
74 | | -* Feel free to update the base image and/or tag version if the provided base |
75 | | -image do not fulfill your needs. Although you can use any valid image as the |
76 | | -base, we recommend using one of the [Trusted Content images], especially if |
77 | | -you are new to Docker. Images to consider: |
78 | | - * ubuntu |
79 | | - * python |
80 | | - * bitnami/pytorch |
81 | | - * r-base |
82 | | - * rocker/tidyverse |
83 | | -
|
84 | | -* If your image takes some time to build, look at the order of your Dockerfile |
85 | | -commands -- **the order matters**. To best take advantage of Docker's |
86 | | -build-caching (that is, reusing previously built layers), it's often a good |
87 | | -idea to put frequently-changing parts (such as `run_model.*`) near the end |
88 | | -of the Dockerfile. The way build-caching works is that once a step needs to |
89 | | -be rebuilt, all of the subsequent steps will also be rebuilt. |
|  | 63 | +- Ensure all dependencies are listed in `requirements.*` so that they are
|  | 64 | +  installed during the image build; network access is disabled when your
|  | 65 | +  submission is run, so nothing can be installed at runtime.
| 66 | +
|
| 67 | +- Use `COPY` to add any files required by your model. We recommend using |
| 68 | + one `COPY` command per file for optimized build caching. |
| 69 | +
|
|  | 70 | +- Update the base image and/or tag version if the provided base image does
|  | 71 | +  not fulfill your needs. Although you may use any valid image as the base,
| 72 | + we recommend using one of the [Trusted Content images] for security and |
| 73 | + reliability, such as: |
| 74 | +
|
| 75 | + * `ubuntu` |
| 76 | + * `python` |
| 77 | + * `bitnami/pytorch` |
| 78 | + * `r-base` |
| 79 | + * `rocker/tidyverse` |
| 80 | +
|
|  | 81 | +- If your image takes a long time to build, consider optimizing the order
| 82 | + of the Dockerfile commands by placing frequently changing parts near the |
| 83 | + end. This will take advantage of Docker's build caching. |
90 | 84 |
|
91 | 85 | > [Learn more about Docker's build cache]. |
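As a concrete illustration of that ordering, a cache-friendly Dockerfile for a Python model might look like the following sketch (the base image and file names are assumptions, not the template's exact contents):

```
FROM python:3.10-slim

# Dependencies change rarely: keeping this step early lets Docker reuse
# the cached install layer across rebuilds.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Model code changes often: keeping it last means an edit only rebuilds
# the layers from this point on.
COPY run_model.py .
ENTRYPOINT ["python", "run_model.py"]
```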
92 | 86 |
|
93 | 87 | ### Build your model |
94 | 88 |
|
95 | | -1. Assuming you are either in `r/` or `python/`, Dockerize your model: |
96 | | -
|
97 | | - ``` |
98 | | - docker build -t docker.synapse.org/PROJECT_ID/my-model:v1 . |
99 | | - ``` |
100 | | -
|
101 | | - where: |
102 | | -
|
103 | | - * `PROJECT_ID`: Synapse ID of your project |
104 | | - * `my-model`: name of your model |
105 | | - * `v1`: version of your model |
106 | | - * `.`: filepath to the Dockerfile |
107 | | -
|
108 | | - Update the model name and/or tag name as desired. |
109 | | -
|
110 | | -> [!IMPORTANT] |
111 | | -> The submission system uses the x86-64 cpu architecture. If your machine uses a different architecture, e.g. Apple Silicon, you will need to additionally include `--platform linux/amd64` into the command, e.g. |
112 | | -> |
113 | | -> `docker build -t IMAGE_NAME --platform linux/amd64 FILEPATH_TO_DOCKERFILE` |
114 | | -
|
115 | | -3. (optional but highly recommended) Locally run a container to ensure the |
116 | | - model can run successfully: |
117 | | -
|
118 | | - ``` |
119 | | - docker run \ |
120 | | - --rm \ |
121 | | - --network none \ |
122 | | - --volume $PWD/sample_data:/input:ro \ |
123 | | - --volume $PWD/output:/output:rw \ |
124 | | - docker.synapse.org/PROJECT_ID/my-model:v1 |
125 | | - ``` |
126 | | - |
127 | | - where: |
128 | | -
|
129 | | - * `--rm`: stops and removes the container once it is done running |
130 | | - * `--network none`: disables all network connections to the container |
131 | | - (emulating the same behavior seen in the submission queues) |
132 | | - * `--volume ...`: mounts data generated by and used by the container. For |
133 | | - example, `--volume $PWD/sample_data:/input:ro` will mount |
134 | | - `$PWD/sample_data` (from your machine) as `/input` (in the container) |
135 | | - with read-only permissions. |
136 | | - * `docker.synapse.org/PROJECT_ID/my-model:v1`: Docker image and tag |
137 | | - version to run |
138 | | -
|
139 | | - If your model requires a GPU, be sure to expose it by adding `--runtime nvidia` |
140 | | - or `--gpus all` to the `docker run` command. Note that your local machine will |
141 | | - also need the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker). |
|  | 89 | +1. If you haven't already, change into the `r/` or `python/` directory. Then
|  | 90 | +   run `docker build` to Dockerize your model:
142 | 91 |
|
143 | | -### Prepare and push your model to Synapse |
| 92 | + ``` |
| 93 | + docker build -t docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION FILEPATH/TO/DOCKERFILE |
| 94 | + ``` |
| 95 | +
|
| 96 | + where: |
| 97 | +
|
| 98 | + - _PROJECT_ID_: Synapse ID of your project. |
| 99 | + - _IMAGE_NAME_: name of your image. |
|  | 100 | +   - _TAG_VERSION_: version of the image. If _TAG_VERSION_ is not supplied,
|  | 101 | +     `latest` will be used.
|  | 102 | +   - _FILEPATH/TO/DOCKERFILE_: filepath to the Dockerfile; in this case, the
|  | 103 | +     current directory (`.`).
| 104 | +
|
|  | 105 | +2. (optional but highly recommended) Test your newly built model by running
| 106 | + it locally. For example: |
144 | 107 |
|
145 | | -1. If you haven't already, log into the Synapse Docker registry with your |
146 | | - Synapse credentials. We highly recommend you use a Synapse Personal Access |
147 | | - Token (PAT) for this step. Once logged in, you should not have to log in |
148 | | - again, unless you log out or switch Docker registries. |
| 108 | + ``` |
| 109 | + docker run \ |
| 110 | + --rm \ |
| 111 | + --network none \ |
| 112 | + --volume $PWD/sample_data:/input:ro \ |
| 113 | + --volume $PWD/output:/output:rw \ |
| 114 | + docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION |
| 115 | + ``` |
149 | 116 |
|
150 | | - ``` |
151 | | - docker login docker.synapse.org --username SYNAPSE_USERNAME |
152 | | - ``` |
| 117 | + where: |
| 118 | +
|
| 119 | + - `--rm`: removes the container after execution. |
| 120 | + - `--network none`: disables all network connections to the container, |
| 121 | + emulating the same behavior as the Synapse submission system. |
| 122 | + - `--volume SOURCE:DEST:PERMISSIONS`: mounts local directories to the container; |
| 123 | + use absolute paths for _SOURCE_ and _DEST_. |
| 124 | +
|
|  | 125 | +   If your model requires a GPU, add `--runtime nvidia` or `--gpus all` to the
|  | 126 | +   run command; this requires the [NVIDIA Container Toolkit] on your machine.
| 127 | +
|
| 128 | +### Prepare and push your model to Synapse |
153 | 129 |
|
154 | | - When prompted for a password, enter your PAT. |
| 130 | +1. If you haven't already, log into the Synapse Docker registry. We recommend |
| 131 | + using a Synapse Personal Access Token (PAT) for this step rather than your |
| 132 | + password: |
155 | 133 |
|
156 | | - > [Learn more about Synapse PATs and how to generate one]. |
| 134 | + ``` |
| 135 | + docker login docker.synapse.org --username SYNAPSE_USERNAME |
| 136 | + ``` |
157 | 137 |
|
158 | | - You can also log in non-interactively through `STDIN` - this will prevent |
159 | | - your password from being saved in the shell's history and log files. For |
160 | | - example, if you saved your PAT into a file called `synapse.token`: |
| 138 | + Enter your PAT when prompted. |
161 | 139 |
|
162 | | - ``` |
163 | | - cat ~/synapse.token | \ |
164 | | - docker login docker.synapse.org --username SYNAPSE_USERNAME --password-stdin |
165 | | - ``` |
| 140 | + > [Learn more about Synapse PATs and how to generate one]. |
166 | 141 |
|
167 | | -2. Use `docker push` to push the model up to your project on Synapse. |
|  | 142 | +   You can also log in non-interactively through `STDIN`, which prevents
| 143 | + your PAT from being saved in the shell's history and log files. For example, |
| 144 | + if you saved your PAT into a file called `synapse.token`: |
168 | 145 |
|
169 | | - ``` |
170 | | - docker push docker.synapse.org/PROJECT_ID/my-model:v1 |
171 | | - ``` |
| 146 | + ``` |
| 147 | + cat ~/synapse.token | \ |
| 148 | + docker login docker.synapse.org --username SYNAPSE_USERNAME --password-stdin |
| 149 | + ``` |
172 | 150 |
|
173 | | - The Docker image should now be available in the **Docker** tab of your |
174 | | - Synapse project. |
| 151 | +2. Push the Docker image to your Synapse project: |
175 | 152 |
|
| 153 | + ``` |
| 154 | + docker push docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION |
| 155 | + ``` |
176 | 156 |
|
| 157 | + The Docker image will be available in the **Docker** tab of your Synapse |
| 158 | + project. |
177 | 159 |
|
178 | 160 | [Docker]: https://docs.docker.com/get-docker/ |
179 | 161 | [Synapse account]: https://www.synapse.org/# |
180 | 162 | [Trusted Content images]: https://hub.docker.com/search?q=&image_filter=official%2Cstore |
181 | 163 | [Learn more about Docker's build cache]: https://docs.docker.com/build/cache/ |
| 164 | +[NVIDIA Container Toolkit]: https://github.com/NVIDIA/nvidia-docker |
182 | 165 | [Learn more about Synapse PATs and how to generate one]: https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens |