Commit 8e89bdf

vpchung and rrchai authored
[feat] Update sample models and data (#5)
* run container as non-root
* update sample data to simulate real data
* update models to work with updated sample data
* update readme
* Update r/run_model.R
* Update usage examples in README

Co-authored-by: Rongrong Chai <73901500+rrchai@users.noreply.github.com>
1 parent bf4b878 commit 8e89bdf

File tree

9 files changed: +232 additions, −163 deletions


README.md

Lines changed: 118 additions & 135 deletions

````diff
@@ -5,178 +5,161 @@
 Templates for creating a Docker model submission on Synapse.
 </h3>
 
-You can either build off of this repository template or use it as reference
-to build your model from scratch. We have provided a sample model template
-for both R and Python.
+You can either build off of this repository template or use it as reference
+to build your model from scratch. Sample model templates for both R and
+Python are provided.
 
 ### Requirements
-* Python or R
-* [Docker](https://docs.docker.com/get-docker/)
-* [Synapse account](https://www.synapse.org/#)
-* Synapse project for the challenge
+
+- Python or R
+- [Docker](https://docs.docker.com/get-docker/)
+- [Synapse account](https://www.synapse.org/#)
+- Synapse project for the challenge
 
 ---
 
 ### Write your algorithm(s)
 
-1. Replace the code in the `run_model.*` script with your own algorithm(s).
-   You can create additional scripts for modularization/better organization
-   if desired.
+1. Replace the placeholder code in `run_model.*` script with your own
+   algorithm(s). Create additional functions and scripts for modularity
+   and organization as needed.
+
+2. Manage the dependencies:
+
+   - **Python:** Update `requirements.txt` with any required Python libraries.
+   - **R:** Update `requirements.R` by modifying the `pkg_list` variable to
+     include or exclude necessary R packages.
+
+3. (optional) Locally run `run_model.*` to verify it can run successfully.
+
+   The scripts have been designed to accept input and output directories as
+   command-line arguments, allowing you to test with various data locations.
 
-2. If using Python, update `requirements.txt` with any additional
-   libraries/packages used by your script(s).
+   - By default, the scripts look for input in the `/input` directory
+     and write output to the `/output` directory, as expected by the
+     Synapse submission system.
 
-   If using R, update `requirements.R` and add/remove any libraries/packages
-   listed in `pkg_list` that are used by your script(s).
+   - To use custom directories, specify them as arguments. For example:
 
-3. (optional) Locally run `run_model.*` to ensure it can run successfully.
+     **Python**
 
-   These scripts have been written so that the input and output files are not
-   hard-coded in the `/input` and `/output` directories, respectively (though
-   they are used by default). This way, you can test your changes using any
-   directories as input and/or output.
+     ```
+     python python/run_model.py --input-dir sample_data/ --output-dir .
+     ```
 
-   For example, the following indicates that the input files are in
-   `sample_data/`, while the output file should be written to the current
-   working directory (`.`):
+     **R**
 
-   **Python**
-   ```
-   python run_model.py --input-dir ../sample_data/ --output-dir .
-   ```
+     ```
+     Rscript r/run_model.R --input-dir sample_data/ --output-dir .
+     ```
 
-   **R**
-   ```
-   Rscript run_model.R --input-dir ../sample_data/ --output-dir .
-   ```
+     where:
+
+     - `sample_data/` is used as the input directory
+     - `.` (current working directory) is used as the output directory
 
 ### Update the Dockerfile
 
-* Again, make sure that all needed libraries/packages are specified in the
-  `requirements.*` file. Because all Docker submissions are run without network
-  access, you will not able to install anything during the container run. If
-  you do not want to use a `requirements.*` file, you may run replace the RUN
-  command with the following:
-
-  **Python**
-  ```
-  RUN pip install pandas
-  ```
-
-  **R**
-  ```
-  RUN R -e "install.packages(c('optparse'), repos = 'http://cran.us.r-project.org')"
-  ```
-
-* `COPY` over any additional files required by your model. We recommend using
-  one `COPY` command per file, as this can help speed up build time.
-
-* Feel free to update the base image and/or tag version if the provided base
-  image do not fulfill your needs. Although you can use any valid image as the
-  base, we recommend using one of the [Trusted Content images], especially if
-  you are new to Docker. Images to consider:
-  * ubuntu
-  * python
-  * bitnami/pytorch
-  * r-base
-  * rocker/tidyverse
-
-* If your image takes some time to build, look at the order of your Dockerfile
-  commands -- **the order matters**. To best take advantage of Docker's
-  build-caching (that is, reusing previously built layers), it's often a good
-  idea to put frequently-changing parts (such as `run_model.*`) near the end
-  of the Dockerfile. The way build-caching works is that once a step needs to
-  be rebuilt, all of the subsequent steps will also be rebuilt.
+- Ensure all dependencies are listed in `requirements.*` so that they are
+  installed during this build process, as network access is disabled when
+  your submission is run.
+
+- Use `COPY` to add any files required by your model. We recommend using
+  one `COPY` command per file for optimized build caching.
+
+- Update the base image and/or tag version if the provided base do not
+  fulfill your needs. Although you may use any valid image as the base,
+  we recommend using one of the [Trusted Content images] for security and
+  reliability, such as:
+
+  * `ubuntu`
+  * `python`
+  * `bitnami/pytorch`
+  * `r-base`
+  * `rocker/tidyverse`
+
+- If your image is taking some time to build, consider optimizing the order
+  of the Dockerfile commands by placing frequently changing parts near the
+  end. This will take advantage of Docker's build caching.
 
 > [Learn more about Docker's build cache].
 
 ### Build your model
 
-1. Assuming you are either in `r/` or `python/`, Dockerize your model:
-
-   ```
-   docker build -t docker.synapse.org/PROJECT_ID/my-model:v1 .
-   ```
-
-   where:
-
-   * `PROJECT_ID`: Synapse ID of your project
-   * `my-model`: name of your model
-   * `v1`: version of your model
-   * `.`: filepath to the Dockerfile
-
-   Update the model name and/or tag name as desired.
-
-   > [!IMPORTANT]
-   > The submission system uses the x86-64 cpu architecture. If your machine uses a different architecture, e.g. Apple Silicon, you will need to additionally include `--platform linux/amd64` into the command, e.g.
-   >
-   > `docker build -t IMAGE_NAME --platform linux/amd64 FILEPATH_TO_DOCKERFILE`
-
-3. (optional but highly recommended) Locally run a container to ensure the
-   model can run successfully:
-
-   ```
-   docker run \
-     --rm \
-     --network none \
-     --volume $PWD/sample_data:/input:ro \
-     --volume $PWD/output:/output:rw \
-     docker.synapse.org/PROJECT_ID/my-model:v1
-   ```
-
-   where:
-
-   * `--rm`: stops and removes the container once it is done running
-   * `--network none`: disables all network connections to the container
-     (emulating the same behavior seen in the submission queues)
-   * `--volume ...`: mounts data generated by and used by the container. For
-     example, `--volume $PWD/sample_data:/input:ro` will mount
-     `$PWD/sample_data` (from your machine) as `/input` (in the container)
-     with read-only permissions.
-   * `docker.synapse.org/PROJECT_ID/my-model:v1`: Docker image and tag
-     version to run
-
-   If your model requires a GPU, be sure to expose it by adding `--runtime nvidia`
-   or `--gpus all` to the `docker run` command. Note that your local machine will
-   also need the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker).
+1. If you haven't already, change directories to `r/` or `python/`. Then run
+   the `build` command to Dockerize your model:
 
-### Prepare and push your model to Synapse
+   ```
+   docker build -t docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION FILEPATH/TO/DOCKERFILE
+   ```
+
+   where:
+
+   - _PROJECT_ID_: Synapse ID of your project.
+   - _IMAGE_NAME_: name of your image.
+   - _TAG_VERSION_: version of the image. If TAG_VERSION is not supplied,
+     `latest` will be used.
+   - _FILEPATH/TO/DOCKERFILE_: filepath to the Dockerfile, in this case, it
+     will be the current directory (`.`).
+
+2. (optional but highly recommended) Test your newly-built model by running
+   it locally. For example:
 
-1. If you haven't already, log into the Synapse Docker registry with your
-   Synapse credentials. We highly recommend you use a Synapse Personal Access
-   Token (PAT) for this step. Once logged in, you should not have to log in
-   again, unless you log out or switch Docker registries.
+   ```
+   docker run \
+     --rm \
+     --network none \
+     --volume $PWD/sample_data:/input:ro \
+     --volume $PWD/output:/output:rw \
+     docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION
+   ```
 
-   ```
-   docker login docker.synapse.org --username SYNAPSE_USERNAME
-   ```
+   where:
+
+   - `--rm`: removes the container after execution.
+   - `--network none`: disables all network connections to the container,
+     emulating the same behavior as the Synapse submission system.
+   - `--volume SOURCE:DEST:PERMISSIONS`: mounts local directories to the container;
+     use absolute paths for _SOURCE_ and _DEST_.
+
+   If your model requires a GPU, add `--runtime nvidia` or `--gpus all`. Ensure
+   the [NVIDIA Container Toolkit] is installed if using GPU support.
+
+### Prepare and push your model to Synapse
 
-   When prompted for a password, enter your PAT.
+1. If you haven't already, log into the Synapse Docker registry. We recommend
+   using a Synapse Personal Access Token (PAT) for this step rather than your
+   password:
 
-   > [Learn more about Synapse PATs and how to generate one].
+   ```
+   docker login docker.synapse.org --username SYNAPSE_USERNAME
+   ```
 
-   You can also log in non-interactively through `STDIN` - this will prevent
-   your password from being saved in the shell's history and log files. For
-   example, if you saved your PAT into a file called `synapse.token`:
+   Enter your PAT when prompted.
 
-   ```
-   cat ~/synapse.token | \
-     docker login docker.synapse.org --username SYNAPSE_USERNAME --password-stdin
-   ```
+   > [Learn more about Synapse PATs and how to generate one].
 
-2. Use `docker push` to push the model up to your project on Synapse.
+   You can also log in non-interactively through `STDIN` - this will prevent
+   your PAT from being saved in the shell's history and log files. For example,
+   if you saved your PAT into a file called `synapse.token`:
 
-   ```
-   docker push docker.synapse.org/PROJECT_ID/my-model:v1
-   ```
+   ```
+   cat ~/synapse.token | \
+     docker login docker.synapse.org --username SYNAPSE_USERNAME --password-stdin
+   ```
 
-   The Docker image should now be available in the **Docker** tab of your
-   Synapse project.
+2. Push the Docker image to your Synapse project:
 
+   ```
+   docker push docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION
+   ```
 
+   The Docker image will be available in the **Docker** tab of your Synapse
+   project.
 
 [Docker]: https://docs.docker.com/get-docker/
 [Synapse account]: https://www.synapse.org/#
 [Trusted Content images]: https://hub.docker.com/search?q=&image_filter=official%2Cstore
 [Learn more about Docker's build cache]: https://docs.docker.com/build/cache/
+[NVIDIA Container Toolkit]: https://github.com/NVIDIA/nvidia-docker
 [Learn more about Synapse PATs and how to generate one]: https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens
````
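The run commands in the README diff above share one CLI contract: `--input-dir` and `--output-dir` options that default to `/input` and `/output`. A minimal standard-library sketch of that contract (the actual Python template uses `typer`; this `argparse` version is illustrative only, and `parse_args` is our own helper name):

```python
import argparse


def parse_args(argv=None):
    """Parse the --input-dir/--output-dir options used by run_model.*."""
    parser = argparse.ArgumentParser(description="Run model inference.")
    # Defaults match the directories mounted by the Synapse submission system.
    parser.add_argument("--input-dir", default="/input")
    parser.add_argument("--output-dir", default="/output")
    return parser.parse_args(argv)


# No arguments: the defaults used on the submission system.
defaults = parse_args([])
print(defaults.input_dir, defaults.output_dir)  # /input /output

# Custom directories, as in the README example.
custom = parse_args(["--input-dir", "sample_data/", "--output-dir", "."])
print(custom.input_dir, custom.output_dir)  # sample_data/ .
```

Keeping the locations configurable like this is what lets the same script run both locally against `sample_data/` and unmodified inside the submission container.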

python/Dockerfile

Lines changed: 13 additions & 5 deletions

```diff
@@ -1,23 +1,31 @@
 # Define the base of your image.
 # We recommend to always specify a tag. `latest` may not
 # always ensure reproducibility.
-FROM python:3.10-slim
+FROM --platform=linux/amd64 python:3.11-slim
+
+# For best practice, run container as non-root user.
+RUN groupadd -r user && useradd -m --no-log-init -r -g user user
+USER user
 
 # Set the working directory for the COPY, RUN, and ENTRYPOINT
 # commands of the Dockerfile.
-WORKDIR /usr/local/bin
+WORKDIR /home/user
 
 # Copy files over to the image.
 # We recommend copying over each file individually, as to take
 # advantage of cache building (which helps reduce build time).
-COPY requirements.txt .
+COPY --chown=user:user requirements.txt .
 
 # Install needed libraries/packages.
 # Your model will be run without network access, so the dependencies
 # must be installed here (and not during code execution).
-RUN pip install -r requirements.txt
+RUN pip install \
+    --user \
+    --no-cache-dir \
+    --break-system-packages \
+    -r requirements.txt
 
-COPY run_model.py .
+COPY --chown=user:user run_model.py .
 
 # Set the main command of the image.
 # We recommend using this form instead of `ENTRYPOINT command param1`.
```

python/requirements.txt

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,2 +1,3 @@
-typer
-pandas
+numpy==2.2.3
+pandas==2.2.2
+typer==0.9.4
```
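Because the submission container runs without network access, every dependency must be pinned here and installed at build time. A small standard-library sketch for sanity-checking pins like the ones above against a local environment (the helper names `parse_pins` and `report_installed` are ours, not part of the template):

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Parse 'name==version' requirement lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def report_installed(pins: dict) -> dict:
    """Map each pinned package to its locally installed version (None if absent)."""
    report = {}
    for name in pins:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


pins = parse_pins("numpy==2.2.3\npandas==2.2.2\ntyper==0.9.4")
print(report_installed(pins))
```

Comparing the report against the pins before building the image catches missing entries early, rather than at container runtime where nothing can be installed.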

python/run_model.py

Lines changed: 16 additions & 8 deletions

```diff
@@ -1,29 +1,37 @@
 """Python Model Example"""
+
 import os
 
+import numpy as np
 import pandas as pd
 import typer
 from typing_extensions import Annotated
 
 
-def predict(df):
-    """
-    Run a prediction: full name will only contains two names.
+def predict(df: pd.DataFrame) -> pd.DataFrame:
+    """Sample prediction function.
+
+    TODO: Replace this with your actual model prediction logic. In this
+    example, random floats are assigned.
     """
-    df[["first_name", "last_name"]] = df["name"].str.split(" ", n=1, expand=True)
-    return df
+    pred = df.loc[:, ["id"]]
+    pred["probability"] = np.random.random_sample(size=len(pred.index))
+    return pred
 
 
 def main(
     input_dir: Annotated[str, typer.Option()] = "/input",
     output_dir: Annotated[str, typer.Option()] = "/output",
 ):
     """
-    Run inference using data in input_dir and output predictions to output_dir
+    Run inference using data in input_dir and output predictions to output_dir.
     """
-    data = pd.read_csv(os.path.join(input_dir, "names.csv"))
+    data = pd.read_csv(os.path.join(input_dir, "data.csv"))
     predictions = predict(data)
-    predictions.to_csv(os.path.join(output_dir, "predictions.csv"), index=False)
+    predictions.to_csv(
+        os.path.join(output_dir, "predictions.csv"),
+        index=False,
+    )
 
 
 if __name__ == "__main__":
```
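The updated script's I/O contract (read `data.csv` containing an `id` column from the input directory, write `predictions.csv` with `id` and `probability` columns to the output directory) can be exercised without Docker. A sketch mirroring the diff above, using temporary directories in place of `/input` and `/output` (the `feature` column in the fabricated input is our assumption; only `id` is required by the sample `predict`):

```python
import os
import tempfile

import numpy as np
import pandas as pd


def predict(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the template's predict(): random probability per id."""
    pred = df.loc[:, ["id"]]
    pred["probability"] = np.random.random_sample(size=len(pred.index))
    return pred


# Simulate the /input -> /output flow with temporary directories.
with tempfile.TemporaryDirectory() as input_dir, \
        tempfile.TemporaryDirectory() as output_dir:
    pd.DataFrame({"id": [1, 2, 3], "feature": [0.1, 0.5, 0.9]}).to_csv(
        os.path.join(input_dir, "data.csv"), index=False
    )
    data = pd.read_csv(os.path.join(input_dir, "data.csv"))
    predictions = predict(data)
    predictions.to_csv(os.path.join(output_dir, "predictions.csv"), index=False)

    # Read the output back, as the scoring harness would.
    out = pd.read_csv(os.path.join(output_dir, "predictions.csv"))
    print(list(out.columns))  # ['id', 'probability']
```

This is the same check the README's local `docker run` step performs, minus the container; it verifies the file names and columns the submission system expects.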
