Commit 8e89bdf

vpchung and rrchai authored
[feat] Update sample models and data (#5)
* run container as non-root
* update sample data to simulate real data
* update models to work with updated sample data
* update readme
* Update r/run_model.R
* Update usage examples in README

Co-authored-by: Rongrong Chai <73901500+rrchai@users.noreply.github.com>
1 parent bf4b878 commit 8e89bdf

File tree

9 files changed: +232 additions, −163 deletions


README.md

Lines changed: 118 additions & 135 deletions

````diff
@@ -5,178 +5,161 @@
 Templates for creating a Docker model submission on Synapse.
 </h3>
 
-You can either build off of this repository template or use it as reference
-to build your model from scratch. We have provided a sample model template
-for both R and Python.
+You can either build off of this repository template or use it as reference
+to build your model from scratch. Sample model templates for both R and
+Python are provided.
 
 ### Requirements
-* Python or R
-* [Docker](https://docs.docker.com/get-docker/)
-* [Synapse account](https://www.synapse.org/#)
-* Synapse project for the challenge
+
+- Python or R
+- [Docker](https://docs.docker.com/get-docker/)
+- [Synapse account](https://www.synapse.org/#)
+- Synapse project for the challenge
 
 ---
 
 ### Write your algorithm(s)
 
-1. Replace the code in the `run_model.*` script with your own algorithm(s).
-   You can create additional scripts for modularization/better organization
-   if desired.
+1. Replace the placeholder code in `run_model.*` script with your own
+   algorithm(s). Create additional functions and scripts for modularity
+   and organization as needed.
+
+2. Manage the dependencies:
+
+   - **Python:** Update `requirements.txt` with any required Python libraries.
+   - **R:** Update `requirements.R` by modifying the `pkg_list` variable to
+     include or exclude necessary R packages.
+
+3. (optional) Locally run `run_model.*` to verify it can run successfully.
+
+   The scripts have been designed to accept input and output directories as
+   command-line arguments, allowing you to test with various data locations.
 
-2. If using Python, update `requirements.txt` with any additional
-   libraries/packages used by your script(s).
+   - By default, the scripts look for input in the `/input` directory
+     and write output to the `/output` directory, as expected by the
+     Synapse submission system.
 
-   If using R, update `requirements.R` and add/remove any libraries/packages
-   listed in `pkg_list` that are used by your script(s).
+   - To use custom directories, specify them as arguments. For example:
 
-3. (optional) Locally run `run_model.*` to ensure it can run successfully.
+     **Python**
 
-   These scripts have been written so that the input and output files are not
-   hard-coded in the `/input` and `/output` directories, respectively (though
-   they are used by default). This way, you can test your changes using any
-   directories as input and/or output.
+     ```
+     python python/run_model.py --input-dir sample_data/ --output-dir .
+     ```
 
-   For example, the following indicates that the input files are in
-   `sample_data/`, while the output file should be written to the current
-   working directory (`.`):
+     **R**
 
-   **Python**
-   ```
-   python run_model.py --input-dir ../sample_data/ --output-dir .
-   ```
+     ```
+     Rscript r/run_model.R --input-dir sample_data/ --output-dir .
+     ```
 
-   **R**
-   ```
-   Rscript run_model.R --input-dir ../sample_data/ --output-dir .
-   ```
+     where:
+
+     - `sample_data/` is used as the input directory
+     - `.` (current working directory) is used as the output directory
 
 ### Update the Dockerfile
 
-* Again, make sure that all needed libraries/packages are specified in the
-  `requirements.*` file. Because all Docker submissions are run without network
-  access, you will not able to install anything during the container run. If
-  you do not want to use a `requirements.*` file, you may run replace the RUN
-  command with the following:
-
-  **Python**
-  ```
-  RUN pip install pandas
-  ```
-
-  **R**
-  ```
-  RUN R -e "install.packages(c('optparse'), repos = 'http://cran.us.r-project.org')"
-  ```
-
-* `COPY` over any additional files required by your model. We recommend using
-  one `COPY` command per file, as this can help speed up build time.
-
-* Feel free to update the base image and/or tag version if the provided base
-  image do not fulfill your needs. Although you can use any valid image as the
-  base, we recommend using one of the [Trusted Content images], especially if
-  you are new to Docker. Images to consider:
-  * ubuntu
-  * python
-  * bitnami/pytorch
-  * r-base
-  * rocker/tidyverse
-
-* If your image takes some time to build, look at the order of your Dockerfile
-  commands -- **the order matters**. To best take advantage of Docker's
-  build-caching (that is, reusing previously built layers), it's often a good
-  idea to put frequently-changing parts (such as `run_model.*`) near the end
-  of the Dockerfile. The way build-caching works is that once a step needs to
-  be rebuilt, all of the subsequent steps will also be rebuilt.
+- Ensure all dependencies are listed in `requirements.*` so that they are
+  installed during this build process, as network access is disabled when
+  your submission is run.
+
+- Use `COPY` to add any files required by your model. We recommend using
+  one `COPY` command per file for optimized build caching.
+
+- Update the base image and/or tag version if the provided base do not
+  fulfill your needs. Although you may use any valid image as the base,
+  we recommend using one of the [Trusted Content images] for security and
+  reliability, such as:
+
+  * `ubuntu`
+  * `python`
+  * `bitnami/pytorch`
+  * `r-base`
+  * `rocker/tidyverse`
+
+- If your image is taking some time to build, consider optimizing the order
+  of the Dockerfile commands by placing frequently changing parts near the
+  end. This will take advantage of Docker's build caching.
 
 > [Learn more about Docker's build cache].
 
 ### Build your model
 
-1. Assuming you are either in `r/` or `python/`, Dockerize your model:
-
-   ```
-   docker build -t docker.synapse.org/PROJECT_ID/my-model:v1 .
-   ```
-
-   where:
-
-   * `PROJECT_ID`: Synapse ID of your project
-   * `my-model`: name of your model
-   * `v1`: version of your model
-   * `.`: filepath to the Dockerfile
-
-   Update the model name and/or tag name as desired.
-
-   > [!IMPORTANT]
-   > The submission system uses the x86-64 cpu architecture. If your machine uses a different architecture, e.g. Apple Silicon, you will need to additionally include `--platform linux/amd64` into the command, e.g.
-   >
-   > `docker build -t IMAGE_NAME --platform linux/amd64 FILEPATH_TO_DOCKERFILE`
-
-3. (optional but highly recommended) Locally run a container to ensure the
-   model can run successfully:
-
-   ```
-   docker run \
-     --rm \
-     --network none \
-     --volume $PWD/sample_data:/input:ro \
-     --volume $PWD/output:/output:rw \
-     docker.synapse.org/PROJECT_ID/my-model:v1
-   ```
-
-   where:
-
-   * `--rm`: stops and removes the container once it is done running
-   * `--network none`: disables all network connections to the container
-     (emulating the same behavior seen in the submission queues)
-   * `--volume ...`: mounts data generated by and used by the container. For
-     example, `--volume $PWD/sample_data:/input:ro` will mount
-     `$PWD/sample_data` (from your machine) as `/input` (in the container)
-     with read-only permissions.
-   * `docker.synapse.org/PROJECT_ID/my-model:v1`: Docker image and tag
-     version to run
-
-   If your model requires a GPU, be sure to expose it by adding `--runtime nvidia`
-   or `--gpus all` to the `docker run` command. Note that your local machine will
-   also need the [NVIDIA Container Toolkit](https://github.com/NVIDIA/nvidia-docker).
+1. If you haven't already, change directories to `r/` or `python/`. Then run
+   the `build` command to Dockerize your model:
 
-### Prepare and push your model to Synapse
+   ```
+   docker build -t docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION FILEPATH/TO/DOCKERFILE
+   ```
+
+   where:
+
+   - _PROJECT_ID_: Synapse ID of your project.
+   - _IMAGE_NAME_: name of your image.
+   - _TAG_VERSION_: version of the image. If TAG_VERSION is not supplied,
+     `latest` will be used.
+   - _FILEPATH/TO/DOCKERFILE_: filepath to the Dockerfile, in this case, it
+     will be the current directory (`.`).
+
+2. (optional but highly recommended) Test your newly-built model by running
+   it locally. For example:
 
-1. If you haven't already, log into the Synapse Docker registry with your
-   Synapse credentials. We highly recommend you use a Synapse Personal Access
-   Token (PAT) for this step. Once logged in, you should not have to log in
-   again, unless you log out or switch Docker registries.
+   ```
+   docker run \
+     --rm \
+     --network none \
+     --volume $PWD/sample_data:/input:ro \
+     --volume $PWD/output:/output:rw \
+     docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION
+   ```
 
-   ```
-   docker login docker.synapse.org --username SYNAPSE_USERNAME
-   ```
+   where:
+
+   - `--rm`: removes the container after execution.
+   - `--network none`: disables all network connections to the container,
+     emulating the same behavior as the Synapse submission system.
+   - `--volume SOURCE:DEST:PERMISSIONS`: mounts local directories to the container;
+     use absolute paths for _SOURCE_ and _DEST_.
+
+   If your model requires a GPU, add `--runtime nvidia` or `--gpus all`. Ensure
+   the [NVIDIA Container Toolkit] is installed if using GPU support.
+
+### Prepare and push your model to Synapse
 
-   When prompted for a password, enter your PAT.
+1. If you haven't already, log into the Synapse Docker registry. We recommend
+   using a Synapse Personal Access Token (PAT) for this step rather than your
+   password:
 
-   > [Learn more about Synapse PATs and how to generate one].
+   ```
+   docker login docker.synapse.org --username SYNAPSE_USERNAME
+   ```
 
-   You can also log in non-interactively through `STDIN` - this will prevent
-   your password from being saved in the shell's history and log files. For
-   example, if you saved your PAT into a file called `synapse.token`:
+   Enter your PAT when prompted.
 
-   ```
-   cat ~/synapse.token | \
-     docker login docker.synapse.org --username SYNAPSE_USERNAME --password-stdin
-   ```
+   > [Learn more about Synapse PATs and how to generate one].
 
-2. Use `docker push` to push the model up to your project on Synapse.
+   You can also log in non-interactively through `STDIN` - this will prevent
+   your PAT from being saved in the shell's history and log files. For example,
+   if you saved your PAT into a file called `synapse.token`:
 
-   ```
-   docker push docker.synapse.org/PROJECT_ID/my-model:v1
-   ```
+   ```
+   cat ~/synapse.token | \
+     docker login docker.synapse.org --username SYNAPSE_USERNAME --password-stdin
+   ```
 
-   The Docker image should now be available in the **Docker** tab of your
-   Synapse project.
+2. Push the Docker image to your Synapse project:
 
+   ```
+   docker push docker.synapse.org/PROJECT_ID/IMAGE_NAME:TAG_VERSION
+   ```
 
+   The Docker image will be available in the **Docker** tab of your Synapse
+   project.
 
 [Docker]: https://docs.docker.com/get-docker/
 [Synapse account]: https://www.synapse.org/#
 [Trusted Content images]: https://hub.docker.com/search?q=&image_filter=official%2Cstore
 [Learn more about Docker's build cache]: https://docs.docker.com/build/cache/
+[NVIDIA Container Toolkit]: https://github.com/NVIDIA/nvidia-docker
 [Learn more about Synapse PATs and how to generate one]: https://help.synapse.org/docs/Managing-Your-Account.2055405596.html#ManagingYourAccount-PersonalAccessTokens
````
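The run commands in the README diff above share one CLI contract: `--input-dir` and `--output-dir` options that default to `/input` and `/output`. A minimal standard-library sketch of that contract (the actual Python template uses `typer`; this `argparse` version is illustrative only, and `parse_args` is our own helper name):

```python
import argparse


def parse_args(argv=None):
    """Parse the --input-dir/--output-dir options used by run_model.*."""
    parser = argparse.ArgumentParser(description="Run model inference.")
    # Defaults match the directories mounted by the Synapse submission system.
    parser.add_argument("--input-dir", default="/input")
    parser.add_argument("--output-dir", default="/output")
    return parser.parse_args(argv)


# No arguments: the defaults used on the submission system.
defaults = parse_args([])
print(defaults.input_dir, defaults.output_dir)  # /input /output

# Custom directories, as in the README example.
custom = parse_args(["--input-dir", "sample_data/", "--output-dir", "."])
print(custom.input_dir, custom.output_dir)  # sample_data/ .
```

Keeping the locations configurable like this is what lets the same script run both locally against `sample_data/` and unmodified inside the submission container.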

python/Dockerfile

Lines changed: 13 additions & 5 deletions

```diff
@@ -1,23 +1,31 @@
 # Define the base of your image.
 # We recommend to always specify a tag. `latest` may not
 # always ensure reproducibility.
-FROM python:3.10-slim
+FROM --platform=linux/amd64 python:3.11-slim
+
+# For best practice, run container as non-root user.
+RUN groupadd -r user && useradd -m --no-log-init -r -g user user
+USER user
 
 # Set the working directory for the COPY, RUN, and ENTRYPOINT
 # commands of the Dockerfile.
-WORKDIR /usr/local/bin
+WORKDIR /home/user
 
 # Copy files over to the image.
 # We recommend copying over each file individually, as to take
 # advantage of cache building (which helps reduce build time).
-COPY requirements.txt .
+COPY --chown=user:user requirements.txt .
 
 # Install needed libraries/packages.
 # Your model will be run without network access, so the dependencies
 # must be installed here (and not during code execution).
-RUN pip install -r requirements.txt
+RUN pip install \
+    --user \
+    --no-cache-dir \
+    --break-system-packages \
+    -r requirements.txt
 
-COPY run_model.py .
+COPY --chown=user:user run_model.py .
 
 # Set the main command of the image.
 # We recommend using this form instead of `ENTRYPOINT command param1`.
```

python/requirements.txt

Lines changed: 3 additions & 2 deletions

```diff
@@ -1,2 +1,3 @@
-typer
-pandas
+numpy==2.2.3
+pandas==2.2.2
+typer==0.9.4
```
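Because the submission container runs without network access, every dependency must be pinned here and installed at build time. A small standard-library sketch for sanity-checking pins like the ones above against a local environment (the helper names `parse_pins` and `report_installed` are ours, not part of the template):

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Parse 'name==version' requirement lines into a {name: version} dict."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def report_installed(pins: dict) -> dict:
    """Map each pinned package to its locally installed version (None if absent)."""
    report = {}
    for name in pins:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


pins = parse_pins("numpy==2.2.3\npandas==2.2.2\ntyper==0.9.4")
print(report_installed(pins))
```

Comparing the report against the pins before building the image catches missing entries early, rather than at container runtime where nothing can be installed.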

python/run_model.py

Lines changed: 16 additions & 8 deletions

```diff
@@ -1,29 +1,37 @@
 """Python Model Example"""
+
 import os
 
+import numpy as np
 import pandas as pd
 import typer
 from typing_extensions import Annotated
 
 
-def predict(df):
-    """
-    Run a prediction: full name will only contains two names.
+def predict(df: pd.DataFrame) -> pd.DataFrame:
+    """Sample prediction function.
+
+    TODO: Replace this with your actual model prediction logic. In this
+    example, random floats are assigned.
     """
-    df[["first_name", "last_name"]] = df["name"].str.split(" ", n=1, expand=True)
-    return df
+    pred = df.loc[:, ["id"]]
+    pred["probability"] = np.random.random_sample(size=len(pred.index))
+    return pred
 
 
 def main(
     input_dir: Annotated[str, typer.Option()] = "/input",
     output_dir: Annotated[str, typer.Option()] = "/output",
 ):
     """
-    Run inference using data in input_dir and output predictions to output_dir
+    Run inference using data in input_dir and output predictions to output_dir.
     """
-    data = pd.read_csv(os.path.join(input_dir, "names.csv"))
+    data = pd.read_csv(os.path.join(input_dir, "data.csv"))
     predictions = predict(data)
-    predictions.to_csv(os.path.join(output_dir, "predictions.csv"), index=False)
+    predictions.to_csv(
+        os.path.join(output_dir, "predictions.csv"),
+        index=False,
+    )
 
 
 if __name__ == "__main__":
```
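The updated script's I/O contract (read `data.csv` containing an `id` column from the input directory, write `predictions.csv` with `id` and `probability` columns to the output directory) can be exercised without Docker. A sketch mirroring the diff above, using temporary directories in place of `/input` and `/output` (the `feature` column in the fabricated input is our assumption; only `id` is required by the sample `predict`):

```python
import os
import tempfile

import numpy as np
import pandas as pd


def predict(df: pd.DataFrame) -> pd.DataFrame:
    """Stand-in for the template's predict(): random probability per id."""
    pred = df.loc[:, ["id"]]
    pred["probability"] = np.random.random_sample(size=len(pred.index))
    return pred


# Simulate the /input -> /output flow with temporary directories.
with tempfile.TemporaryDirectory() as input_dir, \
        tempfile.TemporaryDirectory() as output_dir:
    pd.DataFrame({"id": [1, 2, 3], "feature": [0.1, 0.5, 0.9]}).to_csv(
        os.path.join(input_dir, "data.csv"), index=False
    )
    data = pd.read_csv(os.path.join(input_dir, "data.csv"))
    predictions = predict(data)
    predictions.to_csv(os.path.join(output_dir, "predictions.csv"), index=False)

    # Read the output back, as the scoring harness would.
    out = pd.read_csv(os.path.join(output_dir, "predictions.csv"))
    print(list(out.columns))  # ['id', 'probability']
```

This is the same check the README's local `docker run` step performs, minus the container; it verifies the file names and columns the submission system expects.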
