You can either build off of this repository template or use it as a reference to build your scripts from scratch. Provided here is a sample evaluation template in Python; R support is TBD.
- Python 3.11+
- Docker (if containerizing manually)
- Determine the format of the predictions file, as this will help create the list of validation checks. Things to consider include:
  - File format (e.g. CSV, TSV, text)
  - Number of columns
  - Column header names
  - Column data types
  - For numerical columns (integers or floats), are there expected minimum and maximum values?

  Beyond the file structure, also think about the data content:
  - Can there be multiple predictions for a single ID/patient/sample?
  - Is a prediction required for every ID, or are missing values acceptable?
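As a concrete illustration, the answers to these questions can be written down as a small specification that your validation code refers to. Everything below (column names, types, ranges) is a hypothetical placeholder, not part of the template:

```python
# Hypothetical prediction-file specification; adjust to your challenge.
PREDICTIONS_SPEC = {
    "file_format": "csv",
    "columns": {
        "id": {"dtype": "string", "required": True},
        "probability": {"dtype": "float", "min": 0.0, "max": 1.0, "required": True},
    },
    "one_prediction_per_id": True,      # duplicate IDs are not allowed
    "missing_predictions_allowed": False,
}
```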
- Adapt `validate.py` so that it fits your needs. The template currently implements the following checks (an illustrative sketch of similar logic follows this list):
  - There are two columns named `id` and `probability` (any additional columns will be ignored)
  - `id` values are strings
  - `probability` values are floats between 0.0 and 1.0, and cannot be null/None
  - There is exactly one prediction per patient (no missing or duplicate IDs)
  - There are only predictions for patients found in the groundtruth (no unknown IDs)
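Below is a rough, hypothetical sketch of how such checks could be expressed with pandas. It is not the template's actual implementation; in particular, the `check_predictions()` helper and the `gt_ids` argument (a set of patient IDs taken from the groundtruth) are assumptions made for illustration.

```python
# Illustrative only -- not the template's actual code.
import pandas as pd

def check_predictions(pred_path: str, gt_ids: set[str]) -> list[str]:
    """Return a list of error messages; an empty list means the file is valid."""
    errors = []
    pred = pd.read_csv(pred_path)

    # Two required columns named `id` and `probability`; extra columns are ignored.
    if not {"id", "probability"}.issubset(pred.columns):
        return ["File must contain the columns 'id' and 'probability'"]

    pred["id"] = pred["id"].astype(str)  # treat IDs as strings

    # `probability` must be a non-null float between 0.0 and 1.0.
    prob = pd.to_numeric(pred["probability"], errors="coerce")
    if prob.isna().any() or (prob < 0.0).any() or (prob > 1.0).any():
        errors.append("'probability' values must be non-null floats in [0.0, 1.0]")

    # Exactly one prediction per patient: no duplicate and no missing IDs.
    pred_ids = set(pred["id"])
    if pred["id"].duplicated().any():
        errors.append("Duplicate 'id' values found")
    if gt_ids - pred_ids:
        errors.append("Predictions are missing for one or more patients")

    # Only predictions for patients found in the groundtruth.
    if pred_ids - gt_ids:
        errors.append("Predictions found for IDs not in the groundtruth")

    return errors
```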
> [!NOTE]
> The template is currently designed with the assumption that the challenge has a single task.
> If your challenge has multiple tasks, create additional validation functions (e.g., `validate_task2()`, `validate_task3()`, ...) and update the `validate()` function to direct the validation process to the correct function for each task.
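As an illustration of that dispatch pattern (not the template's actual code; `validate_task1()` and `validate_task2()` are placeholder stubs, and the function signatures are assumptions), the routing might look like the sketch below. The same pattern applies to `score()` in the scoring script.

```python
# Hypothetical sketch: validate_task1/validate_task2 stand in for your
# per-task validation functions, each returning a list of error messages.
def validate_task1(predictions_file: str, groundtruth_folder: str) -> list[str]:
    return []  # task 1 checks go here

def validate_task2(predictions_file: str, groundtruth_folder: str) -> list[str]:
    return []  # task 2 checks go here

def validate(predictions_file: str, groundtruth_folder: str, task_number: int = 1) -> list[str]:
    """Route validation to the function for the requested task."""
    validators = {1: validate_task1, 2: validate_task2}
    if task_number not in validators:
        raise ValueError(f"Unknown task number: {task_number}")
    return validators[task_number](predictions_file, groundtruth_folder)
```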
> [!IMPORTANT]
> Modifying the `main()` function is highly discouraged. This function has specifically been written to interact with ORCA.
- Update `requirements.txt` with any additional libraries/packages used by the script.
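For illustration only: if, say, your added checks rely on pandas, the corresponding (hypothetical) line in `requirements.txt` might be:

```
pandas>=2.0
```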
- (optional) Locally run `validate.py` to verify its functionality, replacing the placeholder paths with the filepaths to your data:

  ```
  python validate.py \
    --predictions_file PATH/TO/PREDICTIONS_FILE.CSV \
    --groundtruth_folder PATH/TO/GROUNDTRUTH_FOLDER/ \
    [--output_file PATH/TO/OUTPUT_FILE.JSON] \
    [--task_number TASK_NUMBER]
  ```
  Specify `--output_file` and `--task_number` as needed. Use `--help` for more details.

  The expected outcomes are:
  - STDOUT will display either `VALIDATED` or `INVALID`
  - Full validation details are saved in `results.json` (or the path specified by `--output_file`)

  If needed, you may use the sample data provided in `sample_data/`; however, thorough testing with your own data is recommended to ensure accurate validation.
- Determine the evaluation metrics you will use to assess the predictions. It is recommended to include at least two metrics: a primary metric for ranking and a secondary metric for breaking ties. You can also include additional informative metrics such as sensitivity, specificity, etc.
- Adapt `score.py` to calculate the metrics you have defined. The template currently provides implementations for (a reference sketch follows this list):
  - Area under the receiver operating characteristic curve (AUROC)
  - Area under the precision-recall curve (AUPRC)
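For reference, both metrics are available in scikit-learn. The following is a minimal sketch under the assumption of binary ground-truth labels and predicted probabilities; it is not the template's actual `score.py` code, and `compute_metrics()` is a hypothetical helper name.

```python
# Illustrative metric computation; variable names are hypothetical.
from sklearn.metrics import average_precision_score, roc_auc_score

def compute_metrics(y_true, y_prob) -> dict:
    """y_true: binary labels (0/1); y_prob: predicted probabilities for class 1."""
    return {
        "auroc": roc_auc_score(y_true, y_prob),
        "auprc": average_precision_score(y_true, y_prob),
    }

# Example usage:
# compute_metrics([0, 1, 1, 0], [0.1, 0.9, 0.8, 0.3])  # -> {'auroc': 1.0, 'auprc': 1.0}
```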
> [!NOTE]
> The template is currently designed with the assumption that the challenge has a single task.
> If your challenge has multiple tasks, create additional scoring functions (e.g., `score_task2()`, `score_task3()`, ...) and update the `score()` function to direct the scoring process to the correct function for each task.
> [!IMPORTANT]
> Modifying the `main()` function is highly discouraged. This function has specifically been written to interact with ORCA.
- Update `requirements.txt` with any additional libraries/packages used by the script.
- (optional) Locally run `score.py` to ensure it executes correctly and returns the expected scores:

  ```
  python score.py \
    --predictions_file PATH/TO/PREDICTIONS_FILE.CSV \
    --groundtruth_folder PATH/TO/GROUNDTRUTH_FOLDER/ \
    [--output_file PATH/TO/OUTPUT_FILE.JSON] \
    [--task_number TASK_NUMBER]
  ```
  Specify `--output_file` and `--task_number` as needed. Use `--help` for more details.

  The expected outcomes are:
  - STDOUT will display either `SCORED` or `INVALID`
  - Scores are appended to `results.json` (or the path specified by `--output_file`)
This template repository includes a workflow that builds a Docker container for your scripts. To trigger the process, you will need to create a new release. For tag versioning, we recommend following the SemVer versioning scheme (e.g., `1.0.0`). You can follow the status of the release workflow by going to the Actions tab of your repository.
This workflow will create a new image within your repository, which can be found under Packages. Here is an example of the deployed image for this template.
You can also use other public Docker registries, such as DockerHub. The only requirement is that the Docker image must be publicly accessible so that ORCA can pull and execute it.
To containerize your scripts:
- Open a terminal and switch directories to your local copy of the repository.
- Run the command:

  ```
  docker build -t IMAGE_NAME:TAG_VERSION FILEPATH/TO/DOCKERFILE
  ```

  where:
  - IMAGE_NAME: name of your image
  - TAG_VERSION: version of the image. If TAG_VERSION is not supplied, `latest` will be used.
  - FILEPATH/TO/DOCKERFILE: filepath to the Dockerfile; in this case, it will be the current directory (`.`)
- If needed, log into your registry of choice.
- Push the image:

  ```
  docker push IMAGE_NAME:TAG_VERSION
  ```
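As a concrete, hypothetical example using Docker Hub (the username `myuser` and the image name are made up; substitute your own):

```
docker build -t myuser/my-challenge-evaluation:1.0.0 .
docker login
docker push myuser/my-challenge-evaluation:1.0.0
```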
Create a PR to the nf-synapse-challenge repository to add your container image name to your challenge profile.
Please reach out to the DPE team via their DPE Service Desk for more information and support regarding challenge evaluation orchestration.