This directory contains [carrot](https://github.com/broadinstitute/carrot)
tests for the GATK-SV pipeline's WDLs; the tests are organized in folders
containing `carrot` resources (e.g., evaluation WDL, default/test inputs).
Additionally, a utility script, `carrot_helper.py`, is provided that
automates defining tests to Carrot, running them, and checking their execution
status. With tests organized in the folder hierarchy described below,
a single call to the utility script
(`python carrot_helper.py test run ./*`) defines and runs every test,
ideally simplifying defining and running Carrot tests without requiring
domain-specific expertise.

## Organize Tests in Directories

Test cases for a WDL need to be organized in a directory that contains
one subdirectory per evaluation, each of which contains its `carrot`
resources and one subdirectory per test cohort. For instance,
`ExpansionHunterDenovo.wdl` performs case-control analysis and outlier
detection on a set of BAM files based on their short tandem repeat (STR)
profiles. To test this WDL's case-control analysis on a cohort of simulated
data, the `carrot` resources need to be organized as follows for the
`carrot_helper` utility to automatically set up, run, and track the
`carrot` tests.

```shell
├── wdl
│   └── ExpansionHunterDenovo.wdl
└── wdl_test
    └── ExpansionHunterDenovo
        └── casecontrol
            ├── simulated_data
            │   ├── eval_input.json
            │   └── test_input.json
            ├── eval.wdl
            ├── eval_input_defaults.json
            └── test_input_defaults.json
```

Accordingly:

- Create a folder with the _same name_ as the WDL file containing the
  workflow you want to test (`ExpansionHunterDenovo` for
  `ExpansionHunterDenovo.wdl` in this example).

- Create a separate folder for every evaluation you want to perform (e.g.,
  the `casecontrol` folder to evaluate `ExpansionHunterDenovo.wdl`'s
  case-control analysis on the STR profiles of input BAM files).
  While all assertions can be part of a single evaluation, it is generally
  good practice to break assertions into smaller, atomic evaluations.

- Inside every evaluation directory, create three files: `eval.wdl`,
  `eval_input_defaults.json`, and `test_input_defaults.json`.
  The `eval.wdl` workflow receives the outputs of the workflow you are testing
  and asserts their values. The JSON files provide default inputs to the test
  (`ExpansionHunterDenovo.wdl`) and evaluation (`eval.wdl`) WDLs. For instance,
  if most tests run `eval.wdl` on a common Docker image, the image name can be
  set in `eval_input_defaults.json` and overridden in the tests that run
  `eval.wdl` on a different Docker image.

- An evaluation can be performed using different sets of inputs for the
  test and evaluation workflows. For instance, in the STR analysis scenario,
  we pass
  [seven BAM files](https://github.com/VJalili/gatk-sv/blob/89e67350ea7fec8edc687011ac7308e3e1db17ff/wdl_test/ExpansionHunterDenovo/casecontrol/simulated_data/test_input.json#L4-L12)
  to `ExpansionHunterDenovo.wdl`, run the WDL, and pass
  [its output](https://github.com/VJalili/gatk-sv/blob/89e67350ea7fec8edc687011ac7308e3e1db17ff/wdl_test/ExpansionHunterDenovo/casecontrol/simulated_data/eval_input.json#L2)
  along with the
  [expected output](https://github.com/VJalili/gatk-sv/blob/89e67350ea7fec8edc687011ac7308e3e1db17ff/wdl_test/ExpansionHunterDenovo/casecontrol/simulated_data/eval_input.json#L3)
  to the evaluation WDL. Different combinations of inputs to the test and
  evaluation workflows are grouped under separate subdirectories (e.g.,
  the `simulated_data` subdirectory for the `casecontrol` evaluation of
  `ExpansionHunterDenovo.wdl`). The inputs to the test and evaluation
  WDLs are specified in two JSON files, `test_input.json` and
  `eval_input.json`, respectively. These files should be located in the
  test cohort's subdirectory (e.g., `casecontrol/simulated_data/test_input.json`).

- To pass any file to the WDLs via the JSON files, the file
  needs to be stored in a publicly accessible Google Cloud Storage bucket.

- To pass an output of the test WDL as an input to the evaluation WDL,
  prefix the value of the corresponding key with `test_output:` (see `carrot`'s
  [documentation](https://github.com/broadinstitute/carrot/blob/0f616c0a9933a44bb92bc9ddbc90b81b0b532de6/UserGuide.md#-mapping-test-outputs-to-eval-inputs)).
  For instance:

  ```json
  {
    "EvalCaseControlLocus.multisample_profile": "test_output:EHdnSTRAnalysis.multisample_profile"
  }
  ```
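
The layout above can be checked before invoking `carrot_helper.py`. The following is a minimal Python sketch, not part of `carrot_helper.py`; the names `find_tests`, `missing_files`, `merged_inputs`, and `mapped_test_outputs` are hypothetical. It assumes the `wdl_test/<WDL name>/<evaluation>/<cohort>/` hierarchy shown above, and it assumes a cohort's JSON values replace the matching keys of the evaluation's defaults file (a shallow, key-level merge); how `carrot_helper.py` actually combines them may differ.

```python
import json
from pathlib import Path

# Files expected at each level of wdl_test/<WDL name>/<evaluation>/<cohort>/,
# following the directory layout described above.
EVAL_FILES = ("eval.wdl", "eval_input_defaults.json", "test_input_defaults.json")
COHORT_FILES = ("test_input.json", "eval_input.json")
TEST_OUTPUT_PREFIX = "test_output:"


def find_tests(wdl_test_dir):
    """Yield (wdl_name, evaluation, cohort, cohort_dir) for every directory
    exactly three levels below wdl_test_dir."""
    for cohort_dir in sorted(Path(wdl_test_dir).glob("*/*/*")):
        if cohort_dir.is_dir():
            wdl_name, evaluation, cohort = cohort_dir.parts[-3:]
            yield wdl_name, evaluation, cohort, cohort_dir


def missing_files(cohort_dir):
    """List required files absent from a cohort directory or its evaluation directory."""
    eval_dir = cohort_dir.parent
    missing = [eval_dir / name for name in EVAL_FILES if not (eval_dir / name).is_file()]
    missing += [cohort_dir / name for name in COHORT_FILES if not (cohort_dir / name).is_file()]
    return missing


def merged_inputs(cohort_dir, kind):
    """Combine <kind>_input_defaults.json with the cohort's <kind>_input.json,
    where kind is "test" or "eval". Cohort values replace defaults key by key
    (an assumed shallow merge)."""
    defaults = json.loads((cohort_dir.parent / f"{kind}_input_defaults.json").read_text())
    overrides = json.loads((cohort_dir / f"{kind}_input.json").read_text())
    return {**defaults, **overrides}


def mapped_test_outputs(eval_inputs):
    """Return {eval input: test output} for values that use the test_output: prefix."""
    return {
        key: value[len(TEST_OUTPUT_PREFIX):]
        for key, value in eval_inputs.items()
        if isinstance(value, str) and value.startswith(TEST_OUTPUT_PREFIX)
    }
```

For example, running `for wdl, evaluation, cohort, path in find_tests("."): print(wdl, evaluation, cohort, missing_files(path))` from `wdl_test` reports every cohort together with any resources it is missing.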

## Carrot Helper

The `carrot_helper` utility script automates a few routine tasks for
running and updating Carrot tests. This script is not a replacement
for `carrot_cli` or Carrot's API, which offer more expressive power,
wider functionality, and greater generality than `carrot_helper`.

### Setup

1. Install `carrot_cli`:
   - Install the `dev` version of [`carrot_cli`](https://github.com/broadinstitute/carrot_cli)
     as follows. The `dev` version is needed because `carrot_helper` relies on
     unreleased features of `carrot_cli`.

     ```shell
     git clone https://github.com/broadinstitute/carrot_cli/
     cd carrot_cli
     pip install -r dev-requirements.txt
     pip install -e .
     ```

   - [Configure `carrot_cli`](https://github.com/broadinstitute/carrot/blob/master/UserGuide.md#-carrot-cli)
     to access a [Carrot server](https://github.com/broadinstitute/carrot)
     and set your email address.

2. Install the latest version of
   [`womtool`](https://github.com/broadinstitute/cromwell/releases).

3. Set up `carrot_helper.py` by executing the following commands and providing
   values for its prompts:

   ```shell
   cd gatk-sv/wdl_test
   python carrot_helper.py config
   ```

   Carrot fetches the test and evaluation WDLs for every test from
   a publicly accessible GitHub repository. Therefore, to define or update
   tests, `carrot_helper` needs to know the GitHub repository and the git
   branch where the test and evaluation WDLs are available. To run the
   existing tests, you may use `https://github.com/broadinstitute/gatk-sv` and
   `master` as the repository and branch, respectively. If you are developing
   a Carrot test for a WDL, set the repository to your fork
   of `github.com/broadinstitute/gatk-sv` and the branch to your feature
   branch.

### Run Carrot Helper

```shell
cd wdl_test
python carrot_helper.py test run ./*
```
_Note that the script should be invoked from the `wdl_test` directory._

The above command defines every test (organized in the directory structure
discussed above) to Carrot and runs them all. Information about the created
and executed tests is persisted in the `.carrot_pipelines.json` and
`.runs.json` files.

You can specify a single test to run; for instance:

```shell
python carrot_helper.py test run STRAnalyzer/comparative/real_cohort
```

Or you may use wildcards to select particular tests to run. For instance:

```shell
python carrot_helper.py test run STRAnalyzer/*/real_cohort
```
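
Wildcards such as `./*` and `STRAnalyzer/*/real_cohort` are normally expanded by the shell before `carrot_helper.py` receives the paths, so it can help to preview which directories a pattern selects. Below is a minimal Python sketch using the standard `glob` module; `preview_selection` is a hypothetical helper, not part of `carrot_helper.py`, and its sorted order only approximates the shell's locale-dependent ordering.

```python
import glob
import os


def preview_selection(pattern, wdl_test_dir="."):
    """Return the paths (relative to wdl_test_dir) that a wildcard pattern matches.

    As in a POSIX shell, "*" does not match names starting with "."; the
    matches are sorted because glob.glob returns them in arbitrary order.
    """
    matches = glob.glob(os.path.join(wdl_test_dir, pattern))
    return sorted(os.path.relpath(match, wdl_test_dir) for match in matches)
```

For instance, `preview_selection("STRAnalyzer/*/real_cohort")`, called from `wdl_test`, lists the `real_cohort` directories of every `STRAnalyzer` evaluation.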

To check the status of the runs, use the following command:

```shell
python carrot_helper.py test update_status
```

### Reusable Resources

`carrot_helper.py` persists metadata about the `carrot` resources it
creates (e.g.,
[pipeline](https://github.com/broadinstitute/carrot/blob/master/UserGuide.md#-pipeline),
[template](https://github.com/broadinstitute/carrot/blob/master/UserGuide.md#-template),
[test](https://github.com/broadinstitute/carrot/blob/master/UserGuide.md#-test),
[result](https://github.com/broadinstitute/carrot/blob/master/UserGuide.md#-result),
and any necessary mappings between them) in `.carrot_pipelines.json`.

The `.carrot_pipelines.json` file tracked on git contains metadata about
the `carrot` resources defined for the tests and WDLs available on the
`master` branch of the
[`github.com/broadinstitute/gatk-sv`](https://github.com/broadinstitute/gatk-sv)
repository, on a `carrot` server maintained for internal use at the Broad
Institute. You may use this file to run and update tests (see below)
if you have access to Broad's VPN. Otherwise, you may remove or rename
the `.carrot_pipelines.json` file, **without tracking the changes on git**,
and let `carrot_helper.py` create resources on the `carrot` server for
the repository and branch [you have configured](#setup).

`carrot_helper.py` automatically initializes and updates
`.carrot_pipelines.json`. When `carrot_helper.py test run` is invoked,
the script traverses the `wdl_test` directory and initializes or updates
`carrot` resources if any of the test or evaluation WDLs or their inputs
have changed. Carrot reads test and evaluation WDLs from GitHub; therefore,
make sure you commit and push changes to your branch when updating
test and evaluation WDLs.
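
One way to decide whether resources need to be re-created is to fingerprint the files they depend on. The following Python sketch is an illustration only, not necessarily how `carrot_helper.py` detects changes; `fingerprint` and `needs_update` are hypothetical names, and where a previous fingerprint is stored is left to the caller.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return a SHA-256 hex digest over the paths (as given) and contents of files.

    Paths are sorted so the result does not depend on argument order; each
    content block is length-prefixed so adjacent files cannot blur together.
    """
    digest = hashlib.sha256()
    for path in sorted(Path(p) for p in paths):
        data = path.read_bytes()
        digest.update(str(path).encode("utf-8") + b"\0")
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def needs_update(paths, recorded_fingerprint):
    """True when nothing was recorded yet or any of the files changed since."""
    return recorded_fingerprint is None or fingerprint(paths) != recorded_fingerprint
```

A local fingerprint only shows that files changed on disk; because Carrot reads the WDLs from GitHub, those changes still have to be committed and pushed before Carrot can see them.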

### Carrot Report

Carrot can pass the output of an evaluation workflow to a Jupyter notebook,
which enables more in-depth evaluations/assertions and visualizations.
In general, this requires defining a template notebook
(ideally a separate notebook for each test, for test-specific visualizations),
defining a `report` in Carrot, and mapping a template to the report.
Please refer to the [Carrot documentation](https://github.com/broadinstitute/carrot/blob/48c58446d4fb044cabbdafe8962b67ee511b483a/UserGuide.md#-2-define-a-report-in-carrot) for details.
`carrot_helper` does not currently support defining a `report`.

### Current limitations

Carrot is under active development, and new functionality becomes available
as new versions are released. A few features that are still under
development and not yet released affect which workflows can be tested using
`carrot`. Specifically, Carrot does not currently support relative imports in
WDL files (i.e., imports resolved from files that would otherwise be supplied
to `cromwell` through its `--imports` argument).
A workaround is to host the required imports in a Google Cloud Storage
bucket and import them using the objects' URLs. However, this would require
modifying all the WDLs of the GATK-SV pipeline. The Carrot team is working
on supporting `--imports`-like functionality in Carrot.

Additionally, `carrot` does not currently support `Array`-type outputs
(e.g., `Array[File]`). In other words, the array-type outputs of a
test WDL cannot be passed to evaluation WDLs for assertions. A workaround
is to encapsulate the array output in a zip archive, so that the test WDL
outputs a single file, and to extract the archive's content in the eval WDL.
This workaround would require significant modifications to the GATK-SV
pipeline's workflows; hence, we currently do not assert array-type outputs.
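
The zip workaround amounts to packing the array's files into one archive on the test side and unpacking them on the evaluation side. The Python sketch below illustrates that round trip with the standard `zipfile` module; in practice these steps would run inside the test and evaluation WDL tasks (for example, with `zip`/`unzip` in the task commands). `pack_outputs` and `unpack_outputs` are hypothetical names, and the sketch assumes the files' base names are unique.

```python
import zipfile
from pathlib import Path


def pack_outputs(files, archive):
    """Test side: bundle the files of an Array[File] output into one zip archive,
    so the test WDL can expose a single File instead of an array."""
    archive = Path(archive)
    with zipfile.ZipFile(archive, "w") as zf:
        for f in files:
            # Store by base name only; assumes base names are unique.
            zf.write(f, arcname=Path(f).name)
    return archive


def unpack_outputs(archive, dest):
    """Eval side: extract the archive into dest and return the extracted paths
    in the order they were added, i.e., the original array order."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        names = zf.namelist()
    return [Path(dest) / name for name in names]
```

Returning paths in archive order keeps the eval side's view of the files aligned with the original array, which matters when assertions compare elements by position.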