Steps to run LENS workflow:

3. Run the `lens.py` script with appropriate inputs

```
python3 local/iatlas/lens.py run --dataset_id <yaml-dataset-synapse-id> --s3_prefix s3://<your-s3-bucket>/<your-s3-subdirectory>
```

## cbioportal_export

### Table of Contents

- [cbioportal_export](#cbioportal_export)
  - [Overview](#overview)
    - [maf.py](#mafpy)
    - [clinical.py](#clinicalpy)
  - [Setup](#setup-1)
  - [How to Run](#how-to-run)
  - [Outputs](#outputs)
    - [maf.py](#mafpy-1)
    - [clinical.py](#clinicalpy-1)
    - [validate.py](#validatepy)
    - [load.py](#loadpy)
  - [General Workflow](#general-workflow)
  - [Running tests](#running-tests)

### Overview

#### maf.py
This script runs the iAtlas mutations data through Genome Nexus so that it can be ingested by the cBioPortal team for visualization.

The script does the following:

1. Reads in and merges all of the individual MAFs from a given folder
2. Splits the merged MAF into smaller chunks for Genome Nexus annotation (see the sketch after this list)
3. [Annotates via Genome Nexus](https://github.com/genome-nexus/genome-nexus-annotation-pipeline)
4. Concatenates the results
5. [Creates the required meta_* data](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-meta-files)

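For intuition, steps 1 and 2 amount to something like the following. This is a minimal illustrative sketch, not the code in `maf.py`; the `*.maf` glob, the chunk size, and the output layout are all assumptions.

```
# Hypothetical sketch of the merge-and-chunk idea used before annotation.
from pathlib import Path

import pandas as pd


def merge_and_chunk_mafs(maf_dir: str, out_dir: str, chunk_size: int = 50_000) -> list[Path]:
    """Merge every MAF in maf_dir, then split the result into chunk_size-row pieces."""
    frames = [
        pd.read_csv(path, sep="\t", comment="#", low_memory=False)
        for path in sorted(Path(maf_dir).glob("*.maf"))  # assumed file extension
    ]
    merged = pd.concat(frames, ignore_index=True)

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    out_paths = []
    for i, start in enumerate(range(0, len(merged), chunk_size)):
        out_path = Path(out_dir) / f"chunk_{i}.maf"
        merged.iloc[start : start + chunk_size].to_csv(out_path, sep="\t", index=False)
        out_paths.append(out_path)
    return out_paths
```
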
#### clinical.py
This script processes/transforms the iAtlas clinical data into a cBioPortal-friendly format so that it can be ingested by the cBioPortal team for visualization.

The script does the following:

1. Preprocesses the data and adds [required mappings like ONCOTREE or LENS_ID](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/oncotree-code-converter)
2. [Adds clinical headers](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/add-clinical-header) (see the example after this list)
3. [Creates the required meta_* data](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-meta-files)
4. [Creates the required caselists](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-case-lists)
5. [Validates the files for cBioPortal](https://github.com/cBioPortal/cbioportal-core/blob/main/scripts/importer/validateData.py)

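For reference, the clinical headers added in step 2 follow cBioPortal's five-row convention: display names, descriptions, datatypes, and priorities as `#`-prefixed rows, followed by the attribute names. The file is tab-delimited, and the columns below are illustrative placeholders, not the actual iAtlas attributes.

```
#Patient Identifier	Sex
#Patient identifier	Patient sex
#STRING	STRING
#1	1
PATIENT_ID	SEX
P0001	Female
```
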
### Setup
Prior to testing/developing/running this locally, you will need to set up the Docker image.
Optional: You can also build your environment via a Python venv and the dependencies pinned in the `uv.lock` file

1. Create and activate your venv

```
python3 -m venv <your_env_name>
source <your_env_name>/bin/activate
```

2. Export dependencies from uv.lock

```
pip install uv
uv export > requirements.txt
```

3. Install into your venv

```
pip install -r requirements.txt
```

That said, it is highly recommended that you use the Docker image:

1. Build the Docker image

```
cd /orca-recipes/local/iatlas/cbioportal_export
docker build -f Dockerfile -t <some_docker_image_name> .
```

2. Run the Docker image

```
docker run --rm -it -e SYNAPSE_AUTH_TOKEN=$YOUR_SYNAPSE_TOKEN <some_docker_image_name>
```

3. Follow the **How to Run** section below

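If you are iterating on the scripts locally, you can also mount your checkout into the container so code changes are picked up without rebuilding. The `/workdir` mount point below is an assumption; check the `Dockerfile` for the image's actual working directory.

```
docker run --rm -it \
  -e SYNAPSE_AUTH_TOKEN=$YOUR_SYNAPSE_TOKEN \
  -v "$(pwd)":/workdir \
  <some_docker_image_name>
```
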
### How to Run

Getting help
```
python3 clinical.py --help
```

```
python3 maf.py --help
```

```
python3 load.py --help
```

### Outputs

This pipeline generates the following key datasets that eventually get uploaded to Synapse and ingested by cBioPortal.
All datasets will be saved to
`<datahub_tools_path>/add-clinical-header/<dataset_name>/` unless otherwise stated.

#### maf.py

- `data_mutations_annotated.txt` – Annotated MAF file from Genome Nexus
  - Generated by: `concatenate_mafs()`

- `data_mutations_error_report.txt` – Error report from Genome Nexus
  - Generated by: `genome_nexus`

- `meta_mutations.txt` – Metadata file for the mutations data (see the example after this list)
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

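For reference, a cBioPortal `meta_mutations.txt` generally looks like the following; the study identifier and profile descriptions here are placeholders, not the actual iAtlas values.

```
cancer_study_identifier: <study_id>
genetic_alteration_type: MUTATION_EXTENDED
datatype: MAF
stable_id: mutations
show_profile_in_analysis_tab: true
profile_name: Mutations
profile_description: Mutation data
data_filename: data_mutations_annotated.txt
```
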
#### clinical.py

- `data_clinical_patient.txt` – Clinical patient data file
  - Generated by: `add_clinical_header()`

- `data_clinical_sample.txt` – Clinical sample data file
  - Generated by: `add_clinical_header()`

- `meta_clinical_patient.txt` – Metadata file for the clinical patient data file
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

- `meta_clinical_sample.txt` – Metadata file for the clinical sample data file
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

- `meta_study.txt` – Metadata file for the entire study
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

- `cases_<cancer_type>.txt` – Case list files for each cancer type available in the clinical data
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: `datahub-study-curation-tools`' `generate-case-lists` code

#### validate.py

- `iatlas_validation_log.txt` – Results from our own iAtlas validation of all of the files
  - Generated by: each validation function, which appends its results to the log

- `cbioportal_validator_output.txt` – Validator results from cBioPortal for all of the files, not just clinical
  - Generated by: cBioPortal's validator code

#### load.py

- `cases_all.txt` – Case list file for all of the clinical samples in the study (see the example after this list)
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: `datahub-study-curation-tools`' `generate-case-lists` code

- `cases_sequenced.txt` – Case list file containing the sequenced (mutation) samples in the study
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: `datahub-study-curation-tools`' `generate-case-lists` code

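For reference, case list files are small key-value text files, with `case_list_ids` tab-delimited. The study identifier and sample IDs below are placeholders.

```
cancer_study_identifier: <study_id>
stable_id: <study_id>_all
case_list_name: All samples
case_list_description: All samples in the study
case_list_ids: SAMPLE_1	SAMPLE_2	SAMPLE_3
```
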
Any additional files are intermediate processing files and can be ignored.

### General Workflow

1. Run processing on the MAF datasets via `maf.py`
2. Run processing on the clinical datasets via `clinical.py`
3. Run `load.py` to create case lists
4. Run the general validation + the cBioPortal validator on your output files via `validate.py`
5. Check your `cbioportal_validator_output.txt`
6. Resolve any `ERROR`s
7. Repeat steps 4-6 until all `ERROR`s are gone
8. Run `load.py` again, now with the `upload` flag, to upload to Synapse

**Example:**
Sample workflow

Run clinical processing
```
python3 clinical.py \
  --input_df_synid syn66314245 \
  --cli_to_cbio_mapping_synid syn66276162 \
  --cli_to_oncotree_mapping_synid syn66313842 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --lens_id_mapping_synid syn68826836 \
  --neoantigen-data-synid syn21841882
```

Run maf processing
```
python3 maf.py \
  --dataset Riaz \
  --input_folder_synid syn68785881 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --n_workers 3
```

Create the case lists
```
python3 load.py \
  --dataset Riaz \
  --output_folder_synid syn64136279 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --create_case_lists
```

Run the general iAtlas validation + cBioPortal validator on all files
```
python3 validate.py \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --neoantigen_data_synid syn69918168 \
  --cbioportal_path /<some_path>/cbioportal/ \
  --dataset Riaz
```

Save into Synapse with version comment `v1`

```
python3 load.py \
  --dataset Riaz \
  --output_folder_synid syn64136279 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --version_comment "v1" \
  --upload
```

### Running tests

Tests are written via `pytest`.

In your Docker environment or local environment, install `pytest` via

```
pip install pytest
```

Then run all tests via
```
python3 -m pytest tests
```
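
To run a subset of tests while iterating, you can use pytest's standard selection options; the test file name below is hypothetical.

```
python3 -m pytest tests/test_clinical.py -k "header" -v
```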