
Commit 9b39522

remove cbioportal_export code from iatlas as it's been moved to iatlas-cbioportal-export repo (#126)

1 parent 69d1fe6

File tree

13 files changed: +0 −4126 lines changed

local/iatlas/README.md

Lines changed: 0 additions & 238 deletions
@@ -84,242 +84,4 @@ Steps to run LENS workflow:
3. Run the `lens.py` script with appropriate inputs

```
python3 local/iatlas/lens.py run --dataset_id <yaml-dataset-synapse-id> --s3_prefix s3://<your-s3-bucket>/<your-s3-subdirectory>
```

## cbioportal_export

### Table of Contents

- [cbioportal_export](#cbioportal_export)
  - [Overview](#overview)
    - [maf.py](#mafpy)
    - [clinical.py](#clinicalpy)
  - [Setup](#setup-1)
  - [How to Run](#how-to-run)
  - [Outputs](#outputs)
    - [maf.py](#mafpy-1)
    - [clinical.py](#clinicalpy-1)
    - [validate.py](#validatepy)
    - [load.py](#loadpy)
  - [General Workflow](#general-workflow)
  - [Running tests](#running-tests)

### Overview

#### maf.py

This script runs the iatlas mutations data through Genome Nexus so that the cBioPortal team can ingest it for visualization.

The script does the following:

1. Reads in and merges all the individual MAFs from a given folder
2. Splits the merged MAF into smaller chunks for Genome Nexus annotation
3. [Annotates via Genome Nexus](https://github.com/genome-nexus/genome-nexus-annotation-pipeline)
4. Concatenates the annotated results
5. [Creates the required meta_* data](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-meta-files)

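The merge-and-chunk steps above can be sketched with pandas; this is a minimal illustration, not the actual `maf.py` implementation, and the function name, file pattern, and chunk size are assumptions:

```python
import glob
import os

import pandas as pd


def merge_and_chunk_mafs(maf_dir: str, out_dir: str, chunk_size: int = 5000) -> list[str]:
    """Merge all MAF files in maf_dir, then split into chunks for annotation."""
    frames = [
        pd.read_csv(path, sep="\t", comment="#", dtype=str)
        for path in sorted(glob.glob(os.path.join(maf_dir, "*.maf")))
    ]
    merged = pd.concat(frames, ignore_index=True)

    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    for i, start in enumerate(range(0, len(merged), chunk_size)):
        chunk_path = os.path.join(out_dir, f"chunk_{i}.maf")
        # Each chunk keeps the MAF header row so it can be annotated independently
        merged.iloc[start:start + chunk_size].to_csv(chunk_path, sep="\t", index=False)
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Annotating smaller chunks keeps individual Genome Nexus requests manageable and lets failures be retried per chunk rather than per study.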
#### clinical.py

This script processes and transforms the iatlas clinical data into a cBioPortal-friendly format so that the cBioPortal team can ingest it for visualization.

The script does the following:

1. Preprocesses the data and adds [required mappings like ONCOTREE or LENS_ID](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/oncotree-code-converter)
2. [Adds clinical headers](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/add-clinical-header)
3. [Creates the required meta_* data](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-meta-files)
4. [Creates the required caselists](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-case-lists)
5. [Validates the files for cBioPortal](https://github.com/cBioPortal/cbioportal-core/blob/main/scripts/importer/validateData.py)

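For context on step 2, cBioPortal clinical files carry four commented metadata rows (display name, description, datatype, priority) above the column-name row. A minimal sketch of writing such a header follows; the column definitions are illustrative placeholders, not the iatlas schema, and the real pipeline uses the `add-clinical-header` tool rather than this function:

```python
import csv

# Illustrative columns only: (column_name, display_name, description, datatype, priority)
COLUMNS = [
    ("PATIENT_ID", "Patient Identifier", "Unique patient identifier", "STRING", "1"),
    ("AGE", "Age", "Age at diagnosis", "NUMBER", "1"),
]


def write_clinical_file(rows: list[dict], path: str) -> None:
    """Write a data_clinical_*.txt with the four cBioPortal header rows."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        # Rows 1-4: display names, descriptions, datatypes, priorities (each starts with '#')
        for field in (1, 2, 3, 4):
            writer.writerow(["#" + COLUMNS[0][field]] + [c[field] for c in COLUMNS[1:]])
        # Row 5: the actual column names, then the data
        writer.writerow([c[0] for c in COLUMNS])
        for row in rows:
            writer.writerow([row.get(c[0], "") for c in COLUMNS])
```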
### Setup

Prior to testing, developing, or running this locally, you will need to set up the Docker image.
Optional: you can also build your environment via a Python venv and install the dependencies from the `uv.lock` file.

1. Create and activate your venv

```
python3 -m venv <your_env_name>
source <your_env_name>/bin/activate
```

2. Export dependencies from `uv.lock`

```
pip install uv
uv export > requirements.txt
```

3. Install into your venv

```
pip install -r requirements.txt
```

However, it is highly recommended that you use the Docker image:

1. Build the Dockerfile

```
cd /orca-recipes/local/iatlas/cbioportal_export
docker build -f Dockerfile -t <some_docker_name> .
```

2. Run the Docker image

```
docker run --rm -it -e SYNAPSE_AUTH_TOKEN=$YOUR_SYNAPSE_TOKEN <some_docker_name>
```

3. Follow the **How to Run** section below

### How to Run

Getting help:

```
python3 clinical.py --help
```

```
python3 maf.py --help
```

```
python3 load.py --help
```

### Outputs

This pipeline generates the following key datasets, which eventually get uploaded to Synapse and ingested by cBioPortal.
All datasets are saved to
`<datahub_tools_path>/add-clinical-header/<dataset_name>/` unless otherwise stated.

#### maf.py

- `data_mutations_annotated.txt` – Annotated MAF file from Genome Nexus
  - Generated by: `concatenate_mafs()`

- `data_mutations_error_report.txt` – Error report from Genome Nexus
  - Generated by: `genome_nexus`

- `meta_mutations.txt` – Metadata file for the mutations data
  - Generated by: the `generate-meta-files` code in `datahub-study-curation-tools`

#### clinical.py

- `data_clinical_patient.txt` – Clinical patient data file
  - Generated by: `add_clinical_header()`

- `data_clinical_sample.txt` – Clinical sample data file
  - Generated by: `add_clinical_header()`

- `meta_clinical_patient.txt` – Metadata file for the clinical patient data
  - Generated by: the `generate-meta-files` code in `datahub-study-curation-tools`

- `meta_clinical_sample.txt` – Metadata file for the clinical sample data
  - Generated by: the `generate-meta-files` code in `datahub-study-curation-tools`

- `meta_study.txt` – Metadata file for the entire study
  - Generated by: the `generate-meta-files` code in `datahub-study-curation-tools`

- `cases_<cancer_type>.txt` – Case list files for each cancer type available in the clinical data
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: the `generate-case-lists` code in `datahub-study-curation-tools`

#### validate.py

- `iatlas_validation_log.txt` – Results from our own iatlas validation of all of the files
  - Generated by: each validation function, which appends its results

- `cbioportal_validator_output.txt` – Validator results from cBioPortal for all of the files, not just clinical
  - Generated by: cBioPortal's validator code

#### load.py

- `cases_all.txt` – Case list file for all the clinical samples in the study
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: the `generate-case-lists` code in `datahub-study-curation-tools`

- `cases_sequenced.txt` – Case list file containing the sequenced (mutation) samples in the study
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: the `generate-case-lists` code in `datahub-study-curation-tools`

Any additional files are intermediate processing files and can be ignored.

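For reference, the `meta_*` outputs above follow cBioPortal's key-value meta-file format. A minimal sketch of writing a `meta_mutations.txt` is below; the field values (profile name, description) are illustrative placeholders, and the real files come from the `generate-meta-files` tool:

```python
def write_meta_mutations(study_id: str, path: str) -> None:
    """Write a cBioPortal meta file describing the annotated MAF."""
    lines = [
        f"cancer_study_identifier: {study_id}",
        "genetic_alteration_type: MUTATION_EXTENDED",
        "datatype: MAF",
        "stable_id: mutations",
        "show_profile_in_analysis_tab: true",
        "profile_name: Mutations",  # display name shown in cBioPortal (placeholder)
        "profile_description: Mutation data.",  # placeholder description
        "data_filename: data_mutations_annotated.txt",
    ]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")
```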
### General Workflow

1. Run processing on the MAF datasets via `maf.py`
2. Run processing on the clinical datasets via `clinical.py`
3. Run `load.py` to create case lists
4. Run the general validation + cBioPortal validator on your outputted files via `validate.py`
5. Check your `cbioportal_validator_output.txt`
6. Resolve any `ERROR`s
7. Repeat steps 4-6 until all `ERROR`s are gone
8. Run `load.py` again with the `--upload` flag to upload to Synapse

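Steps 5-7 amount to scanning the validator output for remaining `ERROR` lines; a minimal sketch (the helper name is an assumption, only the output file name comes from the Outputs section):

```python
def remaining_errors(validator_output_path: str) -> list[str]:
    """Return the ERROR lines from a cbioportal_validator_output.txt file."""
    with open(validator_output_path) as f:
        return [line.rstrip("\n") for line in f if "ERROR" in line]
```

When this returns an empty list, the study is ready for step 8.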
**Example:** a sample workflow.

Run clinical processing
```
python3 clinical.py \
  --input_df_synid syn66314245 \
  --cli_to_cbio_mapping_synid syn66276162 \
  --cli_to_oncotree_mapping_synid syn66313842 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --lens_id_mapping_synid syn68826836 \
  --neoantigen-data-synid syn21841882
```

Run maf processing
```
python3 maf.py \
  --dataset Riaz \
  --input_folder_synid syn68785881 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --n_workers 3
```

Create the case lists
```
python3 load.py \
  --dataset Riaz \
  --output_folder_synid syn64136279 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --create_case_lists
```

Run the general iatlas validation + cBioPortal validator on all files
```
python3 validate.py \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --neoantigen_data_synid syn69918168 \
  --cbioportal_path /<some_path>/cbioportal/ \
  --dataset Riaz
```

Save into Synapse with version comment `v1`
```
python3 load.py \
  --dataset Riaz \
  --output_folder_synid syn64136279 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --version_comment "v1" \
  --upload
```

### Running tests

Tests are written with `pytest`.

In your Docker environment or local environment, install `pytest` via

```
pip install pytest
```

Then run all tests via

```
python3 -m pytest tests
```

local/iatlas/cbioportal_export/Dockerfile

Lines changed: 0 additions & 26 deletions
This file was deleted.

0 commit comments
