Steps to run LENS workflow:

3. Run the `lens.py` script with appropriate inputs

```
python3 local/iatlas/lens.py run --dataset_id <yaml-dataset-synapse-id> --s3_prefix s3://<your-s3-bucket>/<your-s3-subdirectory>
```

## cbioportal_export

### Table of Contents

- [cbioportal_export](#cbioportal_export)
  - [Overview](#overview)
    - [maf.py](#mafpy)
    - [clinical.py](#clinicalpy)
  - [Setup](#setup-1)
  - [How to Run](#how-to-run)
  - [Outputs](#outputs)
    - [maf.py](#mafpy-1)
    - [clinical.py](#clinicalpy-1)
    - [validate.py](#validatepy)
    - [load.py](#loadpy)
  - [General Workflow](#general-workflow)
  - [Running tests](#running-tests)

### Overview

#### maf.py
This script runs the iAtlas mutations data through Genome Nexus so that it can be ingested by the cBioPortal team for visualization.

The script does the following:

1. Reads in and merges all of the individual MAFs from a given folder
2. Splits the merged MAF into smaller chunks for Genome Nexus annotation (see the sketch after this list)
3. [Annotates via Genome Nexus](https://github.com/genome-nexus/genome-nexus-annotation-pipeline)
4. Concatenates the results
5. [Creates the required meta_* data](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-meta-files)

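For intuition, steps 1 and 2 amount to something like the following. This is a minimal illustrative sketch, not the code in `maf.py`; the `*.maf` glob, the chunk size, and the output layout are all assumptions.

```
# Hypothetical sketch of the merge-and-chunk idea used before annotation.
from pathlib import Path

import pandas as pd


def merge_and_chunk_mafs(maf_dir: str, out_dir: str, chunk_size: int = 50_000) -> list[Path]:
    """Merge every MAF in maf_dir, then split the result into chunk_size-row pieces."""
    frames = [
        pd.read_csv(path, sep="\t", comment="#", low_memory=False)
        for path in sorted(Path(maf_dir).glob("*.maf"))  # assumed file extension
    ]
    merged = pd.concat(frames, ignore_index=True)

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    out_paths = []
    for i, start in enumerate(range(0, len(merged), chunk_size)):
        out_path = Path(out_dir) / f"chunk_{i}.maf"
        merged.iloc[start : start + chunk_size].to_csv(out_path, sep="\t", index=False)
        out_paths.append(out_path)
    return out_paths
```
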
#### clinical.py
This script processes/transforms the iAtlas clinical data into a cBioPortal-friendly format so that it can be ingested by the cBioPortal team for visualization.

The script does the following:

1. Preprocesses the data and adds [required mappings like ONCOTREE or LENS_ID](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/oncotree-code-converter)
2. [Adds clinical headers](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/add-clinical-header) (see the example after this list)
3. [Creates the required meta_* data](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-meta-files)
4. [Creates the required caselists](https://github.com/cBioPortal/datahub-study-curation-tools/tree/master/generate-case-lists)
5. [Validates the files for cBioPortal](https://github.com/cBioPortal/cbioportal-core/blob/main/scripts/importer/validateData.py)

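For reference, the clinical headers added in step 2 follow cBioPortal's five-row convention: display names, descriptions, datatypes, and priorities as `#`-prefixed rows, followed by the attribute names. The file is tab-delimited, and the columns below are illustrative placeholders, not the actual iAtlas attributes.

```
#Patient Identifier	Sex
#Patient identifier	Patient sex
#STRING	STRING
#1	1
PATIENT_ID	SEX
P0001	Female
```
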
### Setup
Prior to testing/developing/running this locally, you will need to set up the Docker image.
Optional: You can also build your environment via a Python venv and the dependencies pinned in the `uv.lock` file

1. Create and activate your venv

```
python3 -m venv <your_env_name>
source <your_env_name>/bin/activate
```

2. Export dependencies from uv.lock

```
pip install uv
uv export > requirements.txt
```

3. Install into your venv

```
pip install -r requirements.txt
```

That said, it is highly recommended that you use the Docker image:

1. Build the Docker image

```
cd /orca-recipes/local/iatlas/cbioportal_export
docker build -f Dockerfile -t <some_docker_image_name> .
```

2. Run the Docker image

```
docker run --rm -it -e SYNAPSE_AUTH_TOKEN=$YOUR_SYNAPSE_TOKEN <some_docker_image_name>
```

3. Follow the **How to Run** section below

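If you are iterating on the scripts locally, you can also mount your checkout into the container so code changes are picked up without rebuilding. The `/workdir` mount point below is an assumption; check the `Dockerfile` for the image's actual working directory.

```
docker run --rm -it \
  -e SYNAPSE_AUTH_TOKEN=$YOUR_SYNAPSE_TOKEN \
  -v "$(pwd)":/workdir \
  <some_docker_image_name>
```
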
### How to Run

Getting help
```
python3 clinical.py --help
```

```
python3 maf.py --help
```

```
python3 load.py --help
```

### Outputs

This pipeline generates the following key datasets that eventually get uploaded to Synapse and ingested by cBioPortal.
All datasets will be saved to
`<datahub_tools_path>/add-clinical-header/<dataset_name>/` unless otherwise stated.

#### maf.py

- `data_mutations_annotated.txt` – Annotated MAF file from Genome Nexus
  - Generated by: `concatenate_mafs()`

- `data_mutations_error_report.txt` – Error report from Genome Nexus
  - Generated by: `genome_nexus`

- `meta_mutations.txt` – Metadata file for the mutations data (see the example after this list)
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

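For reference, a cBioPortal `meta_mutations.txt` generally looks like the following; the study identifier and profile descriptions here are placeholders, not the actual iAtlas values.

```
cancer_study_identifier: <study_id>
genetic_alteration_type: MUTATION_EXTENDED
datatype: MAF
stable_id: mutations
show_profile_in_analysis_tab: true
profile_name: Mutations
profile_description: Mutation data
data_filename: data_mutations_annotated.txt
```
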
#### clinical.py

- `data_clinical_patient.txt` – Clinical patient data file
  - Generated by: `add_clinical_header()`

- `data_clinical_sample.txt` – Clinical sample data file
  - Generated by: `add_clinical_header()`

- `meta_clinical_patient.txt` – Metadata file for the clinical patient data file
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

- `meta_clinical_sample.txt` – Metadata file for the clinical sample data file
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

- `meta_study.txt` – Metadata file for the entire study
  - Generated by: `datahub-study-curation-tools`' `generate-meta-files` code

- `cases_<cancer_type>.txt` – Case list files for each cancer type available in the clinical data
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: `datahub-study-curation-tools`' `generate-case-lists` code

#### validate.py

- `iatlas_validation_log.txt` – Results from our own iAtlas validation of all of the files
  - Generated by: each validation function, which appends its results to the log

- `cbioportal_validator_output.txt` – Validator results from cBioPortal for all of the files, not just clinical
  - Generated by: cBioPortal's validator code

#### load.py

- `cases_all.txt` – Case list file for all of the clinical samples in the study (see the example after this list)
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: `datahub-study-curation-tools`' `generate-case-lists` code

- `cases_sequenced.txt` – Case list file containing the sequenced (mutation) samples in the study
  - Saved to: `<datahub_tools_path>/add-clinical-header/<dataset_name>/case-lists/`
  - Generated by: `datahub-study-curation-tools`' `generate-case-lists` code

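For reference, case list files are small key-value text files, with `case_list_ids` tab-delimited. The study identifier and sample IDs below are placeholders.

```
cancer_study_identifier: <study_id>
stable_id: <study_id>_all
case_list_name: All samples
case_list_description: All samples in the study
case_list_ids: SAMPLE_1	SAMPLE_2	SAMPLE_3
```
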
Any additional files are intermediate processing files and can be ignored.

### General Workflow

1. Run processing on the MAF datasets via `maf.py`
2. Run processing on the clinical datasets via `clinical.py`
3. Run `load.py` to create case lists
4. Run the general validation + the cBioPortal validator on your output files via `validate.py`
5. Check your `cbioportal_validator_output.txt`
6. Resolve any `ERROR`s
7. Repeat steps 4-6 until all `ERROR`s are gone
8. Run `load.py` again, now with the `upload` flag, to upload to Synapse

**Example:**
Sample workflow

Run clinical processing
```
python3 clinical.py \
  --input_df_synid syn66314245 \
  --cli_to_cbio_mapping_synid syn66276162 \
  --cli_to_oncotree_mapping_synid syn66313842 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --lens_id_mapping_synid syn68826836 \
  --neoantigen-data-synid syn21841882
```

Run maf processing
```
python3 maf.py \
  --dataset Riaz \
  --input_folder_synid syn68785881 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --n_workers 3
```

Create the case lists
```
python3 load.py \
  --dataset Riaz \
  --output_folder_synid syn64136279 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --create_case_lists
```

Run the general iAtlas validation + cBioPortal validator on all files
```
python3 validate.py \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --neoantigen_data_synid syn69918168 \
  --cbioportal_path /<some_path>/cbioportal/ \
  --dataset Riaz
```

Save into Synapse with version comment `v1`

```
python3 load.py \
  --dataset Riaz \
  --output_folder_synid syn64136279 \
  --datahub_tools_path /<some_path>/datahub-study-curation-tools \
  --version_comment "v1" \
  --upload
```

### Running tests

Tests are written via `pytest`.

In your Docker environment or local environment, install `pytest` via

```
pip install pytest
```

Then run all tests via
```
python3 -m pytest tests
```
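
To run a subset of tests while iterating, you can use pytest's standard selection options; the test file name below is hypothetical.

```
python3 -m pytest tests/test_clinical.py -k "header" -v
```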