This GitHub repository contains documents designed to demonstrate the process of a gold standard pipeline to students. By following this guide, you will learn how to generate reliable solutions to the questions by analyzing the data, involving regression models, performance validation, and discussion.
/
├── TCGA dataset/ # Input data files
TCGA_Acute_Myeloid_Leukemia_(LAML)/ # Dataset folder for Acute Myeloid Leukemia (LAML) cases, including both tumor samples and normal samples
├── TCGA.LAML.sampleMap_LAML_clinicalMatrix # Clinical data
├── TCGA.LAML.sampleMap_HiSeqV2_PANCAN.gz # Gene Expression data
...
├── gold_standard_pipeline.ipynb/ # Pipeline code file
├── Question/ # Question folder
└── README.md # This README file
- Research Questions: A set of research questions related to gene-cancer relationships.
- Dataset: Datasets including clinical data and gene expression data.
- Regression Model: Utilize a regression model to fit high-dimensional gene data, clinical data, and cancer diagnosis status to explore the relationship between genes and cancer.
- Discussion: Analyze and discuss the coefficients and significance of the regression model to draw conclusions related to the research questions.
If you encounter any issues or have questions while using your project, please feel free to open an issue in the GitHub repository.
To report an issue:
- Go to the Issues tab.
- Click the "New Issue" button.
- Provide a descriptive title and a detailed description of the problem or question you have encountered.
- Include any relevant information, such as error messages, screenshots, or steps to reproduce the issue.