Data Analysis Modules: panoply_association

`panoply_association`

Description

Performs association analysis to identify differential markers for classes of interest using a moderated t-test (for binary classes) or F-test (for categorical multi-level classes). The significant markers are then ranked using a combination of p-values and variable importance in accurate classifiers.

Classes used for the analysis are derived from the annotations provided in the groups file, as long as each level in a class has at least 3 samples.
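The eligibility rule above can be illustrated with a minimal sketch. This is not the module's code: the function name `eligible_classes`, the dictionary layout of the groups data, the requirement of at least two levels, and the skipping of missing labels are all assumptions made for illustration. Only the threshold of 3 samples per level comes from the description.

```python
from collections import Counter

MIN_SAMPLES_PER_LEVEL = 3  # threshold stated in the module description


def eligible_classes(groups, min_samples=MIN_SAMPLES_PER_LEVEL):
    """Return annotation columns usable as classes (hypothetical helper).

    `groups` maps a column name to its per-sample labels (None = missing).
    Assumption: a column qualifies when it has at least two levels and
    every level has at least `min_samples` samples.
    """
    eligible = {}
    for name, labels in groups.items():
        # Count samples per level, ignoring missing annotations (assumption).
        counts = Counter(label for label in labels if label is not None)
        if len(counts) >= 2 and min(counts.values()) >= min_samples:
            eligible[name] = dict(counts)
    return eligible


# Toy annotations: "Stage" has a level with only 2 samples, so it is skipped.
groups = {
    "Subtype": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Stage": ["I", "I", "I", "I", "I", "I", "I", "II", "II"],
}
result = eligible_classes(groups)
```

In this toy input only `Subtype` would be carried forward, since `Stage` has a level with fewer than 3 samples.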

This module performs the following steps:

- Filters out (removes) features with >50% missing values, and imputes missing values using k-NN imputation.
- Runs marker selection using LIMMA.
- Uses statistically significant markers to build classifiers using PLS, PAM, GLMNET, and RF models.
- Extracts marker importance (a measure of how crucial the marker was for training an accurate classifier) for input markers and combines all the ranks into a global ranking using rank aggregation.
- If provided with test data, runs prediction using all classifiers.
- Plots a heatmap of training data showing significant markers.
- Runs GSEA for (binary) classes with only two levels.
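Two of the steps above, the missing-value filter and the final rank aggregation, can be sketched in a few lines. This is a simplified illustration, not the module's implementation: the function names and data layout are hypothetical, and mean-rank aggregation is an assumed stand-in because the description does not say which rank aggregation method is used. The k-NN imputation, LIMMA, and classifier steps are not shown.

```python
def filter_features(data, nmiss_factor=0.5):
    """Keep features whose fraction of missing values is at most nmiss_factor.

    `data` maps a feature id to its per-sample values (None = missing).
    """
    kept = {}
    for feature, values in data.items():
        frac_missing = sum(v is None for v in values) / len(values)
        if frac_missing <= nmiss_factor:
            kept[feature] = values
    return kept


def aggregate_ranks(rankings):
    """Order features by mean rank across several rankings (1 = best).

    Assumption: plain mean-rank aggregation over features present in
    every ranking; ties are broken alphabetically by feature id.
    """
    features = set.intersection(*(set(r) for r in rankings))
    mean_rank = {f: sum(r[f] for r in rankings) / len(rankings) for f in features}
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Toy data: P2 is missing in 3 of 4 samples and is removed.
data = {
    "P1": [1.0, None, 2.0, 1.5],
    "P2": [None, None, None, 0.3],
    "P3": [0.2, 0.1, None, None],
}
kept = filter_features(data)

# Toy ranks, e.g. one from p-values and two from classifier importance.
rankings = [
    {"P1": 1, "P3": 2, "P5": 3},
    {"P1": 2, "P3": 3, "P5": 1},
    {"P1": 1, "P3": 3, "P5": 2},
]
final_order = aggregate_ranks(rankings)
```

Note that a feature with exactly 50% missing values (`P3` here) is retained, because the description removes only features with more than that fraction.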

Input

Required inputs:

- inputData: (.tar file) tarball from panoply_parse_sm_table or other PANOPLY module; (.gct file) normalized/filtered input if standalone is TRUE
- type: (String) (proteome) data type
- standalone: (String) set to TRUE to run as a self-contained module; if TRUE the analysisDir and groupsFile inputs are required
- yaml: (.yaml file) parameters in yaml format

Optional inputs:

- analysisDir: (String) name of analysis directory
- groupsFile: (.csv file) subset of sample annotations, providing classes for association analysis
- fdr_assoc: (Float, default = 0.01) FDR cutoff value for significance
- sample_na_max: (Float, default = 0.8) maximum allowed fraction of NA values per sample/column; error if violated.
- nmiss_factor: (Float, default = 0.5) features (genes, proteins, PTM sites) with more than nmiss_factor fraction of NA values will be removed from the analysis
- duplicate_gene_policy: (String, default = 'maxvar') method used to combine duplicate genes (when mapping protein accession or PTM site to gene symbols) for running GSEA; possible options are:
  - maxvar: select row with largest variance
  - union: union of binary (0/1) values in all rows (e.g., for mutation status)
  - median: median of values in all rows (for each column/sample)
  - mean: mean of values in all rows (for each column/sample)
  - min: minimum of values in all rows (for each column/sample)
- gene_id_col: (String, default = 'geneSymbol') name of sample annotation column containing gene ids.
- outFile: (String, default = "panoply_association-output.tar") output .tar file name
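The duplicate_gene_policy options can be made concrete with a small sketch. It is an illustration under stated assumptions, not the module's code: the function `collapse_duplicates` and its row layout are hypothetical, rows are assumed to have no missing values, population variance is an assumed choice for `maxvar`, and the union of 0/1 values is computed as a column-wise maximum.

```python
from statistics import mean, median, pvariance

# Column-wise combiners; "union" of binary 0/1 values is the maximum.
COLUMN_POLICIES = {"union": max, "median": median, "mean": mean, "min": min}


def collapse_duplicates(rows, policy="maxvar"):
    """Combine rows that map to the same gene (hypothetical helper).

    `rows` is a list of (gene, values) pairs, where `values` holds one
    number per sample. Returns a dict: gene -> combined values.
    """
    if policy != "maxvar" and policy not in COLUMN_POLICIES:
        raise ValueError(f"unknown duplicate_gene_policy: {policy}")

    # Group all rows by gene, preserving first-seen gene order.
    by_gene = {}
    for gene, values in rows:
        by_gene.setdefault(gene, []).append(values)

    combined = {}
    for gene, group in by_gene.items():
        if policy == "maxvar":
            # Keep the single row with the largest variance across samples.
            combined[gene] = list(max(group, key=pvariance))
        else:
            # Combine the duplicate rows sample by sample.
            combine = COLUMN_POLICIES[policy]
            combined[gene] = [combine(column) for column in zip(*group)]
    return combined


rows = [
    ("TP53", [1.0, 2.0, 3.0]),
    ("TP53", [2.0, 2.0, 2.0]),
    ("EGFR", [0.5, 0.1, 0.9]),
]
by_maxvar = collapse_duplicates(rows)
by_mean = collapse_duplicates(rows, policy="mean")
mutations = collapse_duplicates(
    [("KRAS", [0, 1, 0]), ("KRAS", [1, 0, 0])], policy="union"
)
```

With `maxvar`, the first `TP53` row is kept because the second row is constant across samples; the other policies produce a new row computed per sample.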

Output

outputs: Tarball of files containing the following in the association subdirectory, for each class vector considered for association analysis:
- List of significant differential markers derived using LIMMA (*-markers-all-fdr*.csv) and p-values for all input features (*-markers-all*.csv)
- Marker importance for significant markers, along with final rank (*-markerimp-fdr*.csv)
- Heatmap of significant differential markers (*-markers-heatmap.pdf)
- Classifier performance contingency tables (*-analysis-model-results.txt)
- Table of prediction results for training data (*-train-results-*.csv) and testing data (*-test-results.csv) using all classifiers
- GSEA outputs, along with .gct and .cls input files, for binary classes (in *-gsea-analysis/ subdirectory).
