Data Analysis Modules: panoply_association
wcorinne edited this page Aug 25, 2025 · 3 revisions
Performs association analysis to identify differential markers for classes of interest using a moderated t-test (for binary classes) or F-test (for categorical multi-level classes). The significant markers are then ranked using a combination of p-values and variable importance in accurate classifiers.
Classes used for the analysis are derived from the annotations provided in the groups file, as long as each level in a class has at least 3 samples.
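The class-eligibility rule above can be sketched in a few lines of Python. This is an illustration only, not the module's code: the function name `eligible_classes`, the example annotation columns, the treatment of empty labels as missing, and the requirement of at least two levels are all assumptions made for the example.

```python
from collections import Counter

MIN_SAMPLES_PER_LEVEL = 3  # threshold stated in the text above


def eligible_classes(annotations, min_samples=MIN_SAMPLES_PER_LEVEL):
    """Return {class_name: test_kind} for annotation columns usable as classes.

    `annotations` maps a column name to its per-sample labels. None or ""
    is treated as a missing annotation and ignored (an assumption).
    """
    selected = {}
    for name, labels in annotations.items():
        counts = Counter(label for label in labels if label not in (None, ""))
        if len(counts) < 2:
            continue  # a single level cannot be contrasted with anything
        if min(counts.values()) < min_samples:
            continue  # at least one level has too few samples
        selected[name] = "moderated t-test" if len(counts) == 2 else "F-test"
    return selected


# Hypothetical groups-file columns for nine samples.
groups = {
    "Subtype": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "Stage": ["I", "I", "I", "I", "II", "II", "II", "II", "II"],
    "Batch": ["x", "x", "x", "x", "x", "x", "x", "y", "y"],
}
print(eligible_classes(groups))
# {'Subtype': 'F-test', 'Stage': 'moderated t-test'}
```

Here `Batch` is skipped because level `y` has only two samples, while `Stage` (two levels) maps to the moderated t-test and `Subtype` (three levels) to the F-test.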
This module performs the following steps:
- Filters out (removes) features with >50% missing values, and imputes missing values using k-NN imputation.
- Runs marker selection using LIMMA.
- Uses statistically significant markers to build classifiers using PLS, PAM, GLMNET, and RF models.
- Extracts marker importance (a measure of how crucial the marker was for training an accurate classifier) for input markers and combines all the ranks into a global ranking using rank aggregation.
- If provided with test data, runs prediction using all classifiers.
- Plots a heatmap of training data showing significant markers.
- Runs GSEA for (binary) classes with only two levels.
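Two of the steps above, the missing-value filter and the combination of ranks, can be illustrated with a small Python sketch. The page does not say which rank aggregation method the module uses, so the mean-rank scheme below is a stand-in chosen for simplicity. The function names and the toy data are hypothetical, and k-NN imputation, LIMMA and the classifiers themselves are not reproduced here.

```python
def drop_sparse_features(matrix, max_missing=0.5):
    """Keep features whose fraction of missing (None) values is <= max_missing."""
    kept = {}
    for feature, values in matrix.items():
        missing = sum(v is None for v in values) / len(values)
        if missing <= max_missing:
            kept[feature] = values
    return kept


def ranks(scores, higher_is_better=True):
    """Rank features 1..n by score; ties are broken by feature name."""
    ordered = sorted(
        scores,
        key=lambda f: (-scores[f] if higher_is_better else scores[f], f),
    )
    return {feature: i + 1 for i, feature in enumerate(ordered)}


def aggregate_ranks(rank_tables):
    """Mean-rank aggregation: order features by their average rank."""
    features = list(rank_tables[0])
    mean_rank = {
        f: sum(table[f] for table in rank_tables) / len(rank_tables)
        for f in features
    }
    return sorted(features, key=lambda f: (mean_rank[f], f))


# Toy data: four features measured in four samples.
matrix = {
    "P1": [1.0, None, 0.5, 0.2],    # 25% missing, kept
    "P2": [None, None, None, 0.1],  # 75% missing, removed
    "P3": [0.3, 0.4, None, None],   # exactly 50% missing, kept
    "P4": [0.9, 0.8, 0.7, 0.6],     # complete, kept
}
kept = drop_sparse_features(matrix)

# Hypothetical p-values and importances for the surviving features.
p_values = {"P1": 0.001, "P3": 0.02, "P4": 0.0005}
importance_rf = {"P1": 0.9, "P3": 0.2, "P4": 0.5}
importance_pls = {"P1": 0.7, "P3": 0.1, "P4": 0.8}

global_order = aggregate_ranks([
    ranks(p_values, higher_is_better=False),  # smaller p-value ranks first
    ranks(importance_rf),
    ranks(importance_pls),
])
print(list(kept), global_order)
```

Other aggregation schemes (for example, ones that weight the lists or tolerate partial rankings) would slot into `aggregate_ranks` without changing the rest of the sketch.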
Required inputs:
- inputData: (.tar file) tarball from panoply_parse_sm_table or other PANOPLY module; (.gct file) normalized/filtered input if standalone is TRUE
- type: (String) (proteome) data type
- standalone: (String) set to TRUE to run as a self-contained module; if TRUE the analysisDir and groupsFile inputs are required
- yaml: (.yaml file) parameters in yaml format
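The dependency between standalone and the otherwise optional analysisDir and groupsFile inputs can be expressed as a small pre-flight check. The sketch below is a hypothetical helper, not part of PANOPLY: the function name `check_inputs`, the dict representation of the inputs, and the file names are assumptions for illustration.

```python
def check_inputs(inputs):
    """Return a list of problems found in a dict of module inputs."""
    problems = []
    for name in ("inputData", "type", "standalone", "yaml"):
        if not inputs.get(name):
            problems.append(f"missing required input: {name}")

    # standalone is a String input, so compare its text rather than a boolean.
    standalone = str(inputs.get("standalone", "")).upper() == "TRUE"
    if standalone:
        for name in ("analysisDir", "groupsFile"):
            if not inputs.get(name):
                problems.append(f"standalone is TRUE, so {name} is required")
        if not str(inputs.get("inputData", "")).endswith(".gct"):
            problems.append("standalone is TRUE, so inputData should be a .gct file")
    return problems


# Hypothetical standalone run that forgot the two conditional inputs.
print(check_inputs({
    "inputData": "proteome.gct",
    "type": "proteome",
    "standalone": "TRUE",
    "yaml": "params.yaml",
}))
```

A non-standalone run with a tarball as inputData passes this check without analysisDir or groupsFile.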
Optional inputs:
- analysisDir: (String) name of analysis directory
- groupsFile: (.csv file) subset of sample annotations, providing classes for association analysis
- fdr_assoc: (Float, default = 0.01) FDR cutoff value for significance
- sample_na_max: (Float, default = 0.8) maximum allowed fraction of NA values per sample/column; error if violated.
- nmiss_factor: (Float, default = 0.5) features (genes, proteins, PTM sites) with more than nmiss_factor fraction of NA values will be removed from the analysis
- duplicate_gene_policy: (String, default = 'maxvar') method used to combine duplicate genes (when mapping protein accession or PTM site to gene symbols) for running GSEA; possible options are:
  - maxvar: select row with largest variance
  - union: union of binary (0/1) values in all rows (e.g., for mutation status)
  - median: median of values in all rows (for each column/sample)
  - mean: mean of values in all rows (for each column/sample)
  - min: minimum of values in all rows (for each column/sample)
- gene_id_col: (String, default = 'geneSymbol') name of sample annotation column containing gene ids.
- outFile: (String, default = "panoply_association-output.tar") output .tar file name
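To make the duplicate_gene_policy options concrete, here is a minimal Python sketch of how rows that map to the same gene symbol could be collapsed under each policy. It follows the one-line descriptions in the list above and is not the module's implementation. The function name `combine_duplicates` and the example rows are hypothetical, and missing values are assumed to have been imputed already.

```python
from statistics import mean, median, pvariance


def combine_duplicates(rows, policy="maxvar"):
    """Collapse rows sharing a gene symbol into one row per gene.

    `rows` is a list of (gene_symbol, values) pairs with numeric values.
    """
    by_gene = {}
    for gene, values in rows:
        by_gene.setdefault(gene, []).append(values)

    combined = {}
    for gene, group in by_gene.items():
        columns = list(zip(*group))  # one tuple per sample/column
        if policy == "maxvar":
            # keep the single row with the largest variance across samples
            combined[gene] = list(max(group, key=pvariance))
        elif policy == "union":
            # 1 if any duplicate row is non-zero in that sample
            combined[gene] = [int(any(col)) for col in columns]
        elif policy == "median":
            combined[gene] = [median(col) for col in columns]
        elif policy == "mean":
            combined[gene] = [mean(col) for col in columns]
        elif policy == "min":
            combined[gene] = [min(col) for col in columns]
        else:
            raise ValueError(f"unknown duplicate_gene_policy: {policy}")
    return combined


# Two hypothetical sites mapping to TP53, one to EGFR.
rows = [
    ("TP53", [1.0, 2.0, 3.0]),
    ("TP53", [0.0, 4.0, 8.0]),
    ("EGFR", [0.5, 0.5, 1.5]),
]
print(combine_duplicates(rows, "maxvar"))
print(combine_duplicates(rows, "mean"))

# Binary mutation calls are the natural fit for the union policy.
mutations = [("KRAS", [0, 1, 0]), ("KRAS", [1, 0, 0])]
print(combine_duplicates(mutations, "union"))
```

Note that maxvar keeps one of the original rows unchanged, whereas the other policies compute a new value per sample from all duplicate rows.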
- outputs: Tarball of files containing the following in the association subdirectory, for each class vector considered for association analysis:
  - List of significant differential markers derived using LIMMA (*-markers-all-fdr*.csv) and p-values for all input features (*-markers-all*.csv)
  - Marker importance for significant markers, along with final rank (*-markerimp-fdr*.csv)
  - Heatmap of significant differential markers (*-markers-heatmap.pdf)
  - Classifier performance contingency tables (*-analysis-model-results.txt)
  - Table of prediction results for training data (*-train-results-*.csv) and testing data (*-test-results.csv) using all classifiers
  - GSEA outputs, along with .gct and .cls input files, for binary classes (in *-gsea-analysis/ subdirectory).
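When working with the output tarball outside of Terra, it can help to index its contents by output type. The following Python sketch uses only the standard library and the file name patterns listed above. The exact directory layout inside the tarball, the function name `index_association_outputs`, and the demo file names are assumptions for illustration.

```python
import fnmatch
import io
import tarfile

# Patterns from the output list; order matters because *-markers-all*.csv
# also matches the FDR-filtered file, so the more specific pattern goes first.
OUTPUT_PATTERNS = {
    "significant markers": "*-markers-all-fdr*.csv",
    "all features": "*-markers-all*.csv",
    "marker importance": "*-markerimp-fdr*.csv",
    "heatmap": "*-markers-heatmap.pdf",
    "model results": "*-analysis-model-results.txt",
    "train predictions": "*-train-results-*.csv",
    "test predictions": "*-test-results.csv",
}


def index_association_outputs(tar):
    """Group files found under an 'association' directory by output type."""
    found = {kind: [] for kind in OUTPUT_PATTERNS}
    for member in tar.getmembers():
        parts = member.name.split("/")
        if not member.isfile() or "association" not in parts[:-1]:
            continue
        for kind, pattern in OUTPUT_PATTERNS.items():
            if fnmatch.fnmatchcase(parts[-1], pattern):
                found[kind].append(member.name)
                break  # first (most specific) match wins
    return found


def _demo_tarball(names):
    """Build an in-memory tarball of empty files, used only for the demo."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name in names:
            tar.addfile(tarfile.TarInfo(name), io.BytesIO(b""))
    buffer.seek(0)
    return buffer


# Hypothetical member names; a real tarball would be opened by path instead.
demo_names = [
    "run/association/Subtype-markers-all-fdr0.01.csv",
    "run/association/Subtype-markers-all.csv",
    "run/association/Subtype-markers-heatmap.pdf",
    "run/other/readme.txt",
]
with tarfile.open(fileobj=_demo_tarball(demo_names), mode="r") as tar:
    outputs = index_association_outputs(tar)
print(outputs)
```

For a tarball on disk, `tarfile.open("panoply_association-output.tar")` would replace the in-memory demo archive.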