Support Modules: panoply_nmf_balance_omes
To mitigate a potential bias towards a particular data type in the multi-omics clustering (e.g. due to vastly different numbers of genomic and proteomic features), the following filtering approach is applied:
- Concatenate the data matrices and remove all rows containing missing values.
- Standardize the resulting matrix by z-scoring the rows, followed by z-scoring the columns.
- Apply principal component analysis (PCA) to the resulting standardized multi-omics data matrix.
- Based on the factor matrix, determine the number of principal components (PCs) explaining 90% of the total variance in the data matrix (PCs90).
- Based on the loadings matrix, calculate the relative contribution of each feature to each of the PCs90 (equivalent to the squared cosine described in Abdi and Williams, 2010).
- For each feature, calculate the relative cumulative contribution across all PCs90.
- The resulting vector of relative contributions (one entry per feature; the vector sums to 1) is then used to balance the contributions of the different data types using the following procedure:
  1. For each data type, sum up the contributions of all its features; this determines the overall contribution of each data type, which ideally should be equal across the data types within a given tolerance (parameter `tol`), i.e. $\mathrm{sum}_{\mathrm{ome}} \approx 1/N_{\mathrm{omes}}$, where $N_{\mathrm{omes}}$ is the number of data types.
  2. Remove the feature with the lowest contribution that belongs to the data type with the largest overall contribution.
  3. Recalculate the overall contributions of each data type and repeat steps 1-2 until the deviation is within the specified tolerance (default: `tol = 0.01`).
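A minimal NumPy sketch of the preprocessing, PCA, and contribution steps above follows. It assumes features are rows and samples are columns (as in the `.gct` inputs), uses the sample standard deviation for z-scoring, runs PCA via SVD with samples as observations, and takes the cumulative contribution as the unweighted sum over the retained PCs, rescaled to sum to 1. The function name `feature_contributions` is hypothetical, and the module itself may differ in any of these details.

```python
import numpy as np


def feature_contributions(mats, var=0.9):
    """Sketch: relative per-feature contributions to the PCs explaining `var`.

    mats -- list of 2-D arrays (features x samples) sharing the same sample columns
    """
    # Concatenate the data types along the feature axis.
    X = np.vstack(mats).astype(float)
    # Remember which input matrix (data type) each row came from.
    ome_idx = np.concatenate([np.full(m.shape[0], k) for k, m in enumerate(mats)])
    # Drop every row (feature) that contains a missing value.
    keep = ~np.isnan(X).any(axis=1)
    X, ome_idx = X[keep], ome_idx[keep]
    # z-score rows (features), then columns (samples); sample SD (ddof=1) assumed.
    X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, ddof=1, keepdims=True)
    X = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, ddof=1, keepdims=True)
    # PCA with samples as observations and features as variables, via SVD.
    A = X.T - X.T.mean(axis=0)
    _, s, vt = np.linalg.svd(A, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of PCs whose cumulative explained variance reaches `var`.
    n_pcs = min(int(np.searchsorted(np.cumsum(explained), var)) + 1, len(s))
    loadings = vt[:n_pcs].T  # features x retained PCs
    # Contribution of each feature to each PC: squared loading / column total.
    ctr = loadings**2 / np.sum(loadings**2, axis=0)
    # Cumulative contribution over the retained PCs, rescaled to sum to 1.
    rel = ctr.sum(axis=1) / ctr.sum()
    return rel, ome_idx, n_pcs
```

The returned `ome_idx` records the data type of each remaining feature, which the balancing procedure needs alongside `rel`.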
The results of this balancing approach are visualized in the file `balance_omes_pdf` returned by the module panoply_nmf_balance_omes.
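The iterative balancing procedure can be sketched in plain Python as follows. All names are hypothetical. The sketch assumes that "recalculating" the overall contributions means renormalizing the contributions of the remaining features (no new PCA), and it reads the stopping criterion as the largest minus the smallest data-type contribution being at most `tol`, as the `tol` parameter describes. The guard that keeps at least one feature per data type is an addition for this sketch, not documented module behavior.

```python
def balance_omes(rel, omes, tol=0.01):
    """Return the sorted indices of the features kept after balancing.

    rel  -- per-feature relative contributions (summing to 1)
    omes -- data-type label of each feature, same length as rel
    tol  -- maximal accepted difference between data-type contributions
    """
    keep = set(range(len(rel)))
    labels = sorted(set(omes))
    while True:
        total = sum(rel[i] for i in keep)
        # Overall (renormalized) contribution of each data type.
        share = {o: sum(rel[i] for i in keep if omes[i] == o) / total for o in labels}
        if max(share.values()) - min(share.values()) <= tol:
            break
        # Data type with the largest overall contribution.
        top = max(labels, key=share.get)
        candidates = [i for i in keep if omes[i] == top]
        if len(candidates) <= 1:
            break  # safeguard added for this sketch: never empty a data type
        # Remove that data type's feature with the lowest contribution.
        keep.remove(min(candidates, key=lambda i: rel[i]))
    return sorted(keep)
```

For example, with 10 "prot" and 40 "rna" features that all contribute equally, the loop removes 30 "rna" features, at which point both data types contribute exactly one half.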
Inputs
- `label`: (String) name for the output tar file.
- `ome_gcts`: (Array[File]+) array of normalized data matrices (e.g. proteome, phosphoproteome, RNA, CNA, etc.) in `.gct` format.
- `ome_labels`: (Array[String]+) array of labels associated with each GCT file (e.g. "prot", "pSTY", "rna", "cna", etc.). Must match the length and order of `ome_gcts` exactly.
- `tol`: (Float, default = 0.01) tolerance specifying the maximal accepted difference (as a fraction of total variance) between the contributions of different data types. Used as the stopping criterion for the optimization.
- `var`: (Float, default = 0.9) fraction of variance (between 0 and 1) to be explained by PCA. Used to determine the number of PCs explaining the specified fraction of variance in the multi-omics data matrix.
- `zscore_mode`: (String, default = "rowcol") z-score mode: `row` (z-score rows), `col` (z-score columns), or `rowcol` (z-score rows, then columns). Note that z-scoring can also be performed directly in the panoply_nmf module.
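The three `zscore_mode` options can be illustrated with a small pure-Python sketch. The function name `zscore`, the layout (a list of rows, one per feature, with one column per sample), and the use of the sample standard deviation are assumptions for illustration, not the module's code.

```python
from statistics import mean, stdev


def zscore(matrix, mode="rowcol"):
    """Sketch of z-scoring a features x samples matrix (list of rows)."""

    def z_rows(m):
        # Center each row on its mean and scale by its sample standard deviation.
        out = []
        for row in m:
            mu, sd = mean(row), stdev(row)
            out.append([(x - mu) / sd for x in row])
        return out

    def transpose(m):
        return [list(col) for col in zip(*m)]

    if mode == "row":
        return z_rows(matrix)
    if mode == "col":
        # Column z-scoring is row z-scoring of the transposed matrix.
        return transpose(z_rows(transpose(matrix)))
    if mode == "rowcol":
        # Rows first, then columns, matching the default described above.
        return transpose(z_rows(transpose(z_rows(matrix))))
    raise ValueError(f"unknown zscore_mode: {mode!r}")
```

Note that with `rowcol` the column step runs last, so the columns end up with mean 0 and standard deviation 1, while the rows are only approximately standardized.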
Outputs
- `ome_gcts_balanced`: (Array[File]+) array of balanced data matrices in `.gct` format for input into the panoply_nmf module. Maintains the order of `ome_gcts`.
pdf: (File) Visualization of the filtering approach to balance the contribution of the data types.
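Because `ome_gcts_balanced` keeps the order of `ome_gcts`, the retained features can be regrouped per data type in the order given by `ome_labels`. The following is a sketch with a hypothetical `split_by_ome` helper and a simple (feature ID, data type, values) row layout; it does not reproduce the module's actual `.gct` handling.

```python
def split_by_ome(rows, kept, ome_labels):
    """Group retained rows by data type, in the order given by ome_labels.

    rows       -- list of (feature_id, ome_label, values) tuples, in input order
    kept       -- indices (into rows) of the features retained by balancing
    ome_labels -- data-type labels in the same order as ome_gcts
    """
    kept = set(kept)
    # One list per data type; rows keep their original relative order.
    return [
        [row for i, row in enumerate(rows) if i in kept and row[1] == label]
        for label in ome_labels
    ]
```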