New tool: FastSpar, a tool for correlation estimation for compositional data. #7059
Merged
8 commits
d0dc087 Add fastspar tool (neo417)
09a3d36 Move most fastspar parameters into macros file to reuse for fastspar_… (neo417)
1126447 Add min and max values to numeric parameters and fix their attribute … (neo417)
8c85ed8 Remove the otu_table parameter from the macro to make the tool more r… (neo417)
c34f016 Add tool suite options to .shed.yml (neo417)
5397937 Create profile token and set target version to 23.0 (neo417)
614b71e Explain --yes fastspar parameter to skip the warning prompt. (neo417)
3b40ea9 Detect files with 'from_work_dir' (neo417)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| name: fastspar | ||
| owner: iuc | ||
| description: Tool for rapid and scalable correlation estimation for compositional data. | ||
| homepage_url: https://github.com/scwatts/fastspar/ | ||
| long_description: FastSpar is a C++ implementation of the SparCC algorithm for the inference of interaction networks from sparse and compositional data. It rapidly infers correlation networks and calculates P-values using an unbiased estimator. | ||
| remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/fastspar | ||
| categories: | ||
| - Metagenomics | ||
| - Statistics | ||
| auto_tool_repositories: | ||
| name_template: "{{ tool_id }}" | ||
| description_template: "Wrapper for fastspar function: {{ tool_name }}." | ||
| suite: | ||
| name: "suite_fastspar" | ||
| description: "A suite for rapid and scalable correlation estimation for compositional data." | ||
| long_description: "FastSpar is a C++ implementation of the SparCC algorithm for the inference of interaction networks from sparse and compositional data. It rapidly infers correlation networks and calculates P-values using an unbiased estimator." |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| <tool id="fastspar" name="FastSpar" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
| <description> | ||
| correlation estimation for compositional data | ||
| </description> | ||
| <macros> | ||
| <import>macros.xml</import> | ||
| </macros> | ||
| <expand macro="biotools"/> | ||
| <expand macro="requirements"/> | ||
| <command detect_errors="exit_code"><![CDATA[ | ||
| fastspar | ||
| --otu_table '$otu_table' | ||
| --iterations $iterations | ||
| --exclude_iterations $exclude_iterations | ||
| --threshold $threshold | ||
| --seed $seed | ||
| --correlation '$correlation' | ||
| --covariance '$covariance' | ||
| --threads \${GALAXY_SLOTS:-1} | ||
| ## Skip warning prompt and continue analysis even if the input contains OTUs with just one permutation. | ||
| --yes | ||
| ]]></command> | ||
| <inputs> | ||
| <param argument="--otu_table" type="data" format="tabular" label="Input OTU table" | ||
| help="The table must contain absolute OTU counts in plain tabular (TSV) format, with OTUs as rows and samples as columns. Do not include any metadata rows or columns."/> | ||
| <expand macro="fastspar_tool_parameters"/> | ||
| <param argument="--seed" type="integer" value="1" label="Random number seed"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="correlation" format="tabular" label="${tool.name} on ${on_string}: median_correlation.tsv"/> | ||
| <data name="covariance" format="tabular" label="${tool.name} on ${on_string}: median_covariance.tsv"/> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="2"> | ||
| <param name="otu_table" ftype="tabular" value="fake_data.tsv"/> | ||
| <output name="correlation" file="fake_data_cor.tsv" compare="diff"/> | ||
| <output name="covariance" file="fake_data_cov.tsv" compare="diff"/> | ||
| </test> | ||
| <test expect_num_outputs="2"> | ||
| <param name="otu_table" ftype="tabular" value="fake_data.tsv"/> | ||
| <param name="exclude_iterations" value="20"/> | ||
| <param name="threshold" value="0.2"/> | ||
| <output name="correlation" ftype="tabular"> | ||
| <assert_contents> | ||
| <has_n_columns n="51"/> | ||
| <has_text text="1.0000"/> | ||
| </assert_contents> | ||
| </output> | ||
| <output name="covariance" ftype="tabular"> | ||
| <assert_contents> | ||
| <has_n_columns n="51"/> | ||
| <has_text text="OTU ID"/> | ||
| </assert_contents> | ||
| </output> | ||
| </test> | ||
| </tests> | ||
| <help><![CDATA[ | ||
| What it does | ||
| ============ | ||
| | ||
| FastSpar is a C++ implementation of the SparCC algorithm for estimating correlations from compositional data. | ||
| This tool performs the **initial correlation and covariance matrix estimation** as the first step in the FastSpar pipeline. | ||
| **To also estimate p-values**, use `fastspar_pvalues` with the "Recalculate the correlation matrix" option. | ||
| | ||
| Required Inputs | ||
| =============== | ||
| | ||
| - **OTU table** (TSV format): Contains absolute OTU counts (not relative abundances). Must be a plain tabular file with samples in columns and OTUs in rows. Metadata is not supported. | ||
| | ||
| Main Parameters | ||
| =============== | ||
| | ||
| - **Iterations** (`--iterations`): Number of correlation estimation rounds. More iterations improve stability. | ||
| - **Exclude iterations** (`--exclude_iterations`): Number of times highly correlated OTU pairs are removed. | ||
| - **Correlation threshold** (`--threshold`): Correlation strength above which to exclude OTU pairs. | ||
| - **Seed** (`--seed`): Random seed for reproducibility. | ||
| | ||
| Main Features | ||
| ============= | ||
| | ||
| - Efficient and fast computation of sparse correlations. | ||
| - Customizable exclusion and thresholding strategy. | ||
| - Designed to handle compositional count data from microbiome studies. | ||
| | ||
| Generated Outputs | ||
| ================= | ||
| | ||
| - `median_correlation.tsv`: Correlation matrix between all OTUs. | ||
| - `median_covariance.tsv`: Covariance matrix between all OTUs. | ||
| | ||
| Additional Resources | ||
| ==================== | ||
| | ||
| - FastSpar GitHub: https://github.com/scwatts/fastspar | ||
| | ||
| For a complete FastSpar analysis, follow up with: | ||
| | ||
| 1. `fastspar_pvalues`: Estimate empirical p-values from bootstrap correlations. | ||
| 2. `fastspar_reduce`: Filter correlation and p-value matrices to produce sparse networks. | ||
| ]]></help> | ||
| <expand macro="citations"/> | ||
| </tool> | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,158 @@ | ||
| <tool id="fastspar_pvalues" name="FastSpar: estimate p-values" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
| <description> | ||
| Bootstrap-based estimation of p-values from FastSpar correlations | ||
| </description> | ||
| <macros> | ||
| <import>macros.xml</import> | ||
| </macros> | ||
| <expand macro="biotools"/> | ||
| <expand macro="requirements_pvalues"/> | ||
| <command detect_errors="exit_code"><![CDATA[ | ||
| #if $correlation.select == "new" | ||
| fastspar | ||
| --otu_table '$otu_table' | ||
| --iterations $iterations | ||
| --exclude_iterations $exclude_iterations | ||
| --threshold $threshold | ||
| --seed $seed | ||
| --correlation '$output_correlation' | ||
| --covariance '$output_covariance' | ||
| --threads \${GALAXY_SLOTS:-1} | ||
| ## Skip warning prompt and continue analysis even if the input contains OTUs with just one permutation. | ||
| --yes && | ||
| #set $correlation_file = $output_correlation | ||
| #else | ||
| #set $correlation_file = $correlation.input_file | ||
| #end if | ||
| | ||
| mkdir bootstrap_counts | ||
| && fastspar_bootstrap | ||
| --otu_table '$otu_table' | ||
| --number $number | ||
| --prefix bootstrap_counts/data | ||
| --seed $seed | ||
| --threads \${GALAXY_SLOTS:-1} | ||
| | ||
| && mkdir bootstrap_correlation | ||
| && parallel | ||
| --max-procs \${GALAXY_SLOTS:-1} | ||
| fastspar | ||
| --otu_table {} | ||
| --correlation bootstrap_correlation/cor_{/} | ||
| --covariance bootstrap_correlation/cov_{/} | ||
| --iterations $iterations | ||
| --exclude_iterations $exclude_iterations | ||
| --threshold $threshold | ||
| --seed $seed | ||
| ::: bootstrap_counts/* | ||
| | ||
| && fastspar_pvalues | ||
| --otu_table '$otu_table' | ||
| --correlation '$correlation_file' | ||
| --prefix bootstrap_correlation/cor_data_ | ||
| --permutations $number | ||
| $pseudo | ||
| --threads \${GALAXY_SLOTS:-1} | ||
| --outfile '$pvalues' | ||
| ]]></command> | ||
| <inputs> | ||
| <param argument="--otu_table" type="data" format="tabular" label="Input OTU table" | ||
| help="The table must contain absolute OTU counts in plain tabular (TSV) format, with OTUs as rows and samples as columns. Do not include any metadata rows or columns."/> | ||
| <conditional name="correlation"> | ||
| <param name="select" type="select" label="Tested correlation matrix" | ||
| help="For meaningful p-values, the parameters used during bootstrapped correlation estimation should be identical to those used for the FastSpar run which produced the correlation matrix. For your convenience, you can choose to calculate the correlation matrix here. In that case, the seed used for the calculation is the same one used for generating the bootstrapped samples."> | ||
| <option value="new">Recalculate the correlation matrix</option> | ||
| <option value="original">Use an existing correlation matrix</option> | ||
| </param> | ||
| <when value="new"/> | ||
| <when value="original"> | ||
| <param name="input_file" type="data" format="tabular" argument="--correlation" label="Correlation table" help="The correlation matrix generated by the original FastSpar analysis."/> | ||
| </when> | ||
| </conditional> | ||
| <param argument="--number" type="integer" min="10" max="10000" value="1000" label="Number of bootstrap samples" help="Recommended minimum: 1000 bootstrap samples for robust estimation."/> | ||
| <expand macro="fastspar_tool_parameters"/> | ||
| <param argument="--seed" type="integer" value="1" label="Seed to ensure reproducibility of bootstrapped samples."/> | ||
| <param argument="--pseudo" type="boolean" truevalue="--pseudo" falsevalue="" label="Use pseudo p-values" help="If selected, pseudo p-values are calculated instead of exact p-values. This can provide faster estimates but may be less precise."/> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="output_correlation" format="tabular" label="${tool.name} on ${on_string}: median_correlation.tsv"> | ||
| <filter>correlation['select'] == "new"</filter> | ||
| </data> | ||
| <data name="output_covariance" format="tabular" label="${tool.name} on ${on_string}: median_covariance.tsv"> | ||
| <filter>correlation['select'] == "new"</filter> | ||
| </data> | ||
| <data name="pvalues" format="tabular" label="${tool.name} on ${on_string}: pvalues.tsv"/> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="1"> | ||
| <param name="otu_table" ftype="tabular" value="fake_data.tsv"/> | ||
| <conditional name="correlation"> | ||
| <param name="select" value="original"/> | ||
| <param name="input_file" ftype="tabular" value="fake_data_cor.tsv"/> | ||
| </conditional> | ||
| <param name="number" value="10"/> | ||
| <output name="pvalues" file="fake_pvalues.tsv" compare="diff"/> | ||
| </test> | ||
| <test expect_num_outputs="3"> | ||
| <param name="otu_table" ftype="tabular" value="fake_data.tsv"/> | ||
| <conditional name="correlation"> | ||
| <param name="select" value="new"/> | ||
| </conditional> | ||
| <param name="number" value="10"/> | ||
| <output name="output_correlation" file="fake_data_cor.tsv" compare="diff"/> | ||
| <output name="output_covariance" file="fake_data_cov.tsv" compare="diff"/> | ||
| <output name="pvalues" file="fake_pvalues.tsv" compare="diff"/> | ||
| </test> | ||
| </tests> | ||
| <help><![CDATA[ | ||
| What it does | ||
| ============ | ||
| | ||
| This tool estimates **empirical p-values** for correlation values generated by FastSpar. It uses a **bootstrap-based permutation approach** to assess the statistical significance of observed correlations. | ||
| | ||
| You can choose to recalculate the correlation matrix with the same parameters or use an existing correlation matrix. | ||
| | ||
| How it works | ||
| ============ | ||
| | ||
| 1. Generates multiple bootstrapped versions of the OTU table. | ||
| 2. Runs FastSpar on each bootstrap replicate. | ||
| 3. Compares bootstrapped correlations to the original correlation matrix to calculate empirical p-values. | ||
| | ||
| Required Inputs | ||
| =============== | ||
| | ||
| - **OTU table**: TSV file with absolute counts (no metadata). | ||
| - **Correlation table** (only when "Use an existing correlation matrix" is selected): Output from the original FastSpar run. | ||
| - **Bootstrap samples**: Number of bootstrap replicates (≥1000 recommended). | ||
| | ||
| Important Parameters | ||
| ==================== | ||
| | ||
| - **Iterations**: Must match the number used in the original FastSpar run. | ||
| - **Exclude Iterations** and **Threshold**: Should also match the original settings, if used. | ||
| - **Seed**: Random seed for reproducibility (default: 1). | ||
| - **Pseudo**: Choose whether to calculate pseudo p-values instead of exact values. | ||
| | ||
| IMPORTANT | ||
| ========= | ||
| | ||
| For meaningful p-values, the parameters used during bootstrapped correlation estimation (**iterations, exclude iterations, threshold**) should be identical to those used in the original FastSpar run. | ||
| | ||
| Output | ||
| ====== | ||
| | ||
| - `pvalues.tsv`: A table of empirical p-values for all pairwise correlations. | ||
| | ||
| When "Recalculate the correlation matrix" is selected, the tool will also output: | ||
| | ||
| - `median_correlation.tsv`: Correlation matrix between all OTUs. | ||
| - `median_covariance.tsv`: Covariance matrix between all OTUs. | ||
| | ||
| Additional Resources | ||
| ==================== | ||
| | ||
| - FastSpar GitHub: https://github.com/scwatts/fastspar | ||
| ]]></help> | ||
| <expand macro="citations"/> | ||
| </tool> |
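The third step under "How it works" (comparing bootstrapped correlations to the original matrix) can be illustrated with a minimal Python sketch. This is not FastSpar's implementation: the function name `empirical_pvalues`, the plain nested-list matrices, and the two-sided "at least as extreme" comparison are all assumptions made for illustration. The pseudo branch uses the common (count + 1) / (n + 1) convention, and the non-pseudo branch is simply the raw fraction, which may differ from what `fastspar_pvalues` computes as "exact".

```python
def empirical_pvalues(observed, replicates, pseudo=False):
    """Illustrative empirical p-values from bootstrap correlation matrices.

    observed   -- square matrix (list of lists) of observed correlations
    replicates -- list of matrices of the same shape, one per bootstrap run
    pseudo     -- assumed convention: add one to numerator and denominator
    """
    n_replicates = len(replicates)
    if n_replicates == 0:
        raise ValueError("at least one bootstrap replicate is required")
    size = len(observed)
    pvalues = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            target = abs(observed[i][j])
            # Count replicates whose correlation is at least as extreme.
            extreme = sum(1 for rep in replicates if abs(rep[i][j]) >= target)
            if pseudo:
                pvalues[i][j] = (extreme + 1) / (n_replicates + 1)
            else:
                pvalues[i][j] = extreme / n_replicates
    return pvalues


# Hypothetical toy data: two OTUs, four bootstrap replicates.
observed = [[1.0, 0.8], [0.8, 1.0]]
replicates = [
    [[1.0, 0.1], [0.1, 1.0]],
    [[1.0, -0.9], [-0.9, 1.0]],
    [[1.0, 0.3], [0.3, 1.0]],
    [[1.0, 0.5], [0.5, 1.0]],
]
print(empirical_pvalues(observed, replicates))
print(empirical_pvalues(observed, replicates, pseudo=True))
```

With only a handful of replicates the smallest attainable p-value is coarse, which is one way to read the recommendation of at least 1000 bootstrap samples.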
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,82 @@ | ||
| <tool id="fastspar_reduce" name="FastSpar: Reduce correlation table" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@"> | ||
| <description> | ||
| Filter correlation and p-value table into sparse matrices | ||
| </description> | ||
| <macros> | ||
| <import>macros.xml</import> | ||
| </macros> | ||
| <expand macro="biotools"/> | ||
| <expand macro="requirements"/> | ||
| <command detect_errors="exit_code"><![CDATA[ | ||
| fastspar_reduce | ||
| --correlation_table '$correlation_table' | ||
| --pvalue_table '$pvalue_table' | ||
| --correlation $correlation | ||
| --pvalue $pvalue | ||
| --output_prefix sparse | ||
| ]]></command> | ||
| <inputs> | ||
| <param argument="--correlation_table" type="data" format="tabular" label="Correlation table"/> | ||
| <param argument="--pvalue_table" type="data" format="tabular" label="P-value table"/> | ||
| <param argument="--correlation" type="float" min="0.0" max="1.0" value="0.10" label="Absolute correlation threshold"/> | ||
| <param argument="--pvalue" type="float" min="0.0" max="1.0" value="0.05" label="P-value threshold"/> | ||
| </inputs> | ||
| <outputs> | ||
| <data name="correlations" format="tabular" from_work_dir="sparse_filtered_correlation.tsv" label="${tool.name} on ${on_string}: filtered_correlations.tsv"> | ||
| <actions> | ||
| <action type="metadata" name="column_names" default="column,row,value" /> | ||
| </actions> | ||
| </data> | ||
| <data name="pvalues" format="tabular" from_work_dir="sparse_filtered_pvalue.tsv" label="${tool.name} on ${on_string}: filtered_pvalues.tsv"> | ||
| <actions> | ||
| <action type="metadata" name="column_names" default="column,row,value" /> | ||
| </actions> | ||
| </data> | ||
| </outputs> | ||
| <tests> | ||
| <test expect_num_outputs="2"> | ||
| <param name="correlation_table" ftype="tabular" value="fake_data_cor.tsv"/> | ||
| <param name="pvalue_table" ftype="tabular" value="pvalues.tsv"/> | ||
| <output name="correlations" file="filtered_correlations.tsv" compare="diff"/> | ||
| <output name="pvalues" file="filtered_pvalues.tsv" compare="diff"/> | ||
| </test> | ||
| </tests> | ||
| <help><![CDATA[ | ||
| What it does | ||
| ============ | ||
| | ||
| This tool filters pairwise correlations and p-values from FastSpar outputs to generate sparse matrices suitable for network construction or visualization. It is typically used as the final step in a FastSpar pipeline. | ||
| | ||
| Filtering Criteria | ||
| ================== | ||
| | ||
| - **Absolute correlation threshold**: Only retain OTU pairs whose absolute correlation exceeds this value. | ||
| - **P-value threshold**: Only retain OTU pairs whose empirical p-value is below this cutoff. | ||
| | ||
| Both conditions must be satisfied (logical AND). | ||
| | ||
| Required Inputs | ||
| =============== | ||
| | ||
| - **Correlation table**: A symmetric matrix from FastSpar. | ||
| - **P-value table**: A matching symmetric matrix from FastSpar p-value estimation. | ||
| | ||
| Generated Outputs | ||
| ================= | ||
| | ||
| - `filtered_correlations.tsv`: Correlation values that passed both thresholds. | ||
| - `filtered_pvalues.tsv`: Matching p-values for retained entries. | ||
| | ||
| Notes | ||
| ===== | ||
| | ||
| - Both input matrices must have identical dimensions and OTU order. | ||
| - The output tables use a sparse three-column layout (column, row, value) rather than a full symmetric matrix. | ||
| | ||
| Additional Resources | ||
| ==================== | ||
| | ||
| - FastSpar GitHub: https://github.com/scwatts/fastspar | ||
| ]]></help> | ||
| <expand macro="citations"/> | ||
| </tool> |
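The filtering criteria described in the help text (absolute correlation above a threshold AND p-value below a cutoff) can be sketched in a few lines of Python. This is only an illustration of the rule, not the `fastspar_reduce` implementation: the function name `reduce_tables`, the strict inequalities, and the choice to visit each OTU pair once (upper triangle, no diagonal) are assumptions. The defaults mirror the wrapper's values of 0.10 and 0.05.

```python
def reduce_tables(otu_ids, correlations, pvalues,
                  min_abs_correlation=0.10, max_pvalue=0.05):
    """Return (row_id, column_id, correlation, pvalue) for retained pairs.

    Assumes both matrices are square, symmetric, and ordered like otu_ids.
    """
    kept = []
    for i, row_id in enumerate(otu_ids):
        # Upper triangle only: an assumption so each pair appears once.
        for j in range(i + 1, len(otu_ids)):
            r = correlations[i][j]
            p = pvalues[i][j]
            # Both conditions must hold (logical AND).
            if abs(r) > min_abs_correlation and p < max_pvalue:
                kept.append((row_id, otu_ids[j], r, p))
    return kept


# Hypothetical toy matrices for three OTUs.
otu_ids = ["A", "B", "C"]
correlations = [
    [1.0, 0.5, -0.3],
    [0.5, 1.0, 0.05],
    [-0.3, 0.05, 1.0],
]
pvalues = [
    [0.0, 0.01, 0.2],
    [0.01, 0.0, 0.001],
    [0.2, 0.001, 0.0],
]
for entry in reduce_tables(otu_ids, correlations, pvalues):
    print("\t".join(str(field) for field in entry))
```

In the toy data, the A/C pair fails on its p-value and the B/C pair fails on its correlation strength, so only A/B survives.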
Can you please add a `##` comment explaining what this param is doing?
This parameter is needed to skip the confirmation prompt that appears when an OTU has only one permutation (all samples have identical counts).
The wrapper should probably pass the warning message to the user (beyond the error hidden in the job information panel), but I'm not sure if the problem is severe enough to fail the tool execution.
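One way to surface that warning before running the tool would be to scan the OTU table for rows with identical counts in every sample. The sketch below is a hypothetical pre-check, not part of this PR or of FastSpar: the function name `constant_otus` is made up, and it assumes a tab-separated table with one header row, the OTU identifier in the first column, and numeric counts in the remaining columns.

```python
import csv
import io


def constant_otus(handle):
    """Return IDs of OTUs whose counts are identical across all samples."""
    reader = csv.reader(handle, delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    flagged = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        otu_id, counts = row[0], row[1:]
        # A single distinct value means every sample has the same count.
        if len({float(count) for count in counts}) == 1:
            flagged.append(otu_id)
    return flagged


# Hypothetical toy table with one constant OTU.
example = io.StringIO(
    "#OTU ID\tS1\tS2\tS3\n"
    "OTU_1\t5\t5\t5\n"
    "OTU_2\t1\t0\t3\n"
)
print(constant_otus(example))
```

A wrapper could print the flagged identifiers to standard output so they appear in the job log, while still passing `--yes` to let the run continue.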