$ NXF_SYNTAX_PARSER=v2 nextflow run . --help
N E X T F L O W ~ version 26.02.0-edge
Launching `./main.nf` [awesome_ride] revision: 1762f92e8d
WARN: Unrecognized config option 'validation.defaultIgnoreParams'
WARN: Unrecognized config option 'validation.monochromeLogs'
------------------------------------------------------
                                        ,--./,-.
        ___     __   __   __   ___     /,-._.--~'
  |\ | |__  __ /  ` /  \ |__) |__         }  {
  | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                        `._,._,'
nf-core/mag 5.5.0dev
------------------------------------------------------
Typical pipeline command:
nextflow run nf-core/mag -profile <docker/singularity/.../institute> --input samplesheet.csv --outdir <OUTDIR>
help message of that parameter will be printed.
or `--helpFull`.
Input/output options
--input [string] CSV samplesheet file containing information about the samples in the experiment.
--single_end [boolean] Specifies that the input is single-end reads.
--assembly_input [string] Additional input CSV samplesheet containing information about pre-computed assemblies. When set,
both read pre-processing and assembly are skipped and the pipeline begins at the binning stage.
--outdir [string] The output directory where the results will be saved. You have to use absolute paths to storage on
Cloud infrastructure.
--email [string] Email address for completion summary.
--multiqc_title [string] MultiQC report title. Printed as page header, used for filename if not otherwise specified.
Generic options
--multiqc_methods_description [string] Custom MultiQC yaml file containing HTML including a methods description.
--help [boolean, string] Display the help message.
--help_full [boolean] Display the full detailed help message.
--show_hidden [boolean] Display hidden parameters in the help message (only works when --help or --help_full are provided).
Reproducibility options
--megahit_fix_cpu_1 [boolean] Fix number of CPUs for MEGAHIT to 1. Not increased with retries.
--spades_fix_cpus [integer] Fix number of CPUs used by SPAdes. Not increased with retries. [default: -1]
--spadeshybrid_fix_cpus [integer] Fix number of CPUs used by SPAdes hybrid. Not increased with retries. [default: -1]
--metabat_rng_seed [integer] RNG seed for MetaBAT2. [default: 1]
Quality control for short reads options
--clip_tool [string] Specify which adapter clipping tool to use. (accepted: fastp, adapterremoval, trimmomatic)
[default: fastp]
--save_clipped_reads [boolean] Specify to save the resulting clipped FASTQ files to --outdir.
--reads_minlength [integer] The minimum length reads must have to be retained for downstream analysis. [default: 15]
--fastp_qualified_quality [integer] Minimum phred quality value of a base to be qualified in fastp. [default: 15]
--fastp_cut_mean_quality [integer] The mean quality requirement used for per read sliding window cutting by fastp. [default: 15]
--fastp_save_trimmed_fail [boolean] Save reads that fail fastp filtering in a separate file. Not used downstream.
--fastp_trim_polyg [boolean] Turn on detecting and trimming of poly-G tails
--adapterremoval_minquality [integer] The minimum base quality for low-quality base trimming by AdapterRemoval. [default: 2]
--adapterremoval_trim_quality_stretch [boolean] Turn on quality trimming by consecutive stretch of low quality bases, rather than by window.
--adapterremoval_adapter1 [string] Forward read adapter to be trimmed by AdapterRemoval. [default:
AGATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG]
--adapterremoval_adapter2 [string] Reverse read adapter to be trimmed by AdapterRemoval for paired end data. [default:
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT]
--host_genome [string] Name of iGenomes reference for host contamination removal.
--host_fasta [string] Fasta reference file for host contamination removal.
--host_fasta_bowtie2index [string] Bowtie2 index directory corresponding to `--host_fasta` reference file for host contamination
removal.
--host_removal_verysensitive [boolean] Use the `--very-sensitive` instead of the `--sensitive` setting for Bowtie 2 to map reads against the
host genome.
--host_removal_save_ids [boolean] Save the read IDs of removed host reads.
--save_hostremoved_reads [boolean] Specify to save input FASTQ files with host reads removed to --outdir.
--keep_phix [boolean] Keep reads similar to the Illumina internal standard PhiX genome.
--phix_reference [string] Genome reference used to remove Illumina PhiX contaminant reads.
--skip_clipping [boolean] Skip read preprocessing using fastp or adapterremoval.
--skip_shortread_qc [boolean] Skip all default QC steps for short reads (adapter trimming, phiX removal).
--save_phixremoved_reads [boolean] Specify to save input FASTQ files with phiX reads removed to --outdir.
--bbnorm [boolean] Run BBnorm to normalize sequence depth.
--bbnorm_target [integer] Set BBnorm target maximum depth to this number. [default: 100]
--bbnorm_min [integer] Set BBnorm minimum depth to this number. [default: 5]
--save_bbnorm_reads [boolean] Save normalized read files to output directory.
Quality control for long reads options
--skip_adapter_trimming [boolean] Skip removing adapter sequences from long reads.
--skip_longread_filtering [boolean] Skip filtering long reads.
--skip_longread_qc [boolean] Skip all default QC steps for long reads (adapter trimming, filtering, removal of lambda sequences).
--longreads_min_length [integer] Discard any read which is shorter than this value. [default: 1000]
--longreads_min_quality [integer] Discard any read which has a mean quality score lower than this value.
--longreads_keep_percent [integer] Keep this percent of bases. [default: 90]
--longreads_length_weight [integer] The higher the value, the more weight read length has when choosing the best reads. [default: 10]
--keep_lambda [boolean] Keep reads similar to the ONT internal standard Escherichia virus Lambda genome.
--lambda_reference [string] Genome reference used to remove ONT Lambda contaminant reads.
--save_lambdaremoved_reads [boolean] Specify to save input FASTQ files with lambda reads removed to --outdir.
--save_porechop_reads [boolean] Specify to save the resulting clipped FASTQ files to --outdir.
--save_filtered_longreads [boolean] Specify to save the resulting length filtered long read FASTQ files to --outdir.
--longread_adaptertrimming_tool [string] Specify which long read adapter trimming tool to use. (accepted: porechop, porechop_abi)
[default: porechop_abi]
--longread_filtering_tool [string] Specify which long read filtering tool to use. (accepted: filtlong, nanoq, chopper) [default:
filtlong]
Taxonomic profiling options
--cat_db [string] Database for taxonomic classification of metagenome assembled genomes. Can be either a zipped file
or a directory containing the extracted output of such.
--cat_db_generate [boolean] Generate CAT database.
--save_cat_db [boolean] Save the CAT database generated when specified by `--cat_db_generate`.
--cat_allow_unofficial_lineages [boolean] Allow unofficial lineages in CAT classification.
--cat_classify_unbinned [boolean] Classify unbinned contigs with CAT (contig mode).
--cat_no_suggestive_asterisks [boolean] Specify to turn off CAT marking the most probable hits (when multiple) with an asterisk in output files.
--skip_gtdbtk [boolean] Skip running GTDB-Tk, as well as the automatic download of the database.
--gtdb_db [string] Specify the location of a GTDBTK database. Can be either an uncompressed directory or a `.tar.gz`
archive. If not specified will be downloaded for you when GTDBTK or binning QC is not skipped.
[default:
https://data.gtdb.aau.ecogenomic.org/releases/release226/226.0/auxillary_files/gtdbtk_package/full_package/gtdbtk_r226_data.tar.gz]
--gtdbtk_min_completeness [number] Min. bin completeness (in %) required to apply GTDB-tk classification. [default: 50]
--gtdbtk_max_contamination [number] Max. bin contamination (in %) allowed to apply GTDB-tk classification. [default: 10]
--gtdbtk_min_perc_aa [number] Min. fraction of AA (in %) in the MSA for bins to be kept. [default: 10]
--gtdbtk_min_af [number] Min. alignment fraction to consider closest genome. [default: 0.65]
--gtdbtk_pplacer_cpus [integer] Number of CPUs used by the pplacer tool run by GTDB-Tk. [default: 1]
--gtdbtk_pplacer_useram [boolean] Speed up pplacer step of GTDB-Tk by loading to memory.
--gtdbtk_use_full_tree [boolean] Specify to have GTDB-Tk use the full bacterial tree rather than the split tree (requires more
memory!)
--gtdbtk_skip_aniscreen [boolean] Specify to disable fast classification of genomes by ANI using skani in GTDB-Tk.
Assembly options
--coassemble_group [boolean] Co-assemble samples within one group, instead of assembling each sample separately.
--spades_options [string] Additional custom options for SPAdes and SPAdesHybrid. Do not specify `--meta` as this will be added
for you!
--spades_downstreaminput [string] Specify whether to use contigs or scaffolds assembled by SPAdes (accepted: scaffolds, contigs)
[default: scaffolds]
--megahit_options [string] Additional custom options for MEGAHIT.
--skip_spades [boolean] Skip Illumina-only SPAdes assembly.
--skip_spadeshybrid [boolean] Skip SPAdes hybrid assembly.
--skip_megahit [boolean] Skip MEGAHIT assembly.
--skip_ale [boolean] Skip ALE
--skip_quast [boolean] Skip metaQUAST.
--skip_metamdbg [boolean] Skip metaMDBG assembly.
--skip_flye [boolean] Skip Flye assembly.
Gene prediction and annotation options
--skip_prodigal [boolean] Skip Prodigal gene prediction
--prokka_with_compliance [boolean] Turn on Prokka compliance mode for truncating contig names for NCBI/ENA compatibility.
--prokka_compliance_centre [string] Specify sequencing centre name required for Prokka's compliance mode.
--skip_prokka [boolean] Skip Prokka genome annotation.
--prokka_fast_mode [boolean] Specify to skip CDS/product searching in Prokka runs
--skip_metaeuk [boolean] Skip MetaEuk gene prediction and annotation
--metaeuk_mmseqs_db [string] A string containing the name of one of the databases listed in the [mmseqs2
documentation](https://github.com/soedinglab/MMseqs2/wiki#downloading-databases). This database will
be downloaded and formatted for eukaryotic genome annotation. Incompatible with --metaeuk_db.
(accepted: UniRef100, UniRef90, UniRef50, UniProtKB, UniProtKB/TrEMBL, UniProtKB/Swiss-Prot, NR, NT,
GTDB, PDB, ...)
--metaeuk_db [string] Path to either a local fasta file of protein sequences, or to a directory containing an
MMseqs2-formatted database, for annotation of eukaryotic genomes.
--save_mmseqs_db [boolean] Save the downloaded mmseqs2 database specified in `--metaeuk_mmseqs_db`.
Virus identification options
--run_virus_identification [boolean] Run virus identification.
--genomad_db [string] Database for virus classification with geNomad
--genomad_min_score [number] Minimum geNomad score for a sequence to be considered viral [default: 0.7]
--genomad_splits [integer] Number of groups that geNomad's MMseqs2 database should be split into (reduced memory requirements)
[default: 1]
Binning options
--binning_map_mode [string] Defines mapping strategy to compute co-abundances for binning, i.e. which samples will be mapped
against the assembly. (accepted: all, group, own) [default: group]
--skip_binning [boolean] Skip metagenome binning entirely
--skip_metabat2 [boolean] Skip MetaBAT2 Binning
--skip_maxbin2 [boolean] Skip MaxBin2 Binning
--skip_concoct [boolean] Skip CONCOCT Binning
--skip_comebin [boolean] Skip COMEBin Binning
--skip_metabinner [boolean] Skip MetaBinner Binning
--bin_metabinner_scale [string] Dataset scale for MetaBinner (accepted: small, large, huge) [default: large]
--skip_semibin [boolean] Skip SemiBin2 Binning
--semibin_rng_seed [integer] RNG seed for SemiBin2. [default: 1]
--semibin_environment [string] Pre-trained model for SemiBin2 for single sample assemblies (accepted: human_gut, dog_gut,
ocean, soil, cat_gut, human_oral, mouse_gut, pig_gut, built_environment, ...) [default: global]
--min_contig_size [integer] Minimum contig size to be considered for binning and for bin quality check. [default: 1500]
--min_length_unbinned_contigs [integer] Minimal length of contigs that are not part of any bin but treated as individual genome.
[default: 1000000]
--max_unbinned_contigs [integer] Maximal number of contigs that are not part of any bin but treated as individual genome.
[default: 100]
--bin_min_size [integer] Specify the shortest length a bin should be to retain for downstream processing (in base pairs)
[default: 0]
--bin_max_size [integer] Specify the longest length a bin should be to retain for downstream processing (in base pairs). By
default no limit.
--bin_concoct_chunksize [integer] Specify length of sub-contigs cut up prior to CONCOCT binning [default: 10000]
--bin_concoct_overlap [integer] Specify the overlap between each sub-contig prior to CONCOCT binning [default: 0]
--bin_concoct_donotconcatlast [boolean] Specify to not append the last contig less than sub-contig length to the last correct length contig
--bowtie2_mode [string] Specify alternative Bowtie2 settings for aligning reads back against the assembly.
--save_assembly_mapped_reads [boolean] Save the output of mapping raw reads back to assembled contigs
--bin_domain_classification [boolean] Enable domain-level (prokaryote or eukaryote) classification of bins using Tiara. Processes which
are domain-specific will then only receive bins matching the domain requirement.
--tiara_min_length [integer] Minimum contig length for Tiara to use for domain classification. For accurate classification,
should be longer than 3000 bp. [default: 3000]
--exclude_unbins_from_postbinning [boolean] Exclude unbinned contigs in the post-binning steps (bin QC, taxonomic classification, and annotation
steps).
--longread_percentidentity [number] Specify a minimum percent identity filter for long reads mapping back to assembled contigs.
--shortread_percentidentity [number] Specify a minimum percent identity filter for short reads mapping back to assembled contigs.
Bin quality check options
--skip_binqc [boolean] Disable bin QC with BUSCO, CheckM or CheckM2.
--run_busco [boolean] Enable running BUSCO during bin QC.
--run_checkm [boolean] Enable running CheckM during bin QC.
--run_checkm2 [boolean] Enable running CheckM2 during bin QC.
--busco_db [string] Download URL, local tar.gz archive, or local uncompressed directory for an *_odb10 or *_odb12 BUSCO
lineage dataset.
--busco_db_lineage [string] Name of the BUSCO *_odb10 or *_odb12 lineage to check against. Additionally supports 'auto',
'auto_prok' and 'auto_euk' for automatic lineage selection mode. [default: auto]
--save_busco_db [boolean] Save the used BUSCO lineage datasets provided via `--busco_db`.
--busco_clean [boolean] Enable clean-up of temporary files created during BUSCO runs.
--checkm_db [string] Path to local folder containing already downloaded and uncompressed CheckM database.
--save_checkm_data [boolean] Save the used CheckM reference files downloaded when not using --checkm_db parameter.
--checkm2_db [string] Path to local file of an already downloaded and uncompressed CheckM2 database file (.dmnd file).
--checkm2_db_version [integer] CheckM2 database version number to download (Zenodo record ID, for reference check the canonical
reference https://zenodo.org/records/5571251, and pick the Zenodo ID of the database version of your
choice). [default: 14897628]
--save_checkm2_data [boolean] Save the used CheckM2 reference files downloaded when not using --checkm2_db parameter.
--refine_bins_dastool [boolean] Turn on bin refinement using DAS Tool.
--refine_bins_dastool_threshold [number] Specify single-copy gene score threshold for bin refinement. [default: 0.5]
--refine_bins_dastool_savecontig2bin [boolean] Specify to save contig to bin maps used for bin refinement
--postbinning_input [string] Specify which binning output is sent for downstream annotation, taxonomic classification, bin
quality control etc. (accepted: raw_bins_only, refined_bins_only, both) [default:
raw_bins_only]
--run_gunc [boolean] Turn on GUNC genome chimerism checks
--gunc_db [string] Specify a path to a pre-downloaded GUNC dmnd database file
--gunc_database_type [string] Specify which database to auto-download if not supplying own (accepted: progenomes, gtdb,
test_data) [default: progenomes]
--gunc_save_db [boolean] Save the used GUNC reference files downloaded when not using --gunc_db parameter.
--generate_bigmag_file [boolean] Make a BIgMAG input file including GUNC results.
Ancient DNA assembly
--ancient_dna [boolean] Turn on/off the ancient DNA subworkflow
--pydamage_accuracy [number] PyDamage accuracy threshold [default: 0.5]
--skip_ancient_damagecorrection [boolean] Deactivate damage correction of ancient contigs using variant and consensus calling
--freebayes_ploidy [integer] Ploidy for variant calling [default: 1]
--freebayes_min_basequality [integer] Minimum base quality required for variant calling [default: 20]
--freebayes_minallelefreq [number] Minimum minor allele frequency for considering variants [default: 0.33]
--bcftools_view_high_variant_quality [integer] Minimum genotype quality for considering a variant high quality [default: 30]
--bcftools_view_medium_variant_quality [integer] Minimum genotype quality for considering a variant medium quality [default: 20]
--bcftools_view_minimal_allelesupport [integer] Minimum number of bases supporting the alternative allele [default: 3]
!! Hiding 22 param(s), use the `--showHidden` parameter to show them !!
------------------------------------------------------
* The pipeline
https://doi.org/10.1093/nargab/lqac007
* The nf-core framework
https://doi.org/10.1038/s41587-020-0439-x
* Software dependencies
https://github.com/nf-core/mag/blob/main/CITATIONS.md
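
The "Typical pipeline command" near the top of this help output can also be assembled from a dictionary of parameters. The following is a minimal sketch, not part of the pipeline: `build_mag_command` is a hypothetical helper, the `docker` profile and the `results` output directory are placeholder assumptions, and the parameter names are taken from the listing above. It assumes boolean options are passed as bare flags and all other options as `--name value`.

```python
import shlex


def build_mag_command(params, profile="docker", pipeline="nf-core/mag"):
    """Assemble a `nextflow run` argument list from a dict of pipeline parameters.

    Hypothetical helper for illustration only; it does not validate
    parameter names or values against the pipeline schema.
    """
    argv = ["nextflow", "run", pipeline, "-profile", profile]
    for name, value in params.items():
        if value is True:
            # Boolean options are passed as bare flags.
            argv.append(f"--{name}")
        elif value is False or value is None:
            # Options that are off or unset are simply omitted.
            continue
        else:
            argv += [f"--{name}", str(value)]
    return argv


cmd = build_mag_command({
    "input": "samplesheet.csv",
    "outdir": "results",          # placeholder output directory
    "clip_tool": "fastp",
    "skip_spades": True,
    "run_checkm2": True,
    "refine_bins_dastool": False,  # left out of the command
})

# Print a shell-quoted command line for inspection before running it.
print(shlex.join(cmd))
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the list is later handed to `subprocess.run`.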