Captus v1.6.2
Changes to captus extract:
- The weighted score (wscore) formula has been changed. The previous formula was simply the product score x coverage, but it tended to favor coverage over identity as coverage increased. The new wscore is identity^N x coverage^(1/N), which progressively favors identity as it increases while progressively decreasing the influence of coverage as coverage increases. N can be changed in settings.py as WSCORE_EXP. The new wscore changes paralog ranking when many copies are recovered per locus (e.g. in polyploids). Thanks to Katharina Rambau (NTNU) and Elizabeth White (FLMNH) for the discussion and suggestions.
- --paralog_tolerance has been replaced by --paralog_identity_tolerance, --paralog_coverage_tolerance, and --paralog_depth_tolerance for more control over paralog rejection. For example, to retain only paralogs that are at least 50% as long as the best hit for a locus, use --paralog_coverage_tolerance 0.5.
- To match the logic of the new paralog tolerance parameters, the parameters --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance now use a proportion too. For example, to use only contigs for a nuclear locus that are at least half as deep as the deepest contig in the locus, use --nuc_depth_tolerance 0.5 instead of 2 as before.
- [depth=XX.XX] has been added to the descriptions of extracted sequences and to the final GFFs when the original assembly contigs contain _cov_ in their headers.
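The scoring and filtering rules described above can be sketched in a few lines of Python. This is a minimal illustration, not Captus' implementation: the WSCORE_EXP value of 2, the hit dictionaries, and the helpers filter_paralogs, filter_contigs_by_depth, and depth_tag are all hypothetical. It also assumes identity and depth tolerances compare each hit against the best-ranked hit, the same way the coverage example does, and that depths follow a SPAdes-style "_cov_" header convention.

```python
import re

# Hypothetical exponent; the actual default is WSCORE_EXP in Captus' settings.py.
WSCORE_EXP = 2


def wscore(identity: float, coverage: float, n: float = WSCORE_EXP) -> float:
    """identity^N x coverage^(1/N), with both inputs as proportions in [0, 1]."""
    return (identity ** n) * (coverage ** (1 / n))


def filter_paralogs(hits, identity_tol, coverage_tol, depth_tol):
    """Keep hits whose identity, coverage and depth are each at least the given
    proportion of the best hit's values (assumed semantics; best = top wscore)."""
    best = max(hits, key=lambda h: wscore(h["identity"], h["coverage"]))
    return [
        h
        for h in hits
        if h["identity"] >= identity_tol * best["identity"]
        and h["coverage"] >= coverage_tol * best["coverage"]
        and h["depth"] >= depth_tol * best["depth"]
    ]


def filter_contigs_by_depth(contigs, depth_tol):
    """Keep (name, depth) pairs at least depth_tol times as deep as the deepest one."""
    max_depth = max(depth for _, depth in contigs)
    return [(name, depth) for name, depth in contigs if depth >= depth_tol * max_depth]


def depth_tag(header):
    """Return a '[depth=XX.XX]' tag from a header containing '_cov_', else None."""
    match = re.search(r"_cov_(\d+(?:\.\d+)?)", header)
    return f"[depth={float(match.group(1)):.2f}]" if match else None


# With N = 2, higher identity outranks higher coverage here,
# whereas a plain product (0.99 * 0.6 vs 0.9 * 0.8) ranks them the other way.
print(wscore(0.99, 0.6) > wscore(0.9, 0.8))  # True
```

Setting N = 1 reduces the sketch's wscore to a plain identity x coverage product, which makes it easy to compare rankings before and after the change.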
Changes to captus align:
- New MAFFT options are available: --mafft_unalignlevel and --mafft_leavegappyregion. If --mafft_unalignlevel is greater than 0 (recommended value: 0.8), unrelated sequence segments, such as potentially contaminated ones, are left unaligned so that the trimming stage can remove them more efficiently. Consider using it together with --mafft_leavegappyregion so that gappy regions are not exhaustively aligned and are therefore better detected and removed by the trimming stage. The method is described in https://doi.org/10.1093/bioinformatics/btw108. Thanks to Thomas Kiebacher (Naturkundemuseum Stuttgart) for the suggestion.
- The untrimmed alignments that include the reference target sequences are kept by default, in case paralog filtering has to be repeated.
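Assuming the two Captus options are forwarded more or less directly to MAFFT's own --unalignlevel and --leavegappyregion flags, the hypothetical helper mafft_args below sketches how such a command line could be assembled. The alignment-strategy flags Captus actually passes are not shown and are not implied.

```python
def mafft_args(fasta_path, unalignlevel=0.0, leavegappyregion=False):
    """Build a MAFFT argument list (hypothetical mapping of the Captus options)."""
    args = ["mafft"]
    # Only add --unalignlevel when it is enabled (> 0), mirroring the note above.
    if unalignlevel > 0:
        args += ["--unalignlevel", str(unalignlevel)]
    # Leave gappy regions unaligned so trimming can detect and remove them.
    if leavegappyregion:
        args.append("--leavegappyregion")
    args.append(fasta_path)
    return args


print(" ".join(mafft_args("locus.fna", unalignlevel=0.8, leavegappyregion=True)))
```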
Changes to captus cluster:
- The option --align_singletons has been changed to --align_min_species to control the minimum number of species required to align a cluster.
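The effect of --align_min_species can be pictured as a filter on the number of distinct species per cluster. In this minimal sketch, the clusters dictionary (cluster ID mapped to (species, sequence ID) pairs) and the function clusters_to_align are hypothetical, not Captus internals.

```python
def clusters_to_align(clusters, align_min_species):
    """Return IDs of clusters containing at least align_min_species distinct species."""
    selected = []
    for cluster_id, members in clusters.items():
        # Count species, not sequences: several copies from one species count once.
        n_species = len({species for species, _ in members})
        if n_species >= align_min_species:
            selected.append(cluster_id)
    return selected


clusters = {
    "c1": [("spA", "s1"), ("spA", "s2")],
    "c2": [("spA", "s3"), ("spB", "s4")],
    "c3": [("spC", "s5")],
}
print(clusters_to_align(clusters, align_min_species=2))  # ['c2']
```

Under this reading, a value of 1 would also align single-species clusters, roughly what the old singleton option controlled.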
Changes to phylo_commands:
- Commands for IQ-TREE now include the option -czb by default. It collapses near-zero-length branches so that the resolution of those clades is handled by ASTRAL; use --force_bifurcating to disable it.
- Ultrafast bootstraps are no longer calculated by default because they are usually not required for individual gene trees.
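The new defaults can be sketched as a small IQ-TREE command builder. It assumes the IQ-TREE 2 executable name iqtree2 and its -B flag for ultrafast bootstrap replicates; the function iqtree_args and its ufboot_replicates parameter are hypothetical and only mirror the behavior described above.

```python
def iqtree_args(alignment, force_bifurcating=False, ufboot_replicates=None):
    """Build an IQ-TREE 2 argument list following the defaults described above."""
    args = ["iqtree2", "-s", alignment]
    # -czb collapses near-zero-length branches unless bifurcating trees are forced.
    if not force_bifurcating:
        args.append("-czb")
    # Ultrafast bootstraps are opt-in; IQ-TREE 2 requires at least 1000 replicates.
    if ufboot_replicates:
        args += ["-B", str(ufboot_replicates)]
    return args


print(" ".join(iqtree_args("locus.fna")))  # iqtree2 -s locus.fna -czb
```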