
Captus v1.6.2


@edgardomortiz released this 21 Feb 19:48

Changes to captus extract:

  • The formula for the weighted score (wscore) has changed. Previously it was simply score × coverage, which tended to favor coverage over identity as coverage increased. The new wscore is identity^N × coverage^(1/N), which progressively favors identity as identity increases while progressively reducing the influence of coverage as coverage increases. N can be set in settings.py as WSCORE_EXP. The new wscore changes paralog ranking when many copies are recovered per locus (e.g., in polyploids). Thanks to Katharina Rambau (NTNU) and Elizabeth White (FLMNH) for the discussion and suggestions.
  • --paralog_tolerance has been replaced by --paralog_identity_tolerance, --paralog_coverage_tolerance, and --paralog_depth_tolerance for finer control over paralog rejection. For example, to retain only paralogs that are at least 50% as long as the best hit for a locus, use --paralog_coverage_tolerance 0.5.
  • To match the logic of the new paralog tolerance parameters, --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance now also take a proportion. For example, to use only contigs for a nuclear locus that are at least half as deep as the deepest contig in that locus, use --nuc_depth_tolerance 0.5 instead of 2 as before.
  • A [depth=XX.XX] tag is now added to the descriptions of extracted sequences and to the final GFFs when the original assembly contigs contain _cov_ in their headers.
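The effect of the exponent on paralog ranking can be illustrated with a minimal Python sketch. It assumes identity and coverage are proportions between 0 and 1 and uses N = 1 and N = 2 purely for illustration (the actual default of WSCORE_EXP is not stated in these notes). The function name `wscore` and the two example hits are hypothetical. With N = 1 the formula reduces to a plain identity × coverage product. Raising N lets a highly similar but partial hit overtake a complete but less similar one.

```python
def wscore(identity: float, coverage: float, n: float) -> float:
    """Weighted score identity^N x coverage^(1/N) (illustrative sketch).

    Assumes identity and coverage are proportions in [0, 1]; n plays the
    role of WSCORE_EXP from settings.py.
    """
    return identity ** n * coverage ** (1 / n)


# Hypothetical hits for one locus: (name, identity, coverage)
hits = [
    ("hit_A", 0.99, 0.80),  # very similar, but only partially covers the target
    ("hit_B", 0.90, 1.00),  # covers the whole target, but less similar
]

for n in (1, 2):
    ranked = sorted(hits, key=lambda h: wscore(h[1], h[2], n), reverse=True)
    print(n, [name for name, _, _ in ranked])
# prints: 1 ['hit_B', 'hit_A']  then  2 ['hit_A', 'hit_B']
```

With N = 2, hit_A scores about 0.877 against 0.81 for hit_B, so the higher-identity copy moves to the top of the ranking.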
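The proportion-style tolerances for paralogs and contig depth can be sketched as follows. This is a hypothetical interpretation, not Captus's code. It assumes each tolerance is a fraction of the best hit's (or deepest contig's) value and that a candidate must reach that fraction to be kept. `Hit`, `filter_paralogs` and `filter_contigs_by_depth` are illustrative names, and the 0.0 defaults here simply disable a filter. They are not Captus's defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    name: str
    identity: float  # assumed proportion, 0-1
    coverage: float  # assumed proportion of the target covered, 0-1
    depth: float     # assembly depth of the contig(s)


def passes_tolerances(hit: Hit, best: Hit, identity_tol: float,
                      coverage_tol: float, depth_tol: float) -> bool:
    """Hypothetical rule: each value must reach tolerance x the best hit's value."""
    return (hit.identity >= identity_tol * best.identity
            and hit.coverage >= coverage_tol * best.coverage
            and hit.depth >= depth_tol * best.depth)


def filter_paralogs(hits: List[Hit], identity_tol: float = 0.0,
                    coverage_tol: float = 0.0, depth_tol: float = 0.0) -> List[Hit]:
    """Keep hits within tolerance of the best hit (assumed to be listed first)."""
    best = hits[0]
    return [h for h in hits
            if passes_tolerances(h, best, identity_tol, coverage_tol, depth_tol)]


def filter_contigs_by_depth(depths: List[float], depth_tol: float) -> List[float]:
    """Keep contigs at least depth_tol x as deep as the deepest contig."""
    deepest = max(depths)
    return [d for d in depths if d >= depth_tol * deepest]


hits = [
    Hit("best", 0.98, 1.00, 40.0),
    Hit("long_paralog", 0.95, 0.70, 30.0),
    Hit("short_paralog", 0.97, 0.30, 35.0),
]
print([h.name for h in filter_paralogs(hits, coverage_tol=0.5)])
# prints: ['best', 'long_paralog']
print(filter_contigs_by_depth([100.0, 60.0, 40.0, 10.0], 0.5))
# prints: [100.0, 60.0]
```

Under this reading, a proportion of 0.5 gives the same cutoff as dividing the maximum depth by 2. That would explain why 0.5 replaces the old value of 2.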
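Here is one way a depth value could be pulled from a contig header and formatted as the tag described above. This is a sketch, not Captus's parser. It assumes SPAdes-style headers such as `NODE_1_length_1234_cov_5.678` and two decimal places, as suggested by `XX.XX`. `depth_tag` is a hypothetical name.

```python
import re
from typing import Optional

# Number immediately following "_cov_" (assumed SPAdes-style header)
COV_PATTERN = re.compile(r"_cov_(\d+(?:\.\d+)?)")


def depth_tag(header: str) -> Optional[str]:
    """Return '[depth=XX.XX]' if the header contains _cov_, else None."""
    match = COV_PATTERN.search(header)
    if match is None:
        return None
    return f"[depth={float(match.group(1)):.2f}]"


print(depth_tag("NODE_1_length_1234_cov_5.678"))  # prints: [depth=5.68]
print(depth_tag("contig_7"))                      # prints: None
```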

Changes to captus align:

  • New MAFFT options are available: --mafft_unalignlevel and --mafft_leavegappyregion. When --mafft_unalignlevel is greater than 0 (recommended value: 0.8), unrelated sequence segments are left unaligned so that potentially contaminated segments are not forced into the alignment. This lets the trimming stage remove those segments more efficiently. Consider combining it with --mafft_leavegappyregion so that gappy regions are not exhaustively aligned and are therefore easier for the trimming stage to detect and remove. The method is described in https://doi.org/10.1093/bioinformatics/btw108. Thanks to Thomas Kiebacher (Naturkundemuseum Stuttgart) for the suggestion.
  • The untrimmed alignments that include the reference target sequences are now kept by default, in case paralog filtering has to be repeated.
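The sketch below shows how the two Captus options might translate into a MAFFT command line. It assumes they map directly onto MAFFT's own `--unalignlevel` and `--leavegappyregion` options. `build_mafft_command` is a hypothetical helper that only assembles the argument list and does not run MAFFT. Check the MAFFT documentation for which alignment strategies honor these options.

```python
from typing import List


def build_mafft_command(fasta_path: str, unalignlevel: float = 0.0,
                        leave_gappy_region: bool = False) -> List[str]:
    """Assemble (but do not run) a MAFFT argument list; hypothetical helper."""
    # Assumed valid range for the unalign level: a proportion between 0 and 1
    if not 0.0 <= unalignlevel <= 1.0:
        raise ValueError("unalignlevel is expected to be between 0 and 1")
    cmd = ["mafft"]
    if unalignlevel > 0:
        # Leave unrelated segments unaligned instead of forcing them together
        cmd += ["--unalignlevel", str(unalignlevel)]
    if leave_gappy_region:
        # Skip exhaustive alignment of gappy regions so trimming can drop them
        cmd.append("--leavegappyregion")
    cmd.append(fasta_path)
    return cmd


print(build_mafft_command("locus.fna", 0.8, True))
# prints: ['mafft', '--unalignlevel', '0.8', '--leavegappyregion', 'locus.fna']
```

The resulting list could be passed to `subprocess.run` with stdout redirected to the output alignment file.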

Changes to captus cluster:

  • The option --align_singletons has been replaced by --align_min_species, which sets the minimum number of species a cluster must contain in order to be aligned.
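The threshold can be sketched as a simple filter. This assumes that species are counted as distinct names, so two sequences from the same species count once. The cluster names, species labels and the `clusters_to_align` helper are all hypothetical.

```python
from typing import Dict, List


def clusters_to_align(clusters: Dict[str, List[str]],
                      align_min_species: int) -> List[str]:
    """Return clusters with at least align_min_species distinct species."""
    return [name for name, species in clusters.items()
            if len(set(species)) >= align_min_species]


clusters = {
    "cluster_1": ["sp_A", "sp_B", "sp_C"],
    "cluster_2": ["sp_A", "sp_A"],  # two sequences, but a single species
    "cluster_3": ["sp_D"],
}
print(clusters_to_align(clusters, 2))  # prints: ['cluster_1']
print(clusters_to_align(clusters, 1))  # prints: ['cluster_1', 'cluster_2', 'cluster_3']
```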

Changes to phylo_commands:

  • IQ-TREE commands now include the option -czb by default, which collapses near-zero-length branches so that the resolution of those clades is left to ASTRAL. Use --force_bifurcating to disable it.
  • Ultrafast bootstraps are not calculated by default because they are usually not required for individual gene trees.
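The two defaults above can be illustrated with a sketch of a per-gene IQ-TREE command. It assumes the IQ-TREE 2 executable is called `iqtree2` and uses its `-s`, `-czb` and `-B` options. `build_iqtree_command` and its `ufboot` parameter are hypothetical and do not correspond to actual phylo_commands flags, apart from the `force_bifurcating` idea taken from the notes.

```python
from typing import List


def build_iqtree_command(alignment: str, force_bifurcating: bool = False,
                         ufboot: int = 0) -> List[str]:
    """Assemble (but do not run) an IQ-TREE 2 argument list; hypothetical helper."""
    cmd = ["iqtree2", "-s", alignment]
    if not force_bifurcating:
        # Collapse near-zero-length branches into polytomies for ASTRAL to resolve
        cmd.append("-czb")
    if ufboot > 0:
        # Ultrafast bootstrap, only when explicitly requested (IQ-TREE 2 expects >= 1000)
        cmd += ["-B", str(ufboot)]
    return cmd


print(build_iqtree_command("locus.fasta"))
# prints: ['iqtree2', '-s', 'locus.fasta', '-czb']
```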