Releases · edgardomortiz/Captus

27 Feb 20:13

edgardomortiz

v1.6.5

2c5956a

Captus v1.6.5 Latest

Changes to captus extract, all related to clustering:

The clustering workflow logic has been improved. For example, an existing clustering input file can now be reused when the clustering parameters change in a new run. Clusters are also extracted whenever a valid target file is found in the clustering directory, instead of the step being skipped just because clustering was run previously.
A new option --exclude_samples has been added. The listed samples are not used for clustering, but the targets derived from clustering the remaining samples are still extracted from the excluded samples.
easy-cluster is now the default MMseqs2 algorithm when clustering with captus extract.

23 Feb 23:13

edgardomortiz

v1.6.4

c4de8d4

Captus v1.6.4

Sequences longer than --cl_max_seq_len in captus extract, or --max_seq_len in captusd cluster, are no longer removed before clustering. Instead, they are shredded into fragments that are at most max_seq_len long. This allows clustering of very long sequences, such as those found in chromosome-level assemblies.
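
The shredding idea can be illustrated with a minimal sketch. It assumes consecutive, non-overlapping fragments; the release note does not say whether Captus overlaps fragments or how it names them, and the function name `shred` is hypothetical, not part of Captus.

```python
def shred(seq: str, max_seq_len: int) -> list[str]:
    """Split seq into consecutive fragments of at most max_seq_len characters.

    Assumption: fragments do not overlap. This is an illustration only,
    not the code Captus uses.
    """
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be a positive integer")
    return [seq[i:i + max_seq_len] for i in range(0, len(seq), max_seq_len)]


# A 10 bp sequence with max_seq_len=4 yields two full fragments and a remainder.
fragments = shred("ACGTACGTAC", 4)
print(fragments)  # ['ACGT', 'ACGT', 'AC']
```

Sequences already shorter than the limit come back as a single, unchanged fragment.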

22 Feb 03:04

edgardomortiz

v1.6.3

fcdd51b

Captus v1.6.3

The value of WSCORE_EXP (see previous release https://github.com/edgardomortiz/Captus/releases/tag/v1.6.2) was too aggressive. The new default is 2, which behaves as expected.

21 Feb 19:48

edgardomortiz

v1.6.2

1891e1f

Captus v1.6.2

Changes to captus extract:

The formula for the weighted score wscore has been changed. The previous formula was simply score x coverage, which tended to favor coverage over identity as coverage increased. The new wscore is identity^N x coverage^(1/N), which progressively favors identity as identity increases and progressively reduces the influence of coverage as coverage increases. N can be changed in settings.py as WSCORE_EXP. The new wscore changes paralog ranking when many copies are recovered per locus (e.g. in polyploids). Thanks to Katharina Rambau (NTNU) and Elizabeth White (FLMNH) for the discussion and suggestions.
--paralog_tolerance has been replaced by --paralog_identity_tolerance, --paralog_coverage_tolerance, and --paralog_depth_tolerance for more control over paralog rejection. For example, if one wants to only retain paralogs that are at least 50% as long as the best hit for a locus one can use --paralog_coverage_tolerance 0.5.
To match the logic of the new paralog tolerance parameters, the parameters --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance have been changed to use a proportion too. For example, to use only contigs for a nuclear locus that are at most half an order of magnitude as deep as the deepest contig in the locus, one can use --nuc_depth_tolerance 0.5 instead of 2 as before.
[depth=XX.XX] has been added to extracted sequences descriptions and final GFFs when the original assembly contigs contain _cov_ in their headers.
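
As a rough illustration of the new wscore and of a proportional coverage tolerance, here is a minimal sketch. It assumes identity and coverage are proportions between 0 and 1, that the best hit is the one with the highest wscore, and that coverage stands in for length. The hit records and the helper `passes_coverage_tolerance` are hypothetical and only mirror the description above; they are not Captus code.

```python
def wscore(identity: float, coverage: float, n: float = 2.0) -> float:
    """Weighted score as described above: identity^N x coverage^(1/N)."""
    return identity ** n * coverage ** (1.0 / n)


def passes_coverage_tolerance(cov: float, best_cov: float, tolerance: float) -> bool:
    """Hypothetical check: keep a copy whose coverage is at least
    `tolerance` times the coverage of the best hit."""
    return cov >= tolerance * best_cov


# Made-up hits for a single locus (values are proportions, not percentages).
hits = [
    {"name": "copy_a", "identity": 0.95, "coverage": 0.60},
    {"name": "copy_b", "identity": 0.80, "coverage": 0.90},
    {"name": "copy_c", "identity": 0.90, "coverage": 0.25},
]

# Rank copies by wscore, best first.
ranked = sorted(hits, key=lambda h: wscore(h["identity"], h["coverage"]), reverse=True)
best = ranked[0]

# Mimic the idea behind --paralog_coverage_tolerance 0.5.
kept = [h for h in ranked if passes_coverage_tolerance(h["coverage"], best["coverage"], 0.5)]

for h in ranked:
    print(h["name"], round(wscore(h["identity"], h["coverage"]), 3))
print([h["name"] for h in kept])
```

With N = 2 the high-identity copy_a ranks above the high-coverage copy_b, and copy_c is dropped because its coverage is less than half that of the best hit.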

Changes to captus align:

New MAFFT options are available: --mafft_unalignlevel and --mafft_leavegappyregion. If --mafft_unalignlevel is greater than 0 (recommended value 0.8), unrelated sequence segments are not aligned, in order to exclude potentially contaminated sequence segments. The trimming stage can then remove those segments more efficiently. Consider using it together with --mafft_leavegappyregion so that gappy regions are not exhaustively aligned and are therefore better detected and removed by the trimming stage. The method is described in https://doi.org/10.1093/bioinformatics/btw108. Thanks to Thomas Kiebacher (Naturkundemuseum Stuttgart) for the suggestion.
The untrimmed alignments that include the reference target sequences are kept by default, in case paralog filtration has to be repeated.

Changes to captusd cluster:

The option --align_singletons has been replaced by --align_min_species, which controls the minimum number of species required to align a cluster.

Changes to phylo_commands:

Commands for IQ-TREE now include the option -czb by default. This collapses near-zero-length branches so that the resolution of those clades is handled by ASTRAL. Use --force_bifurcating to disable it.
Ultrafast bootstraps are not calculated by default because they are usually not required for individual gene trees.

07 Nov 23:27

edgardomortiz

v1.6.1

9412e05

Captus v1.6.1

Compatibility with Python 3.14
Cosmetic improvements to the GFF annotation files produced by captus extract
Added option max_locus_overlap to captus extract to modify the allowed percentage of overlap between annotations of different loci belonging to the same marker type
Added pairs of plant mitochondrial genes allowed to overlap to settings.py (thanks to Jose David Cruz Plancarte):

    ["rpl16", "rps3"],  # related to https://github.com/edgardomortiz/Captus/issues/33
    ["cox3", "sdh4"],  # related to https://github.com/edgardomortiz/Captus/issues/33

29 Sep 06:13

edgardomortiz

v1.6.0

c3b74f2

Captus v1.6.0

Fixed potential bug in captus extract when contigs have depth 0 (pattern _cov_0.0000_ in the contig name).
Cosmetic improvements to new_targets_from_alignments.

12 Sep 20:39

edgardomortiz

v1.5.9

193f938

Captus v1.5.9

Changes to captus extract:

The option --predict now also fills gaps that can be translated during extraction. This is useful when searching for or extracting proteins from transcripts, CDS collections, or RNA reads assembled with Captus, because it allows adding intervening amino acids found in the sample but not present in the target protein.

Changes to captusd cluster:

When clustering CDS, Captus can now translate them on the fly and cluster them as protein sequences to find homologs (new options --translate_cds and --transtable). This allows recovering larger clusters of orthologs across greater evolutionary distances than clustering the CDS as nucleotides. The baits are still created from the CDS in nucleotides. WARNING: The sequences will be directly translated using reading frame 1, so please verify they can be translated before using this option.
Added options to control minimum sequence coverage during deduplication (--dedup_coverage) and during clustering (--clust_coverage).
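
One way to act on the warning about reading frame 1 is to check the CDS before clustering. The sketch below is not part of Captus. It assumes the standard genetic code (NCBI translation table 1) and treats a CDS as translatable when its length is a multiple of 3 and it has no internal stop codon. The helper names are hypothetical, and other tables selected with --transtable would need a different codon table.

```python
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame1(cds: str) -> str:
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def is_translatable(cds: str) -> bool:
    """True if the length is a non-zero multiple of 3 and no stop codon
    occurs before the final codon."""
    if len(cds) == 0 or len(cds) % 3 != 0:
        return False
    protein = translate_frame1(cds)
    body = protein[:-1] if protein.endswith("*") else protein
    return "*" not in body


print(translate_frame1("ATGGCCTAA"))  # MA*
print(is_translatable("ATGGCCTAA"))   # True
print(is_translatable("ATGTAAGCC"))   # False, internal stop codon
```

Sequences that fail this check are candidates for trimming or frame correction before using --translate_cds.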

Changes to new_targets_from_alignments:

Paralogs can be more confidently flagged when unfiltered alignments are provided. This can help separate paralogs into distinct loci for future extractions.

08 Sep 04:29

edgardomortiz

v1.5.8

3ad41dc

Captus v1.5.8

Changes in captus extract:

The new option --cl_rep_single allows the creation of a target file from contig clustering that contains a single representative sequence per cluster. This is useful if one plans to map the reads back to the target file.

Changes in extra scripts:

most_common_target_per_locus is now faster and can also process Captus extraction directories or Captus alignment directories. A minimum number of samples (min_samples) can be set.
concatenate_alignments and phylo_commands can now process a file with a list of paths to the alignments, instead of only searching the Captus alignments directory.
Added wrappers for all extra scripts so they can be run even if Captus is not installed (ending in -runner.py).