Releases: edgardomortiz/Captus
Captus v1.6.5
Changes to captus extract, all related to clustering:
- The clustering workflow logic has been improved. For example, an existing clustering input file can now be reused when the clustering parameters change in a new run. Cluster extraction is now performed whenever a valid target file is found in the clustering directory, instead of being skipped just because clustering was previously run.
- A new option --exlcude_samples has been added. These samples are not used for clustering, but the targets derived from clustering the remaining samples are still extracted from the excluded samples.
- easy-cluster is now the default MMseqs2 algorithm when clustering with captus extract
Captus v1.6.4
Sequences longer than --cl_max_seq_len in captus extract or --max_seq_len in captusd cluster are no longer removed before clustering. Instead, they are now shredded into fragments that are at most max_seq_len long. This makes it possible to cluster very long sequences, like those found in chromosome-level assemblies.
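The shredding idea can be illustrated with a minimal Python sketch. This is not the Captus implementation: the function name `shred_sequence` is hypothetical, and cutting into consecutive, non-overlapping fragments is an assumption, since the release note only states that fragments are at most max_seq_len long.

```python
def shred_sequence(seq: str, max_seq_len: int) -> list[str]:
    """Split seq into consecutive fragments of at most max_seq_len characters.

    Assumption: fragments do not overlap; the actual Captus behavior may differ.
    """
    if max_seq_len <= 0:
        raise ValueError("max_seq_len must be a positive integer")
    # Step through the sequence in windows of max_seq_len;
    # the last fragment may be shorter than the others.
    return [seq[i:i + max_seq_len] for i in range(0, len(seq), max_seq_len)]


fragments = shred_sequence("ACGTACGTAC", 4)
```

Under these assumptions every fragment respects the length limit and concatenating the fragments restores the original sequence.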
Captus v1.6.3
- The value of WSCORE_EXP (see previous release https://github.com/edgardomortiz/Captus/releases/tag/v1.6.2) was too aggressive; the new default is 2 and behaves as expected.
Captus v1.6.2
Changes to captus extract:
- The weighted score wscore formula has been changed. The previous formula was simply the multiplication of score x coverage, but it tended to favor coverage over identity as coverage increased. The new wscore is identity^N x coverage^(1/N), to progressively favor identity as it increases while progressively decreasing the influence of coverage as coverage increases. N can be changed in settings.py as WSCORE_EXP. The new wscore changes paralog ranking when many copies are recovered per locus (e.g. in polyploids). Thanks for the discussion and suggestions to Katharina Rambau (NTNU) and Elizabeth White (FLMNH).
- --paralog_tolerance has been replaced by --paralog_identity_tolerance, --paralog_coverage_tolerance, and --paralog_depth_tolerance for more control over paralog rejection. For example, to retain only paralogs that are at least 50% as long as the best hit for a locus, use --paralog_coverage_tolerance 0.5.
- To match the logic of the new paralog tolerance parameters, the parameters --nuc_depth_tolerance, --ptd_depth_tolerance, --mit_depth_tolerance, and --dna_depth_tolerance have been changed to use a proportion too. For example, if one wants to only use contigs for a nuclear locus that are at most half an order of magnitude as deep as the deepest contig in the locus, one can use --nuc_depth_tolerance 0.5 instead of 2 as before.
- [depth=XX.XX] has been added to extracted sequence descriptions and final GFFs when the original assembly contigs contain _cov_ in their headers.
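As an illustration of the wscore change described above, here is a minimal Python sketch. It is not the Captus code: the function names are hypothetical, identity and coverage are assumed to be proportions between 0 and 1, identity stands in for the "score" of the previous formula, and the exponent N=2 is only an example value.

```python
def wscore_new(identity: float, coverage: float, n: float = 2.0) -> float:
    """New weighted score as described in the notes: identity^N x coverage^(1/N)."""
    return identity ** n * coverage ** (1.0 / n)


def wscore_old(identity: float, coverage: float) -> float:
    """Previous weighted score, assumed here to be a plain product."""
    return identity * coverage


# Two hypothetical hits for the same locus (values are made up for illustration):
hit_a = {"identity": 0.95, "coverage": 0.60}  # more similar, shorter
hit_b = {"identity": 0.75, "coverage": 0.80}  # less similar, longer

# Pick the best hit under each formula.
old_best = max((hit_a, hit_b), key=lambda h: wscore_old(h["identity"], h["coverage"]))
new_best = max((hit_a, hit_b), key=lambda h: wscore_new(h["identity"], h["coverage"]))
```

With these made-up values the plain product ranks the longer, less similar hit first, while the new formula ranks the more similar hit first, which matches the stated intent of favoring identity over coverage.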
Changes to captus align:
- New MAFFT options are available: --mafft_unalignlevel and --mafft_leavegappyregion. If --mafft_unalignlevel is greater than 0 (recommended value 0.8), unrelated sequence segments are left unaligned to exclude potentially contaminated segments, so the trimming stage can remove those segments more efficiently. Consider using it together with --mafft_leavegappyregion so gappy regions are not exhaustively aligned and are therefore better detected and removed by the trimming stage. The method is described in https://doi.org/10.1093/bioinformatics/btw108. Thanks to Thomas Kiebacher (Naturkundemuseum Stuttgart) for the suggestion.
- The untrimmed alignments that include the reference target sequences are kept by default, in case paralog filtration has to be repeated.
Changes to captusd cluster:
- The option --align_singletons has been changed to --align_min_species to control the minimum number of species required in order to align a cluster.
Changes to phylo_commands:
- Commands for IQ-TREE now include the option -czb by default. This collapses near-zero-length branches, so the resolution of those clades is now handled by ASTRAL. Use --force_bifurcating to disable it.
- Ultrafast bootstraps are not calculated by default because they are usually not required for individual gene trees.
Captus v1.6.1
- Compatibility with Python 3.14
- Cosmetic improvements to the GFF annotation files produced by captus extract
- Added option max_locus_overlap to captus extract to modify the allowed percentage of overlap between annotations of different loci belonging to the same marker type
- Added pairs of plant mitochondrial genes allowed to overlap to settings.py (thanks to Jose David Cruz Plancarte):
["rpl16", "rps3"], # related to https://github.com/edgardomortiz/Captus/issues/33
["cox3", "sdh4"], # related to https://github.com/edgardomortiz/Captus/issues/33
Captus v1.6.0
- Fixed potential bug in captus extract when contigs have depth 0 (pattern _cov_0.0000_ in the contig name).
- Cosmetic improvements to new_targets_from_alignments.
Captus v1.5.9
Changes to captus extract:
- The option --predict will also fill gaps that can be translated during extraction. This is useful when you are searching/extracting proteins from transcripts, CDS collections, or RNA reads assembled with Captus, because it allows adding intervening amino acids found in the sample but not present in the target protein.
Changes to captusd cluster:
- When clustering CDS, Captus can now translate them on the fly and cluster them as protein sequences to find homologs (new options --translate_cds and --transtable). This allows recovering larger clusters of orthologs across greater evolutionary distances than just clustering the CDS as nucleotides. The baits are still created from the CDS in nucleotides. WARNING: The sequences will be directly translated using reading frame 1, so please verify they can be translated before using this option.
- Added options to control minimum sequence coverage during deduplication (--dedup_coverage) and during clustering (--clust_coverage).
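The warning about reading frame 1 suggests checking the CDS before using --translate_cds. Below is a minimal Python sketch of one way to do that. It is not part of Captus: the function names are hypothetical, only the standard genetic code (NCBI translation table 1) is covered, and the criteria used for "translatable" (length divisible by 3, no stop codon before the last codon) are assumptions.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frame1(cds: str) -> str:
    """Translate a nucleotide sequence in reading frame 1; unknown codons become X."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def is_translatable_frame1(cds: str) -> bool:
    """Assumed criteria: non-empty, length divisible by 3, no internal stop codon."""
    if len(cds) == 0 or len(cds) % 3 != 0:
        return False
    protein = translate_frame1(cds)
    # A single terminal stop codon is allowed; any earlier stop is not.
    body = protein[:-1] if protein.endswith("*") else protein
    return "*" not in body
```

Sequences that fail such a check could be inspected or trimmed to the correct frame before clustering.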
Changes to new_targets_from_alignments:
- Paralogs can be more confidently flagged when unfiltered alignments are provided. This can aid in separating paralogs as separate loci for future extractions.
Captus v1.5.8
Changes in captus extract:
- New option called --cl_rep_single allows the creation of a target file from contig clustering that contains a single sequence representative per cluster. This is useful if one plans to map back the reads to the target file.
Changes in extra scripts:
- most_common_target_per_locus is now faster and can also process Captus extraction directories or Captus alignment directories. A number of min_samples can be chosen.
- concatenate_alignments and phylo_commands can now process a file with a list of paths to the alignments instead of only allowing to search the Captus alignments directory.
- Added wrappers for all extra scripts so they can be run even if Captus is not installed (ending in -runner.py)
Captus v1.5.7
captus_extract: Fixed calculation of fragmentation statistics on circular contigs
Captus v1.5.6
- Fixed bug introduced when selecting --tolerance -1 in captus align