Releases: jolespin/veba

VEBA_v2.5.1

12 Apr 18:47
c146e89

[2.5.1] - 2025.04.12

Added

  • Added install-gpu.sh which installs GPU accelerated environments when applicable (i.e., VEBA-binning-prokaryotic_env and VEBA-binning-viral_env)
  • Added Dockerfile-GPU which is experimental

Changed

  • Changed install.sh so it only installs CPU-based environments Issue #167
  • Changed containerize_environments.sh so it only installs CPU-based environments Issue #167

Deprecated

  • Deprecated VirFinder algorithm in binning-viral.py so now only geNomad is supported

VEBA_v2.5.0

10 Apr 20:40
4ac6dab

[2.5.0] - 2025.04.10

Added

  • Added VAMB support to binning-prokaryotic.py (now a default binner) and binning_wrapper.py.
  • Added automatic gzipping of output files based on .gz extension in edgelist_to_clusters.py using pyexeggutor.open_file_writer.
  • Added xxhash dependency to VEBA-binning-prokaryotic_env for bin name reproducibility (Issue #140).
  • Added -e/--exclude and -d/--domain_predictions options to filter_binette_results.py for removing eukaryotic genomes and setting up domain assignments (Issue #153).
  • Added semibin2-[biome] option to binning-prokaryotic.py allowing specification of multiple biomes (e.g., semibin2-global, semibin2-ocean), replacing --semibin2_biome (Issue #155).
  • Added --semibin2_orf_finder option to binning_wrapper.py.
  • Added genome_statistics.tsv.gz, gene_statistics.cds.tsv.gz, gene_statistics.rRNA.tsv.gz, and gene_statistics.tRNA.tsv.gz outputs to essentials.py.
  • Added --identifiers, --index_name, and --no_header options to convert_metabat2_coverage.py for broader applicability, including VAMB.
  • Added -l eukaryota_odb12 as default but also allow --auto-lineage-euk for BUSCO in binning-eukaryotic.py
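The extension-based gzipping noted above for edgelist_to_clusters.py can be pictured with a minimal sketch. It uses only the Python standard library and is not the pyexeggutor.open_file_writer implementation; the helper names open_writer and write_clusters are hypothetical and exist only for illustration.

```python
import gzip


def open_writer(path):
    """Return a text-mode handle, gzip-compressed when the path ends in .gz.

    Hypothetical helper: a stand-in for the idea, not pyexeggutor's API.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "wt")
    return open(path, "w")


def write_clusters(path, node_to_cluster):
    """Write a two-column [id_node]<tab>[id_cluster] table to path."""
    with open_writer(path) as f:
        for node, cluster in node_to_cluster.items():
            print(node, cluster, sep="\t", file=f)
```

With this approach the caller chooses compression purely through the output filename, so no separate compression flag is needed.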

Changed

  • Changed binning-eukaryotic.py behavior to provide a solution to BUSCO Issue #447
  • Changed CHANGELOG.md format to best practice Keep a Changelog
  • Changed prodigal-gv to pyrodigal-gv in multithreaded mode for binning-viral.py for performance.
  • Removed metacoag from the default set of binning algorithms in binning-prokaryotic.py.
  • Updated geNomad to v1.11.0 and geNomad database to v1.8 to resolve numpy import errors (Issue #160).
  • Updated Pyrodigal usage in binning-eukaryotic.py for organelles to allow piping and threading.
  • Updated BUSCO to v5.8.3 and associated databases.
  • Updated Tiara to Tiara-NAL in VEBA-binning-prokaryotic_env and VEBA-binning-eukaryotic_env to enable stdin usage.
  • Updated biosynthetic.py to use antiSMASH v7 (Issue #159).
  • Changed behavior when --taxon fungi is specified: precomputed genes are not used due to formatting issues.
  • Simplified the method for adding headers to Diamond outputs in biosynthetic.py.
  • Changed Dockerfile working directory from /tmp/ to /home/.
  • Integrated Tiara and consensus_domain_classification.py into the binette step of binning-prokaryotic.py.
  • Renamed database identifier from VDB to VEBA-DB.
  • Updated CheckM2 and Binette versions in binning-prokaryotic.py.
  • Updated CheckM2 Diamond database included in VEBA-DB_v9 (Issue #154).
  • Removed usage of precomputed genes in the SemiBin2 wrapper due to SemiBin2/issue-#185.
  • Allowed faulty return codes in iterative mode for binette to permit convergence in genome recovery.

Fixed

  • Fixed CONDA_ENVS_PATH detection in the veba controller executable to correctly handle environments outside the base Conda directory.
  • Fixed bug where VFDB hits were incorrectly counted as MIBiG in biosynthetic.py (Issue #141).
  • Fixed --tta_threshold argument in biosynthetic.py which was previously defined but not connected to the command execution.
  • Removed capitalization from column headers in filter_binette_results.py output.
  • Fixed missing --antismash_options argument connection in biosynthetic.py.

Removed

  • Removed CONCOCT support from binning-eukaryotic.py.

Deprecated

  • Deprecated amplicon.py module in favor of external pipelines like nf-core/ampliseq.

VEBA_v2.4.2

02 Feb 00:01
4ac6dab

v2.4.2 fixed a small bug where the de Bruijn graph for MEGAHIT wasn't included in the output directory when the graph was created.
[2025.2.1] - Added --megahit_build_de_bruijn_graph to make de-Bruijn graph construction for MEGAHIT optional in assembly.py

VEBA_v2.4.1

01 Feb 23:47
f2aee20

  • [2025.2.1] - Added --megahit_build_de_bruijn_graph to make de-Bruijn graph construction for MEGAHIT optional in assembly.py

VEBA_v2.4.0

29 Jan 20:28
45ed244

  • [2025.1.24] - Added Initial_bins to Binette results in filter_binette_results.py
  • [2025.1.23] - Added essentials.py module
  • [2025.1.16] - Added --serialized_annotations to append_annotations_to_gff.py to avoid overhead from reparsing the annotations
  • [2025.1.15] - Fixed bug in binning_wrapper.py where script was looking for bins in the wrong directory for MetaCoAG
  • [2025.1.14] - Fixed bug in merge_annotations.py where diamond outputs were queried incorrectly
  • [2025.1.5] - Changed default --busco_completeness from 50 to 30 in binning-eukaryotic.py
  • [2025.1.5] - Added --busco_options and --busco_offline arguments for binning-eukaryotic.py
  • [2024.12.28] - Added --semibin2_sequencing_type to binning_wrapper.py and added functionality for --long_reads. Moved --long_reads argument to parser_io instead of parser_featurecounts
  • [2024.12.27] - Fixed issue in consensus_domain_classification.py where softmax returns a np.array instead of a pd.DataFrame
  • [2024.12.26] - Added support for precomputed coverage for metadecoder in binning_wrapper.py
  • [2024.12.26] - Added support for binette and tiara in updated binning_prokaryotic.py module
  • [2024.12.23] - Added copy_attribute_in_gff.py script which copies attributes to a source and destination attribute
  • [2024.12.17] - Added filter_binette_results.py script
  • [2024.12.16] - Added intermediate directory to metacoag in binning_wrapper.py
  • [2024.12.12] - Added metacoag support and custom HMM support to metadecoder in binning_wrapper.py
  • [2024.12.11] - Added prepend_de-bruijn_path.py script and use this in assembly.py and assembly-long.py to prepend prefix to SPAdes/Flye de Bruijn graph paths.
  • [2024.12.10] - Changed default --minimum_genome_size to 200000 from 150000
  • [2024.12.9] - Added support for SemiBin2 and MetaDecoder in binning_wrapper.py
  • [2024.11.21] - Updated --cluster_label_mode default to md5 instead of numeric to allow for easier cluster updates post hoc. Change reflected in cluster.py, global_clustering.py, local_clustering.py, and update_genome_clusters.py
  • [2024.11.18] - Added update_genome_clusters.py which runs skani against all reference genome clusters. Does not do protein clustering nor does it update the graph, representatives, or proteins.
  • [2024.11.15] - Added --header simple to diamond output in annotate.py and accounted for change in merge_annotations.py
  • [2024.11.11] - Added Enzymes to append_annotations_to_gff.py script
  • [2024.11.9] - Added kofam.enzymes.list and kofam.pathways.list in VDB_v8.1 to provide subsets for pykofamsearch
  • [2024.11.8] - Updating VEBA database VDB_v8 to VDB_v8.1 which adds serialized KOfam with enzyme support
  • [2024.11.8] - Added Enzymes to annotate.py and merge_annotations.py [!untested]
  • [2024.11.7] - Updated pyhmmsearch and pykofamsearch versions in VEBA-annotate_env.yml, VEBA-classify-eukaryotic_env.yml, VEBA-database_env, and VEBA-phylogeny_env. Also updated executables in annotate.py, classify-eukaryotic.py, phylogeny.py, and download_databases-annotate.sh.
  • [2024.11.7] - In edgelist_to_clusters.py, added --cluster_label_mode {"numeric", "random", "pseudo-random", "md5", "nodes"} to allow for different types of labels. Added --threshold2 option for a second weight.
  • [2024.11.7] - Added --wrap to fasta_utility.py and split id and descriptions in header so prefix/suffix is only added to id.
  • [2024.11.7] - Added prepend_gff.py to prepend a prefix to contig and attribute identifiers
  • [2024.11.7] - Changed default --skani_minimum_af to 50 from 15 as this is used in GTDB-Tk for determining species-level clusters in cluster.py, global_clustering.py, and local_clustering.py
  • [2024.11.6] - Added append_annotations_to_gff.py script
  • [2024.10.29] - Changed manual mode to metaeuk mode for preexisting metaeuk results
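One plausible reading of why md5 labels make post hoc cluster updates easier than numeric labels: a label derived from cluster membership does not depend on the order in which clusters are enumerated. The sketch below illustrates that idea only. The function md5_cluster_label and its hashing scheme (sorted node identifiers joined by newlines) are assumptions, not the scheme used in edgelist_to_clusters.py.

```python
import hashlib


def md5_cluster_label(nodes, prefix=""):
    """Derive a cluster label from its members.

    Hypothetical scheme: sort the node identifiers, join them with
    newlines, and take the md5 hex digest of the UTF-8 bytes.
    """
    joined = "\n".join(sorted(nodes))
    return prefix + hashlib.md5(joined.encode("utf-8")).hexdigest()


# The same members give the same label regardless of input order.
label_a = md5_cluster_label(["genome_2", "genome_1"])
label_b = md5_cluster_label(["genome_1", "genome_2"])
```

Under this scheme, adding new clusters later does not renumber existing ones, although a cluster whose membership changes would receive a new label.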

VEBA_v2.3.0

22 Sep 16:52
2848a37

  • [2024.9.21] - Added KEGG Pathway Profiler to VEBA-database_env and VEBA-annotate_env, which replaces MicrobeAnnotator-KEGG for module completion ratios. Replaced ${VEBA_DATABASE}/Annotate/MicrobeAnnotator-KEGG with ${VEBA_DATABASE}/Annotate/KEGG-Pathway-Profiler/ database files. Note: The new module completion ratio output does not have class labels for KEGG modules.
  • [2024.8.30] - Added ${N_JOBS} to download scripts with default set to maximum threads available

VEBA_v2.2.1

30 Aug 00:38
2a504ae

  • [2024.8.29] - Added VERSION file created in download_databases.sh
  • [2024.7.11] - The alignment fraction threshold for genome clustering was only applied to the reference but should also apply to the query. Added --af_mode to edgelist_to_clusters.py, global_clustering.py, local_clustering.py, and cluster.py with two choices: relaxed = max([Alignment_fraction_ref, Alignment_fraction_query]) > minimum_af, or strict = (Alignment_fraction_ref > minimum_af) & (Alignment_fraction_query > minimum_af).
  • [2024.7.3] - Added pigz to VEBA-annotate_env which isn't a problem with most conda installations but needed for docker containers.
  • [2024.6.21] - Changed choose_fastest_mirror.py to determine_fastest_mirror.py
  • [2024.6.20] - Added -m/--include_mrna to compile_metaeuk_identifiers.py for Issue #110
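The two --af_mode rules from the 2024.7.11 entry can be written out as a small predicate. This is a sketch that follows the expressions given in that entry; the function name passes_af and its signature are hypothetical and not taken from the VEBA source.

```python
def passes_af(af_ref, af_query, minimum_af, af_mode="relaxed"):
    """Decide whether a genome pair clears the alignment fraction threshold.

    relaxed: the larger of the two alignment fractions must exceed minimum_af.
    strict:  both alignment fractions must exceed minimum_af.
    """
    if af_mode == "relaxed":
        return max(af_ref, af_query) > minimum_af
    if af_mode == "strict":
        return (af_ref > minimum_af) and (af_query > minimum_af)
    raise ValueError(f"Unknown af_mode: {af_mode!r}")


# A small genome contained in a larger one: high AF one way, low the other.
relaxed_result = passes_af(80.0, 20.0, minimum_af=30.0, af_mode="relaxed")
strict_result = passes_af(80.0, 20.0, minimum_af=30.0, af_mode="strict")
```

In this example the pair is kept in relaxed mode and dropped in strict mode, which is the practical difference between the two settings.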

VEBA_v2.2.0

10 Jun 01:16
05af0fd

Disclaimer:
I made some large updates in this version and I believe everything has been adequately tested, but in case anything has slipped through the cracks you can use v2.1.0, which has been thoroughly tested in accordance with the NAR Espinoza 2024 paper. Benefits of using this version include much faster and more robust prokaryotic classifications and fast, scalable HMM-based annotation.

Large performance updates in this version include:

  • Updating GTDB-Tk 2.3.0 -> 2.4.0 which means the GTDB needed to be updated from r214.1 -> r220
  • VEBA-classify_env was split up into VEBA-classify-eukaryotic_env, VEBA-classify-prokaryotic_env, and VEBA-prokaryotic_env
  • annotate.py, classify-eukaryotic.py, and phylogeny.py (and their utility scripts) were rewritten to use PyHMMER (pyhmmsearch and pykofamsearch), which is faster than HMMSearch when multithreaded.
  • KOFAM was changed to KOfam

NOTE: Please don't use the tar.gz as it contains the 2.1.0 version for some reason:

VERSION="2.2.0"
# wget https://github.com/jolespin/veba/archive/refs/tags/v${VERSION}.tar.gz # The .tar.gz is out of date in this release
# tar -xvf v${VERSION}.tar.gz && mv veba-${VERSION} veba

# Alternative download
wget https://github.com/jolespin/veba/releases/download/v${VERSION}/v${VERSION}.zip
unzip -d veba v${VERSION}.zip

VEBA_v2.1.0-zen

03 Jun 18:46
05af0fd

This is the exact same version as VEBA_v2.1.0. New VEBA releases will now automatically be synced to Zenodo.

VEBA_v2.1.0

17 May 14:13
b67f0ed

Official release of VEBA v2.1.0 with updates to address peer reviewer comments. Mostly documentation, but also including the following:

  • [2024.4.30] - Added concatenate_files.py which can concatenate files (including a mix of compressed and uncompressed files) using either arguments, a list file, or a glob. The reason for this is that Unix limits the number of arguments a command can take (e.g., cat *.fasta > output.fasta will crash when *.fasta expands to 50k files)
  • [2024.4.29] - Added /volumes/workspace/ directory to Docker containers for situations when your input and output directories are the same.
  • [2024.4.29] - featureCounts can only handle 64 threads at a time so added min(64, opts.n_jobs) for all the modules/scripts that use featureCounts commands.
  • [2024.4.23] - Added uniprot_to_enzymes.py which reformats tables and fasta from https://www.uniprot.org/uniprotkb?query=ec%3A*
  • [2024.4.18] - Developed a faster CLI implementation of KofamScan called PyKofamSearch which leverages PyHMMER. This will be used in future versions of VEBA.
  • [2024.4.18] - Developed a faster CLI implementation of HMMSearch called PyHMMSearch which leverages PyHMMER. This will be used in future versions of VEBA.
  • [2024.3.26] - Added --metaeuk_split_memory_limit to metaeuk_wrapper.py.
  • [2024.3.26] - Added -d/--genome_identifier_directory_index to scaffolds_to_bins.py for directories that are structured path/to/genomes/bin_a/reference.fasta where you would use -d -2.
  • [2024.3.26] - Added --minimum_af to edgelist_to_clusters.py with an option to accept 4 column inputs [id_1]<tab>[id_2]<tab>[weight]<tab>[alignment_fraction]. global_clustering.py, local_clustering.py, and cluster.py now use this by default --af_threshold 30.0. If you want to retain previous behavior, just use --af_threshold 0.0.
  • [2024.3.18] - edgelist_to_clusters.py only includes edges where both nodes are in identifiers set. If --identifiers are provided, then only those identifiers are used. If not, then it includes all nodes.
  • [2024.3.18] - Added --export_representatives argument for edgelist_to_clusters.py to output table with [id_node]<tab>[id_cluster]<tab>[intra-cluster_connectivity]<tab>[representative]. Also includes this information in nx.Graph objects.
  • [2024.3.18] - Changed singleton weight to np.nan instead of np.inf for edgelist_to_clusters.py to allow for representative calculations.
  • YouTube channel (https://www.youtube.com/@VEBA-Multiomics)
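The motivation given above for concatenate_files.py (shell argument limits with very large globs) can be illustrated with a short sketch that expands the glob inside Python instead of in the shell and reads gzipped inputs transparently. It uses only the standard library and is not the concatenate_files.py implementation; the names open_maybe_gzip and concatenate are hypothetical.

```python
import glob
import gzip
import shutil


def open_maybe_gzip(path):
    """Open a file for binary reading, decompressing when it ends in .gz."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


def concatenate(pattern, output_path):
    """Concatenate every file matching pattern into output_path.

    The glob is expanded by Python, so the shell never sees the full
    argument list. The output is written uncompressed. The output path
    should not itself match the pattern.
    """
    paths = sorted(glob.glob(pattern))
    with open(output_path, "wb") as out:
        for path in paths:
            with open_maybe_gzip(path) as f:
                shutil.copyfileobj(f, out)
    return paths
```

A call such as concatenate("genomes/*.fasta*", "output.fasta") would then take the place of cat *.fasta > output.fasta; on the command line the pattern would need to be quoted so that the shell does not expand it first.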