Releases: jolespin/veba
VEBA_v2.5.1
[2.5.1] - 2025.04.12
Added
- Added `install-gpu.sh` which installs GPU accelerated environments when applicable (i.e., `VEBA-binning-prokaryotic_env` and `VEBA-binning-viral_env`)
- Added `Dockerfile-GPU` which is experimental
Changed
- Changed `install.sh` so it only installs CPU-based environments (Issue #167)
- Changed `containerize_environments.sh` so it only installs CPU-based environments (Issue #167)
Deprecated
- Deprecated `VirFinder` algorithm in `binning-viral.py` so now only `geNomad` is supported
VEBA_v2.5.0
[2.5.0] - 2025.04.10
Added
- Added `VAMB` support to `binning-prokaryotic.py` (now a default binner) and `binning_wrapper.py`.
- Added automatic gzipping of output files based on `.gz` extension in `edgelist_to_clusters.py` using `pyexeggutor.open_file_writer`.
- Added `xxhash` dependency to `VEBA-binning-prokaryotic_env` for bin name reproducibility (Issue #140).
- Added `-e/--exclude` and `-d/--domain_predictions` options to `filter_binette_results.py` for removing eukaryotic genomes and setting up domain assignments (Issue #153).
- Added `semibin2-[biome]` option to `binning-prokaryotic.py` allowing specification of multiple biomes (e.g., `semibin2-global`, `semibin2-ocean`), replacing `--semibin2_biome` (Issue #155).
- Added `--semibin2_orf_finder` option to `binning_wrapper.py`.
- Added `genome_statistics.tsv.gz`, `gene_statistics.cds.tsv.gz`, `gene_statistics.rRNA.tsv.gz`, and `gene_statistics.tRNA.tsv.gz` outputs to `essentials.py`.
- Added `--identifiers`, `--index_name`, and `--no_header` options to `convert_metabat2_coverage.py` for broader applicability, including `VAMB`.
- Added `-l eukaryota_odb12` as default but also allow `--auto-lineage-euk` for `BUSCO` in `binning-eukaryotic.py`
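The extension-based gzipping noted above for `edgelist_to_clusters.py` can be illustrated with a minimal sketch. This is not the `pyexeggutor.open_file_writer` implementation; `open_text_writer` is a hypothetical helper built only on the Python standard library, and it assumes the decision rests solely on whether the path ends in `.gz`.

```python
import gzip


def open_text_writer(path):
    """Hypothetical helper: return a text-mode file handle for writing.

    The output is gzip-compressed when the path ends in ".gz" and
    written as plain text otherwise.
    """
    if str(path).endswith(".gz"):
        # "wt" opens the gzip stream in text mode so callers can write str.
        return gzip.open(path, "wt")
    return open(path, "w")
```

With a helper like this, calling code writes to `clusters.tsv` or `clusters.tsv.gz` through the same interface and does not need a separate compression flag.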
Changed
- Changed `binning-eukaryotic.py` behavior to provide a solution to BUSCO Issue #447
- Changed `CHANGELOG.md` format to best practice Keep a Changelog
- Changed `prodigal-gv` to `pyrodigal-gv` in multithreaded mode for `binning-viral.py` for performance.
- Removed `metacoag` from the default set of binning algorithms in `binning-prokaryotic.py`.
- Updated `geNomad` to `v1.11.0` and `geNomad database` to `v1.8` to resolve numpy import errors (Issue #160).
- Updated `Pyrodigal` usage in `binning-eukaryotic.py` for organelles to allow piping and threading.
- Updated `BUSCO` to `v5.8.3` and associated databases.
- Updated `Tiara` to `Tiara-NAL` in `VEBA-binning-prokaryotic_env` and `VEBA-binning-eukaryotic_env` to enable `stdin` usage.
- Updated `biosynthetic.py` to use `antiSMASH v7` (Issue #159).
- Changed behavior when `--taxon fungi` is specified: precomputed genes are not used due to formatting issues.
- Simplified the method for adding headers to `Diamond` outputs in `biosynthetic.py`.
- Changed `Dockerfile` working directory from `/tmp/` to `/home/`.
- Integrated `Tiara` and `consensus_domain_classification.py` into the `binette` step of `binning-prokaryotic.py`.
- Renamed database identifier from `VDB` to `VEBA-DB`.
- Updated `CheckM2` and `Binette` versions in `binning-prokaryotic.py`.
- Updated `CheckM2 Diamond` database included in `VEBA-DB_v9` (Issue #154).
- Removed usage of precomputed genes in the `SemiBin2` wrapper due to SemiBin2/issue-#185.
- Allowed faulty return codes in iterative mode for `binette` to permit convergence in genome recovery.
Fixed
- Fixed `CONDA_ENVS_PATH` detection in the `veba` controller executable to correctly handle environments outside the base Conda directory.
- Fixed bug where `VFDB` hits were incorrectly counted as `MIBiG` in `biosynthetic.py` (Issue #141).
- Fixed `--tta_threshold` argument in `biosynthetic.py` which was previously defined but not connected to the command execution.
- Removed capitalization from column headers in `filter_binette_results.py` output.
- Fixed missing `--antismash_options` argument connection in `biosynthetic.py`.
Removed
- Removed `CONCOCT` support from `binning-eukaryotic.py`.
Deprecated
- Deprecated `amplicon.py` module in favor of external pipelines like `nf-core/ampliseq`.
VEBA_v2.4.2
v2.4.2 fixed a small bug where the de Bruijn graph for MEGAHIT wasn't included in the output directory when the graph was created.
[2025.2.1] - Added --megahit_build_de_bruijn_graph to make de-Bruijn graph construction for MEGAHIT optional in assembly.py
VEBA_v2.4.1
- [2025.2.1] - Added `--megahit_build_de_bruijn_graph` to make de-Bruijn graph construction for `MEGAHIT` optional in `assembly.py`
VEBA_v2.4.0
- [2025.1.24] - Added `Initial_bins` to `Binette` results in `filter_binette_results.py`
- [2025.1.23] - Added `essentials.py` module
- [2025.1.16] - Added `--serialized_annotations` to `append_annotations_to_gff.py` to avoid overhead from reparsing the annotations
- [2025.1.15] - Fixed bug in `binning_wrapper.py` where script was looking for bins in the wrong directory for `MetaCoAG`
- [2025.1.14] - Fixed bug in `merge_annotations.py` where `diamond` outputs were queried incorrectly
- [2025.1.5] - Changed default `--busco_completeness` from `50` to `30` in `binning-eukaryotic.py`
- [2025.1.5] - Added `--busco_options` and `--busco_offline` arguments for `binning-eukaryotic.py`
- [2024.12.28] - Added `--semibin2_sequencing_type` to `binning_wrapper.py` and added functionality for `--long_reads`. Moved `--long_reads` argument to `parser_io` instead of `parser_featurecounts`
- [2024.12.27] - Fixed issue in `consensus_domain_classification.py` where `softmax` returns a `np.array` instead of a `pd.DataFrame`
- [2024.12.26] - Added support for precomputed coverage for `metadecoder` in `binning_wrapper.py`
- [2024.12.26] - Added support for `binette` and `tiara` in updated `binning_prokaryotic.py` module
- [2024.12.23] - Added `copy_attribute_in_gff.py` script which copies attributes to a source and destination attribute
- [2024.12.17] - Added `filter_binette_results.py` script
- [2024.12.16] - Added intermediate directory to `metacoag` in `binning_wrapper.py`
- [2024.12.12] - Added `metacoag` support and custom HMM support to `metadecoder` in `binning_wrapper.py`
- [2024.12.11] - Added `prepend_de-bruijn_path.py` script and use this in `assembly.py` and `assembly-long.py` to prepend prefix to SPAdes/Flye de Bruijn graph paths.
- [2024.12.10] - Changed default `--minimum_genome_size` to `200000` from `150000`
- [2024.12.9] - Added support for `SemiBin2` and `MetaDecoder` in `binning_wrapper.py`
- [2024.11.21] - Updated `--cluster_label_mode` default to `md5` instead of `numeric` to allow for easier cluster updates post hoc. Change reflected in `cluster.py`, `global_clustering.py`, `local_clustering.py`, and `update_genome_clusters.py`
- [2024.11.18] - Added `update_genome_clusters.py` which runs `skani` against all reference genome clusters. Does not do protein clustering nor does it update the graph, representatives, or proteins.
- [2024.11.15] - Added `--header simple` to `diamond` output in `annotate.py` and accounted for change in `merge_annotations.py`
- [2024.11.11] - Added `Enzymes` to `append_annotations_to_gff.py` script
- [2024.11.9] - Added `kofam.enzymes.list` and `kofam.pathways.list` in `VDB_v8.1` to provide subsets for `pykofamsearch`
- [2024.11.8] - Updated VEBA database `VDB_v8` to `VDB_v8.1` which adds serialized KOfam with enzyme support
- [2024.11.8] - Added `Enzymes` to `annotate.py` and `merge_annotations.py` [!untested]
- [2024.11.7] - Updated `pyhmmsearch` and `pykofamsearch` version in `VEBA-annotate_env.yml`, `VEBA-classify-eukaryotic_env.yml`, `VEBA-database_env`, and `VEBA-phylogeny_env`. Also updated executables in `annotate.py`, `classify-eukaryotic.py`, `phylogeny.py`, and `download_databases-annotate.sh`.
- [2024.11.7] - In `edgelist_to_clusters.py`, added `--cluster_label_mode {"numeric", "random", "pseudo-random", "md5", "nodes"}` to allow for different types of labels. Added `--threshold2` option for a second weight.
- [2024.11.7] - Added `--wrap` to `fasta_utility.py` and split id and descriptions in header so prefix/suffix is only added to id.
- [2024.11.7] - Added `prepend_gff.py` to prepend a prefix to contig and attribute identifiers
- [2024.11.7] - Changed default `--skani_minimum_af` to `50` from `15` as this is used in GTDB-Tk for determining species-level clusters in `cluster.py`, `global_clustering.py`, and `local_clustering.py`
- [2024.11.6] - Added `append_annotations_to_gff.py` script
- [2024.10.29] - Changed `manual` mode to `metaeuk` mode for preexisting `metaeuk` results
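The 2024.11.21 entry above switches the default `--cluster_label_mode` to `md5` so that clusters are easier to update post hoc. The changelog does not say how the label is computed, so the following is only a sketch of one plausible scheme: hashing the sorted member identifiers. The function name `md5_cluster_label`, the newline delimiter, and the choice of hashing the node set are all assumptions for illustration, not VEBA's actual method.

```python
import hashlib


def md5_cluster_label(node_ids, prefix=""):
    """Hypothetical: derive a cluster label from its member identifiers.

    Sorting first makes the label independent of input order, so the
    same set of genomes always maps to the same label.
    """
    canonical = "\n".join(sorted(node_ids))
    return prefix + hashlib.md5(canonical.encode("utf-8")).hexdigest()


# The same members in a different order produce the same label.
label_a = md5_cluster_label(["genome_2", "genome_1"])
label_b = md5_cluster_label(["genome_1", "genome_2"])
```

Under this assumption, a label depends only on a cluster's own members, so adding unrelated clusters later does not shift existing labels the way a running numeric counter can.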
VEBA_v2.3.0
- [2024.9.21] - Added `KEGG Pathway Profiler` to `VEBA-database_env` and `VEBA-annotate_env` which replaces `MicrobeAnnotator-KEGG` for module completion ratios. Replacing `${VEBA_DATABASE}/Annotate/MicrobeAnnotator-KEGG` with `${VEBA_DATABASE}/Annotate/KEGG-Pathway-Profiler/` database files. Note: New module completion ratio output does not have class labels for KEGG modules.
- [2024.8.30] - Added ${N_JOBS} to download scripts with default set to maximum threads available
VEBA_v2.2.1
- [2024.8.29] - Added `VERSION` file created in `download_databases.sh`
- [2024.7.11] - Alignment fraction threshold for genome clustering only applied to reference but should also apply to query. Added `--af_mode` with either `relaxed = max([Alignment_fraction_ref, Alignment_fraction_query]) > minimum_af` or `strict = (Alignment_fraction_ref > minimum_af) & (Alignment_fraction_query > minimum_af)` to `edgelist_to_clusters.py`, `global_clustering.py`, `local_clustering.py`, and `cluster.py`.
- [2024.7.3] - Added `pigz` to `VEBA-annotate_env` which isn't a problem with most `conda` installations but needed for `docker` containers.
- [2024.6.21] - Changed `choose_fastest_mirror.py` to `determine_fastest_mirror.py`
- [2024.6.20] - Added `-m/--include_mrna` to `compile_metaeuk_identifiers.py` for Issue #110
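The two `--af_mode` rules in the 2024.7.11 entry above can be written out as a small predicate. This is a minimal sketch that follows the formulas as stated in the changelog; the function name `passes_alignment_fraction` and its signature are hypothetical and are not taken from `edgelist_to_clusters.py`.

```python
def passes_alignment_fraction(af_ref, af_query, minimum_af, af_mode="relaxed"):
    """Hypothetical predicate mirroring the relaxed/strict rules above."""
    if af_mode == "relaxed":
        # Keep the edge if either direction exceeds the threshold.
        return max(af_ref, af_query) > minimum_af
    if af_mode == "strict":
        # Keep the edge only if both directions exceed the threshold.
        return (af_ref > minimum_af) and (af_query > minimum_af)
    raise ValueError(f"Unknown af_mode: {af_mode!r}")


# A large genome that fully contains a small one: high AF in one
# direction, low in the other.
relaxed_result = passes_alignment_fraction(80.0, 20.0, 30.0, af_mode="relaxed")
strict_result = passes_alignment_fraction(80.0, 20.0, 30.0, af_mode="strict")
```

In this example the relaxed rule keeps the edge and the strict rule drops it, which is the asymmetric case the entry describes when the threshold was applied to the reference only.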
VEBA_v2.2.0
Disclaimer:
I made some large updates in this version and I believe everything has been adequately tested, but in case anything has slipped through the cracks, you can use v2.1.0, which has been thoroughly tested in accordance with the NAR Espinoza 2024 paper. Benefits of using this version include much faster and more robust prokaryotic classifications and fast, scalable HMM-based annotation modeling.
Large performance updates for this version including:
- Updating GTDB-Tk 2.3.0 -> 2.4.0 which means the GTDB needed to be updated from r214.1 -> r220
- VEBA-classify_env was split up into VEBA-classify-eukaryotic_env, VEBA-classify-prokaryotic_env, and VEBA-prokaryotic_env
- annotate.py, classify-eukaryotic.py, and phylogeny.py (and their utility scripts) were rewritten to use PyHMMER (pyhmmsearch and pykofamsearch), which is faster than HMMSearch when multithreaded.
- KOFAM was changed to KOfam
NOTE: Please don't use the tar.gz as it contains the 2.1.0 version for some reason:
VERSION="2.2.0"
# wget https://github.com/jolespin/veba/archive/refs/tags/v${VERSION}.tar.gz # The .tar.gz is out of date in this release
# tar -xvf v${VERSION}.tar.gz && mv veba-${VERSION} veba
# Alternative download
wget https://github.com/jolespin/veba/releases/download/v${VERSION}/v${VERSION}.zip
unzip -d veba v${VERSION}.zip
VEBA_v2.1.0-zen
This is the exact same version as VEBA_v2.1.0. New VEBA releases will now automatically be synced to Zenodo.
VEBA_v2.1.0
Official release of VEBA v2.1.0 with updates to address peer reviewers. Mostly documentation but also including the following:
- [2024.4.30] - Added `concatenate_files.py` which can concatenate files (and mixed compressed/decompressed files) using either arguments, list file, or glob. Reason for this is that unix has a limit of arguments that can be used (e.g., `cat *.fasta > output.fasta` where *.fasta results in 50k files will crash)
- [2024.4.29] - Added `/volumes/workspace/` directory to Docker containers for situations when your input and output directories are the same.
- [2024.4.29] - `featureCounts` can only handle 64 threads at a time so added `min(64, opts.n_jobs)` for all the modules/scripts that use `featureCounts` commands.
- [2024.4.23] - Added `uniprot_to_enzymes.py` which reformats tables and fasta from https://www.uniprot.org/uniprotkb?query=ec%3A*
- [2024.4.18] - Developed a faster CLI implementation of `KofamScan` called `PyKofamSearch` which leverages `PyHmmer`. This will be used in future versions of VEBA.
- [2024.4.18] - Developed a faster CLI implementation of `HMMSearch` called `PyHMMSearch` which leverages `PyHmmer`. This will be used in future versions of VEBA.
- [2024.3.26] - Added `--metaeuk_split_memory_limit` to `metaeuk_wrapper.py`.
- [2024.3.26] - Added `-d/--genome_identifier_directory_index` to `scaffolds_to_bins.py` for directories that are structured `path/to/genomes/bin_a/reference.fasta` where you would use `-d -2`.
- [2024.3.26] - Added `--minimum_af` to `edgelist_to_clusters.py` with an option to accept 4 column inputs `[id_1]<tab>[id_2]<tab>[weight]<tab>[alignment_fraction]`. `global_clustering.py`, `local_clustering.py`, and `cluster.py` now use this by default with `--af_threshold 30.0`. If you want to retain previous behavior, just use `--af_threshold 0.0`.
- [2024.3.18] - `edgelist_to_clusters.py` only includes edges where both nodes are in identifiers set. If `--identifiers` are provided, then only those identifiers are used. If not, then it includes all nodes.
- [2024.3.18] - Added `--export_representatives` argument for `edgelist_to_clusters.py` to output table with `[id_node]<tab>[id_cluster]<tab>[intra-cluster_connectivity]<tab>[representative]`. Also includes this information in `nx.Graph` objects.
- [2024.3.18] - Changed singleton weight to `np.nan` instead of `np.inf` for `edgelist_to_clusters.py` to allow for representative calculations.
- YouTube channel (https://www.youtube.com/@VEBA-Multiomics)
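Two of the v2.1.0 entries above describe small rules that are easy to show in code: picking a genome identifier from a directory level with `-d -2`, and capping `featureCounts` at 64 threads. The sketch below uses hypothetical helper names (`genome_identifier_from_path`, `cap_featurecounts_threads`) and is not the code in `scaffolds_to_bins.py` or the VEBA modules; it assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def genome_identifier_from_path(path, directory_index=-2):
    """Hypothetical: pick the path component at the given index.

    With index -2, "path/to/genomes/bin_a/reference.fasta" yields the
    parent directory name, "bin_a".
    """
    return PurePosixPath(path).parts[directory_index]


def cap_featurecounts_threads(n_jobs, limit=64):
    """Hypothetical: never request more threads than the stated limit."""
    return min(limit, n_jobs)


genome_id = genome_identifier_from_path("path/to/genomes/bin_a/reference.fasta")
threads = cap_featurecounts_threads(128)
```

Negative indexing counts from the end of the path, so `-1` would select the file name and `-2` the directory that holds it.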