Releases: broadinstitute/viral-pipelines
v2.1.33.1
What's Changed
new features
- robustness: allow one retry for gisaid uploader [#395]
optimizations
- slight speed optimizations to assemble_refbased and demux_deplete [#397]
vm/image updates
Full Changelog: v2.1.33.0...v2.1.33.1
v2.1.33.0
What's Changed
new features
- turn on allowNestedInputs for subworkflows: this exposes to the end-user all task-level optional inputs from subworkflows [#387]
- expose --print-all-iSNVs option for cross-contamination detection [#386]
- add update_dbs_now option to pangolin tasks [#392]
- add generic upload_entities_tsv task [#391]
- Filter CRSP samples -> biosample registration to only clinical tests (not pooled samples) [#374]
- CRSP meta ETL: add new body part value [#390]
- add sample ID list to outputs of nextclade/pangolin many sample tasks [#373]
bug fixes
- use WDL value for submitting_lab_name in gisaid_meta_prep python body [#376]
optimizations
- allow maxRetries=2 for most WDL tasks, increasing robustness of execution on Terra [#379, #383]
- sarscov2_batch_relineage: speed up localization of input data [#375]
- de-scatter CDC AWS delivery: copy all raw reads in a single aws S3 copy task instead of hundreds of separate tasks, which is faster and more reliable [#389]
vm/image updates
- bump viral-core image and remove in-WDL workaround [#377]
- update pangolin docker [#380, #385, #388]
- update nextstrain docker to most recent tag (20211012) and update ncov commit to v9 release [#378]
- increase mem of utils.tsv_join task 7GB->15GB, allow maxRetries=2 [#379]
- increase default memory for tsv_join 7->32gb [#382]
- more ram and disk for sc2 reports [#384]
- VM tuning of nextclade & pangolin many-sample tasks based on measured usage of a typical 768-sample flowcell [#373]
build updates
Full Changelog: v2.1.32.4...v2.1.33.0
v2.1.32.4
new features:
- nextclade_version output string now includes nextclade datasets "tag" (version/date) [#371]
- implement nextclade_multi_sample and pangolin_multi_sample with Map task outputs, switch sarscov2_batch_relineage and sarscov2_illumina_full to use multi_sample pangolin and nextclade tasks to increase compute efficiency and reduce shard counts [#368]
bug fixes:
- rename detect_cross_contamination task wdl to be distinct from workflow name to fix dxWDL builds [#370]
vm/image updates:
- update pangolin 3.1.11 to 3.1.14, update pangolearn 2021-09-17 to 2021-09-28, update nextclade 1.2.3 to 1.4.0 [#371]
v2.1.32.3
improvements:
- sarscov2_biosample_load workflow: stop using today's date in constructing ftp directory path for NCBI BioSample submissions in order to allow call caching for jobs run on different days [#366]
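Why a date in the path defeats call caching can be illustrated with a simplified model. The sketch below is not Cromwell's actual hashing scheme; it only assumes that a cached result is reused when a key derived from the task inputs is unchanged. The function names and the `submit/...` path layout are hypothetical.

```python
import hashlib
import json


def cache_key(inputs: dict) -> str:
    """Hash a task's inputs; identical inputs give an identical key (simplified model)."""
    canonical = json.dumps(inputs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def ftp_path_with_date(batch: str, date: str) -> str:
    # Hypothetical old behavior: the run date is baked into the path.
    return f"submit/{date}/{batch}"


def ftp_path_stable(batch: str) -> str:
    # Hypothetical new behavior: the path depends only on the batch name.
    return f"submit/{batch}"


# Same batch submitted on two different days.
key_day1 = cache_key({"ftp_path": ftp_path_with_date("batch42", "2021-09-20")})
key_day2 = cache_key({"ftp_path": ftp_path_with_date("batch42", "2021-09-21")})
stable_day1 = cache_key({"ftp_path": ftp_path_stable("batch42")})
stable_day2 = cache_key({"ftp_path": ftp_path_stable("batch42")})

print(key_day1 == key_day2)        # keys differ, so the cached result is not reused
print(stable_day1 == stable_day2)  # keys match, so the cached result can be reused
```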
bug fixes:
- fix for Array[Array[String]] alerts output variable from vadr task (update to new vadr output format) [#363]
- edge case bug fix for nextstrain subsampling keep_list (was always mangling the first entry of a user-specified keep list) [#364]
Broad-specific:
- add more external lab names to task crsp_meta_etl [#367]
minor VM/docker changes:
v2.1.32.2
v2.1.32.1
bugfixes:
v2.1.32.0
new features:
- most task runtime blocks now support cromwell auto memory scaling/retry
- automated data release and delivery (sarscov2_data_release)
- batch recalling of pango/nextclade lineages (sarscov2_batch_relineage)
- improved automated BioSample registration and metadata handling from Broad CRSP samples and external non-Broad samples via GP pipeline
- add sarscov2_biosample_load as optional subworkflow call at the beginning of sarscov2_illumina_full for fully automated use by Terra workflow launcher
- updated/improved Picard-based illumina demux
- move state public health reporting from sarscov2_illumina_full to sarscov2_data_release
bug fixes:
minor updates to docker images and vm shapes:
- pangolin 3.1.11 / pangolearn 2021-08-09
- nextclade 1.2.3
- vadr 1.3
- nextstrain 20210413T201712Z
- sc2-rmd, viral-core
build changes:
- GitHub Actions CI added, now primary. Travis CI still active at the moment
v2.1.28.0
new features:
- new workflow sarscov2_nextstrain [#204, #208, #219]
- updates to genbank submission [#201, #207]
- update vadr alert criteria based on NCBI recommendations [#234, #249]
- add nextclade tree outputs to sarscov2_illumina_full [#233]
- add sequencing reports via rmarkdown (sarscov2_illumina_full) [#222, #226, #228, #235, #236, #244, #245, #248, #265]
- ivar trim updates: emit ivar trim stats (assemble_refbased) and compute summary stats (sarscov2_illumina_full) [#237]
- terra table upload and download [#206, #241]
- add picard wgs metrics, alignment metrics, and insert size metrics to assemble_refbased and sarscov2_illumina_full [#239, #282]
- add bucket delivery of data for CDC, SRA, and GP reporting to sarscov2_illumina_full [#258, #263, #278]
- add tasks and workflows for NCBI BioSample registration and metadata retrieval [#279]
- automated filtering of libraries from failed NTC controls [#266]
bug fixes:
- bugfix whitespace handling in gzcat task [#230]
- deduplicate output rows from sra_meta_prep [#220]
- GISAID metadata output should be CSV not TSV [#273]
- derive Illumina run ID from XML instead of tarball filename [#275]
minor updates to docker images and vm shapes:
- vm shape updates on augur steps [#205, #224, #225, #229, #232]
- bump viral-core docker [#242, #243, #268, #270, #274, #276, #281, #284]
- bump ivar docker [#209]
- bump pangolin/pangolearn [#203, #205, #210, #213, #214, #215, #216, #217, #218, #240, #250, #254, #267, #271, #285]
- bump vadr docker [#216, #264]
- bump nextstrain/base [#238]
- bump sc2-rmd docker [#269, #277]
- update nextmeta tsv output behavior to match new nextmeta spellings [#231]
build changes:
v2.1.19.0
Added new workflow: sarscov2_sra_to_genbank -- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]
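The "one assembly per BioSample" behavior amounts to grouping SRA runs by their BioSample before assembly. The sketch below is a minimal illustration of that grouping step, not the workflow's actual code. The accessions are made up, the record layout is hypothetical, and the platform strings are assumed to follow SRA's usual metadata values.

```python
from collections import defaultdict

# Hypothetical run metadata: (SRA run accession, BioSample accession, platform).
# All accessions here are invented for illustration.
runs = [
    ("SRR0000001", "SAMN00000001", "ILLUMINA"),
    ("SRR0000002", "SAMN00000001", "ILLUMINA"),
    ("SRR0000003", "SAMN00000002", "OXFORD_NANOPORE"),
    ("SRR0000004", "SAMN00000003", "PACBIO_SMRT"),
]

# Only these two platforms are supported according to the release note above.
SUPPORTED_PLATFORMS = {"ILLUMINA", "OXFORD_NANOPORE"}


def group_runs_by_biosample(run_records):
    """Map each BioSample to the supported runs that would be merged for assembly."""
    grouped = defaultdict(list)
    for run_id, biosample, platform in run_records:
        if platform not in SUPPORTED_PLATFORMS:
            continue  # skip read sets from unsupported platforms
        grouped[biosample].append(run_id)
    return dict(grouped)


groups = group_runs_by_biosample(runs)
for biosample, run_ids in groups.items():
    print(biosample, run_ids)
```

Each key in `groups` would correspond to one assembly and one Genbank record, with the listed runs merged as its input.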
Minor changes and fixes to sarscov2_illumina_full:
- filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
- relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
- allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
- add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
- greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
- makes filename outputs a bit more organized [#200]
- exposes cleaned_bam_uris text file output for easy SRA submission [#200]
- replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]
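The relaxed genome-length cutoff in the list above can be sketched as a simple filter. This is an illustration only: it assumes "unambiguous" means the bases A, C, G, and T, and the function names are hypothetical rather than taken from the pipeline.

```python
MIN_UNAMBIG_BP = 15_000  # cutoff from the notes above (previously 20_000)


def count_unambiguous(seq: str) -> int:
    """Count bases that are A, C, G, or T (assumed definition of 'unambiguous')."""
    return sum(1 for base in seq.upper() if base in "ACGT")


def passes_length_filter(seq: str, min_unambig: int = MIN_UNAMBIG_BP) -> bool:
    """Return True if the genome has at least `min_unambig` unambiguous bases."""
    return count_unambiguous(seq) >= min_unambig


# A synthetic genome with 16,000 unambiguous bases and 13,000 Ns.
genome = "ACGT" * 4000 + "N" * 13000

print(passes_length_filter(genome))                      # new 15kb cutoff
print(passes_length_filter(genome, min_unambig=20_000))  # old 20kb cutoff
```

A genome like this one would have been excluded under the old 20kb cutoff but is kept under the new 15kb cutoff.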
Other minor changes:
- sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
- increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
- bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
- bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199]
v2.1.18.0
New general workflows:
- new workflow demux_deplete. This sits between demux_only (demux and fastqc) and demux_plus (which adds kraken, spades, etc) and just does demux, fastqc, and depletion. If optionally given "augmented" samplesheets and NCBI BioSample mappings, it will produce SRA submission bundles as well. [#191]
- new workflow mafft_and_snp_annotated, which adds snpEff annotation to the snp-sites output [#194]
New SARS-CoV-2 specific workflows. Up until this release, all included workflows were generally applicable to most viral taxa. This release includes a number of single-taxon workflows exclusively for SARS-CoV-2 in order to increase efficiency for high throughput work on this one virus.
- new workflow sarscov2_illumina_full is a full end-to-end workflow from Illumina BCL tarball through Genbank, SRA, and GISAID submission bundles. It wraps together demux_deplete, assemble_refbased, sarscov2_lineages, sarscov2_genbank. It requires the user to pre-register NCBI BioSample entries and to provide an "augmented" samplesheet for demux. [#191, #196]
- new workflow sarscov2_genbank. Prepares single-segmented genome assemblies for submission to NCBI Genbank using their new SARS-CoV-2 submission mechanism (which may become more mainstream for other viruses as well). Incorporates the new VADR (Viral Annotation DefineR) tool from NCBI to annotate (produce tbl files) and QC (flag frameshift and other problems) using the same settings that Genbank uses for QC -- this filters out genomes from submission that fail VADR QC and should result in Genbank submissions with no rejections. [#191]
- new workflow sarscov2_lineages and sarscov2_nextclade_multi. Runs Nextclade and Pangolin to do lineage/clade classification on SARS-CoV-2 genomes. [#184, #185, #186]
- nextstrain/augur workflow improvements and bugfixes to allow for merging of multiple metadata tsv files. This simplifies the process of regular builds where some data is changing frequently [#189, #181, #191]
- VM shape updates [#188]
- README update with diagram [#183, @llangit-broad]