Releases · broadinstitute/viral-pipelines

06 Dec 21:32

dpark01

v2.1.33.1

f009d86

v2.1.33.1

What's Changed

new features

robustness: allow one retry for gisaid uploader [#395]

optimizations

slight speed optimizations to assemble_refbased and demux_deplete [#397]

vm/image updates

bump pangolearn docker [#394]
bump vadr docker (new models for sc2) [#396]

contributors

@dpark01

Full Changelog: v2.1.33.0...v2.1.33.1

28 Nov 01:49

dpark01

v2.1.33.0

1444803

v2.1.33.0

What's Changed

new features

turn on allowNestedInputs for subworkflows: this exposes all task-level optional inputs from subworkflows to the end user [#387]
expose --print-all-iSNVs option for cross-contamination detection [#386]
add update_dbs_now option to pangolin tasks [#392]
add generic upload_entities_tsv task [#391]
filter CRSP samples for BioSample registration to include only clinical tests (not pooled samples) [#374]
CRSP meta ETL: add new body part value [#390]
add sample ID list to outputs of nextclade/pangolin many sample tasks [#373]

bug fixes

use WDL value for submitting_lab_name in gisaid_meta_prep python body [#376]

optimizations

allow maxRetries=2 for most WDL tasks, increasing robustness of execution on Terra [#379, #383]
sarscov2_batch_relineage: speed up localization of input data [#375]
de-scatter CDC AWS delivery: copy all raw reads in a single AWS S3 copy task instead of hundreds, which is faster and more reliable [#389]
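
As a rough illustration of what a maxRetries=2 policy means for a single task, the sketch below re-runs a failing callable up to two additional times before giving up. It is a Python analogy only, not the pipeline's WDL or Cromwell's implementation; the names run_with_retries and make_flaky_task are hypothetical.

```python
def run_with_retries(task, max_retries=2):
    """Call task() and re-run it on failure, allowing up to max_retries extra attempts.

    Returns a (result, attempts) tuple. Re-raises the last error once the
    retry budget is exhausted.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            return task(), attempts
        except Exception:
            # attempts counts the first try plus retries, so the budget is
            # exhausted once attempts exceeds max_retries.
            if attempts > max_retries:
                raise


def make_flaky_task(failures_before_success):
    """Build a hypothetical task that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def task():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("transient failure")
        return "ok"

    return task
```

With max_retries=2, a task that fails twice and then succeeds completes on its third attempt, while a task that fails three times in a row still surfaces its error.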

vm/image updates

bump viral-core image and remove in-WDL workaround [#377]
update pangolin docker [#380, #385, #388]
update nextstrain docker to most recent tag (20211012) and update ncov commit to v9 release [#378]
increase mem of utils.tsv_join task 7GB->15GB, allow maxRetries=2 [#379]
increase default memory for tsv_join 7GB->32GB [#382]
more ram and disk for sc2 reports [#384]
VM tuning of nextclade & pangolin many-sample tasks based on measured usage of a typical 768-sample flowcell [#373]

build updates

github actions updates [#381]
dnanexus build fixes [#372]

Contributors

@dpark01 @tomkinsc @lakras

Full Changelog: v2.1.32.4...v2.1.33.0

02 Oct 00:54

dpark01

v2.1.32.4

65eb6fc

v2.1.32.4

new features:

nextclade_version output string now includes nextclade datasets "tag" (version/date) [#371]
implement nextclade_multi_sample and pangolin_multi_sample with Map task outputs, switch sarscov2_batch_relineage and sarscov2_illumina_full to use multi_sample pangolin and nextclade tasks to increase compute efficiency and reduce shard counts [#368]

bug fixes:

rename detect_cross_contamination task wdl to be distinct from workflow name to fix dxWDL builds [#370]

vm/image updates:

update pangolin 3.1.11 to 3.1.14, update pangolearn 2021-09-17 to 2021-09-28, update nextclade 1.2.3 to 1.4.0 [#371]

28 Sep 02:21

dpark01

v2.1.32.3

f5c0a0d

v2.1.32.3

improvements:

sarscov2_biosample_load workflow: stop using today's date in constructing ftp directory path for NCBI BioSample submissions in order to allow call caching for jobs run on different days [#366]
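
To see why a date in the path defeats call caching, consider a simplified model in which a job's cache key is a hash of its task name and inputs. This is a sketch under that assumption, not Cromwell's actual hashing scheme; the path layouts and the function names cache_key, ftp_path_with_date, and ftp_path_without_date are hypothetical.

```python
import hashlib
import json


def cache_key(task_name, inputs):
    """Hash a task name and its inputs into a stable key.

    A simplified stand-in for call caching: identical inputs give an
    identical key, so a previous result could be reused.
    """
    payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def ftp_path_with_date(batch_id, run_date):
    """Hypothetical path layout that embeds the run date."""
    return f"submissions/{run_date}/{batch_id}"


def ftp_path_without_date(batch_id):
    """Hypothetical path layout that depends only on the batch."""
    return f"submissions/{batch_id}"


# Same batch submitted on two different days.
dated_monday = cache_key("upload", {"path": ftp_path_with_date("batch42", "2021-09-27")})
dated_tuesday = cache_key("upload", {"path": ftp_path_with_date("batch42", "2021-09-28")})
stable_monday = cache_key("upload", {"path": ftp_path_without_date("batch42")})
stable_tuesday = cache_key("upload", {"path": ftp_path_without_date("batch42")})
```

In this model the dated keys differ from one day to the next, so nothing can be reused, whereas the date-free keys match and a cached result remains valid.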

bug fixes:

fix for Array[Array[String]] alerts output variable from vadr task (update to new vadr output format) [#363]
edge case bug fix for nextstrain subsampling keep_list (was always mangling the first entry of a user-specified keep list) [#364]

Broad-specific:

add more external lab names to task crsp_meta_etl [#367]

minor VM/docker changes:

VM shape changes in nextstrain pipeline [#362]
pangolearn image update [#365]

05 Sep 21:52

dpark01

v2.1.32.2

aad632b

v2.1.32.2

bugfix: Broad dashboard output file should be txt not tsv [#360]
docker update to pangolearn 2021-08-24 [#359]
docker update to sc2-rmd:0.1.25 [#361]

29 Aug 00:40

dpark01

v2.1.32.1

ff86ba2

v2.1.32.1

bugfixes:

workflow sarscov2_nextstrain and sarscov2_nextstrain_aligned_input: bug fix to DAG -- ensure that treetime and ancestral inference are using masked alignments, not unmasked alignments [#358]
task crsp_meta_etl: add more possible values to the controlled vocabulary options for body_part [#357]

18 Aug 23:07

tomkinsc

v2.1.32.0

550a71e

v2.1.32.0

new features:

most task runtime blocks now support cromwell auto memory scaling/retry
automated data release and delivery sarscov2_data_release
batch recalling of pango/nextclade lineages sarscov2_batch_relineage
improved automated BioSample registration and metadata handling from Broad CRSP samples and external non-Broad samples via GP pipeline
add sarscov2_biosample_load as optional subworkflow call at the beginning of sarscov2_illumina_full for fully automated use by Terra workflow launcher
updated/improved Picard-based illumina demux
move state public health reporting from sarscov2_illumina_full to sarscov2_data_release

bug fixes:

minor updates to docker images and vm shapes:

pangolin 3.1.11 / pangolearn 2021-08-09
nextclade 1.2.3
vadr 1.3
nextstrain 20210413T201712Z
sc2-rmd, viral-core

build changes:

GitHub Actions CI added, now primary. Travis CI still active at the moment

01 May 03:40

dpark01

v2.1.28.0

e1b71c2

v2.1.28.0

new features:

new workflow sarscov2_nextstrain [#204, #208, #219]
updates to genbank submission [#201, #207]
update vadr alert criteria based on NCBI recommendations [#234, #249]
add nextclade tree outputs to sarscov2_illumina_full [#233]
add sequencing reports via rmarkdown (sarscov2_illumina_full) [#222, #226, #228, #235, #236, #244, #245, #248, #265]
ivar trim updates: emit ivar trim stats (assemble_refbased) and compute summary stats (sarscov2_illumina_full) [#237]
terra table upload and download [#206, #241]
add picard wgs metrics, alignment metrics, and insert size metrics to assemble_refbased and sarscov2_illumina_full [#239, #282]
add bucket delivery of data for CDC, SRA, and GP reporting to sarscov2_illumina_full [#258, #263, #278]
add tasks and workflows for NCBI BioSample registration and metadata retrieval [#279]
automated filtering of libraries from failed NTC controls [#266]

bug fixes:

bugfix whitespace handling in gzcat task [#230]
deduplicate output rows from sra_meta_prep [#220]
GISAID metadata output should be CSV not TSV [#273]
derive Illumina run ID from XML instead of tarball filename [#275]

minor updates to docker images and vm shapes:

vm shape updates on augur steps [#205, #224, #225, #229, #232]
bump viral-core docker [#242, #243, #268, #270, #274, #276, #281, #284]
bump ivar docker [#209]
bump pangolin/pangolearn [#203, #205, #210, #213, #214, #215, #216, #217, #218, #240, #250, #254, #267, #271, #285]
bump vadr docker [#216, #264]
bump nextstrain/base [#238]
bump sc2-rmd docker [#269, #277]
update nextmeta tsv output behavior to match new nextmeta spellings [#231]

build changes:

bump cromwell and womtool 54 to 61 [#272]
temporarily drop dnanexus builds from Travis until we clean up the dnanexus CI project [#280, #283]

26 Jan 14:58

dpark01

v2.1.19.0

ff81708

v2.1.19.0

Added new workflow: sarscov2_sra_to_genbank -- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]
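
The "one assembly per BioSample" behavior described above amounts to grouping SRA records by their BioSample before assembly. A minimal sketch of that grouping step follows, assuming each record is a (run accession, BioSample accession) pair; the function name and the accessions used here are hypothetical placeholders, and the workflow's actual implementation may differ.

```python
from collections import defaultdict


def group_runs_by_biosample(records):
    """Group SRA run accessions by BioSample accession.

    records: iterable of (sra_run_accession, biosample_accession) pairs.
    Returns a dict mapping each BioSample to its runs, in input order.
    """
    grouped = defaultdict(list)
    for run, biosample in records:
        grouped[biosample].append(run)
    return dict(grouped)


# Placeholder accessions for illustration only.
example_records = [
    ("SRR0000001", "SAMN00000001"),
    ("SRR0000002", "SAMN00000002"),
    ("SRR0000003", "SAMN00000001"),
]
runs_per_sample = group_runs_by_biosample(example_records)
```

Each key in runs_per_sample would then correspond to a single assembly built from all of its listed read sets.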

Minor changes and fixes to sarscov2_illumina_full:

filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
make filename outputs a bit more organized [#200]
expose cleaned_bam_uris text file output for easy SRA submission [#200]
replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]
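
The relaxed cutoff noted above (minimum unambiguous bp lowered from 20kb to 15kb) can be sketched as a simple filter. This assumes "unambiguous" means the bases A, C, G, and T, which is an interpretation rather than something these notes state; the function names are hypothetical and this is not the workflow's own code.

```python
def count_unambiguous_bases(sequence):
    """Count bases that are exactly A, C, G, or T (case-insensitive)."""
    return sum(1 for base in sequence.upper() if base in "ACGT")


def passes_length_cutoff(sequence, min_unambig=15000):
    """Return True if the genome has at least min_unambig unambiguous bases."""
    return count_unambiguous_bases(sequence) >= min_unambig


# A synthetic genome with 16,000 unambiguous bases and 5,000 Ns.
example_genome = "ACGT" * 4000 + "N" * 5000
```

Under these assumptions, example_genome passes the 15,000 bp cutoff but would have been rejected by the earlier 20,000 bp cutoff.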

Other minor changes:

sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199]

17 Jan 00:59

dpark01

v2.1.18.0

799e054

v2.1.18.0

New general workflows:

new workflow demux_deplete. This sits between demux_only (demux and fastqc) and demux_plus (which adds kraken, spades, etc) and just does demux, fastqc, and depletion. If optionally given "augmented" samplesheets and NCBI BioSample mappings, it will produce SRA submission bundles as well. [#191]
new workflow mafft_and_snp_annotated, which adds snpEff annotation to the snp-sites output [#194]

New SARS-CoV-2 specific workflows. Up until this release, all included workflows were generally applicable to most viral taxa. This release includes a number of single-taxon workflows exclusively for SARS-CoV-2 in order to increase efficiency for high throughput work on this one virus.

new workflow sarscov2_illumina_full is a full end-to-end workflow from Illumina BCL tarball through Genbank, SRA, and GISAID submission bundles. It wraps together demux_deplete, assemble_refbased, sarscov2_lineages, sarscov2_genbank. It requires the user to pre-register NCBI BioSample entries and to provide an "augmented" samplesheet for demux. [#191, #196]
new workflow sarscov2_genbank. Prepares single-segmented genome assemblies for submission to NCBI Genbank using their new SARS-CoV-2 submission mechanism (which may become more mainstream for other viruses as well). Incorporates the new VADR (Viral Annotation DefineR) tool from NCBI to annotate (produce tbl files) and QC (flag frameshift and other problems) using the same settings that Genbank uses for QC -- this filters out genomes from submission that fail VADR QC and should result in Genbank submissions with no rejections. [#191]
new workflows sarscov2_lineages and sarscov2_nextclade_multi. These run Nextclade and Pangolin to do lineage/clade classification on SARS-CoV-2 genomes. [#184, #185, #186]
nextstrain/augur workflow improvements and bugfixes to allow for merging of multiple metadata tsv files. This simplifies the process of regular builds where some data is changing frequently [#189, #181, #191]
docker image updates: [#195, #190, #187, #182, #191, #193]
VM shape updates [#188]
README update with diagram [#183, @llangit-broad]
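
One way to picture the merging of multiple metadata tsv files mentioned above is sketched below. The merge semantics are assumptions, not taken from these notes: rows are keyed on a "strain" column, the output columns are the union of all input columns, and later files override earlier ones for the same key. The function name merge_metadata_tsvs is hypothetical.

```python
import csv
import io


def merge_metadata_tsvs(tsv_texts, key="strain"):
    """Merge several TSV documents (given as strings) on a key column.

    Assumed semantics: union of columns in first-seen order, and values
    from later files overwrite earlier ones for the same key.
    Returns (columns, rows), where rows is a list of dicts.
    """
    merged = {}   # key value -> combined row
    columns = []  # union of column names, first-seen order
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        for row in reader:
            merged.setdefault(row[key], {}).update(row)
    return columns, list(merged.values())
```

Under this scheme a large, rarely changing metadata file can be combined with a small, frequently updated one, with the small file supplying any corrections.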
