Releases: broadinstitute/viral-pipelines

v2.1.33.1

06 Dec 21:32
f009d86

What's Changed

new features

  • robustness: allow one retry for gisaid uploader [#395]
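The single-retry behavior can be pictured with a minimal Python sketch. The names `run_with_retry` and `flaky_upload` are hypothetical and not part of viral-pipelines; the actual change lives in the workflow code, and this only illustrates the general pattern of allowing one extra attempt before giving up.

```python
def run_with_retry(action, max_retries=1):
    """Call action(); if it raises, retry up to max_retries more times."""
    attempts = 0
    while True:
        try:
            return action()
        except Exception:
            if attempts >= max_retries:
                raise  # out of retries: surface the original error
            attempts += 1


# Hypothetical stand-in for an upload step that fails once, then succeeds.
calls = {"n": 0}


def flaky_upload():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient failure")
    return "uploaded"


result = run_with_retry(flaky_upload, max_retries=1)
```

With `max_retries=1` a single transient failure is absorbed, while a second consecutive failure still propagates to the caller.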

optimizations

  • slight speed optimizations to assemble_refbased and demux_deplete [#397]

vm/image updates

  • bump pangolearn docker [#394]
  • bump vadr docker (new models for sc2) [#396]

contributors

@dpark01

Full Changelog: v2.1.33.0...v2.1.33.1

v2.1.33.0

28 Nov 01:49
1444803

What's Changed

new features

  • turn on allowNestedInputs for subworkflows: this exposes to the end-user all task-level optional inputs from subworkflows [#387]
  • expose --print-all-iSNVs option for cross-contamination detection [#386]
  • add update_dbs_now option to pangolin tasks [#392]
  • add generic upload_entities_tsv task [#391]
  • Filter CRSP samples -> biosample registration to only clinical tests (not pooled samples) [#374]
  • CRSP meta ETL: add new body part value [#390]
  • add sample ID list to outputs of nextclade/pangolin many sample tasks [#373]

bug fixes

  • use WDL value for submitting_lab_name in gisaid_meta_prep python body [#376]

optimizations

  • allow maxRetries=2 for most WDL tasks, increasing robustness of execution on Terra [#379, #383]
  • sarscov2_batch_relineage: speed up localization of input data [#375]
  • descatter CDC AWS delivery: copy all raw reads in a single aws S3 copy task instead of hundreds, which is faster and more reliable [#389]

vm/image updates

  • bump viral-core image and remove in-WDL workaround [#377]
  • update pangolin docker [#380, #385, #388]
  • update nextstrain docker to most recent tag (20211012) and update ncov commit to v9 release [#378]
  • increase mem of utils.tsv_join task 7GB->15GB, allow maxRetries=2 [#379]
  • increase default memory for tsv_join 7->32gb [#382]
  • more ram and disk for sc2 reports [#384]
  • VM tuning of nextclade & pangolin many-sample tasks based on measured usage of a typical 768-sample flowcell [#373]

build updates

  • github actions updates [#381]
  • dnanexus build fixes [#372]

contributors

@dpark01 @tomkinsc @lakras

Full Changelog: v2.1.32.4...v2.1.33.0

v2.1.32.4

02 Oct 00:54
65eb6fc

new features:

  • nextclade_version output string now includes nextclade datasets "tag" (version/date) [#371]
  • implement nextclade_multi_sample and pangolin_multi_sample with Map task outputs, switch sarscov2_batch_relineage and sarscov2_illumina_full to use multi_sample pangolin and nextclade tasks to increase compute efficiency and reduce shard counts [#368]

bug fixes:

  • rename detect_cross_contamination task wdl to be distinct from workflow name to fix dxWDL builds [#370]

vm/image updates:

  • update pangolin 3.1.11 to 3.1.14, update pangolearn 2021-09-17 to 2021-09-28, update nextclade 1.2.3 to 1.4.0 [#371]

v2.1.32.3

28 Sep 02:21
f5c0a0d

improvements:

  • sarscov2_biosample_load workflow: stop using today's date in constructing ftp directory path for NCBI BioSample submissions in order to allow call caching for jobs run on different days [#366]

bug fixes:

  • fix for Array[Array[String]] alerts output variable from vadr task (update to new vadr output format) [#363]
  • edge case bug fix for nextstrain subsampling keep_list (was always mangling the first entry of a user-specified keep list) [#364]

Broad-specific:

  • add more external lab names to task crsp_meta_etl [#367]

minor VM/docker changes:

  • VM shape changes in nextstrain pipeline [#362]
  • pangolearn image update [#365]

v2.1.32.2

05 Sep 21:52
aad632b

  • bugfix: Broad dashboard output file should be txt not tsv [#360]
  • docker update to pangolearn 2021-08-24 [#359]
  • docker update to sc2-rmd:0.1.25 [#361]

v2.1.32.1

29 Aug 00:40
ff86ba2

bugfixes:

  • workflow sarscov2_nextstrain and sarscov2_nextstrain_aligned_input: bug fix to DAG -- ensure that treetime and ancestral inference are using masked alignments, not unmasked alignments [#358]
  • task crsp_meta_etl: add more possible values to the controlled vocabulary options for body_part [#357]

v2.1.32.0

18 Aug 23:07
550a71e

new features:

  • most task runtime blocks now support cromwell auto memory scaling/retry
  • automated data release and delivery (sarscov2_data_release)
  • batch recalling of pango/nextclade lineages (sarscov2_batch_relineage)
  • improved automated BioSample registration and metadata handling from Broad CRSP samples and external non-Broad samples via GP pipeline
  • add sarscov2_biosample_load as optional subworkflow call at the beginning of sarscov2_illumina_full for fully automated use by Terra workflow launcher
  • updated/improved Picard-based illumina demux
  • move state public health reporting from sarscov2_illumina_full to sarscov2_data_release

bug fixes:

minor updates to docker images and vm shapes:

  • pangolin 3.1.11 / pangolearn 2021-08-09
  • nextclade 1.2.3
  • vadr 1.3
  • nextstrain 20210413T201712Z
  • sc2-rmd, viral-core

build changes:

  • GitHub Actions CI added, now primary. Travis CI still active at the moment

v2.1.28.0

01 May 03:40
e1b71c2

new features:

  • new workflow sarscov2_nextstrain [#204, #208, #219]
  • updates to genbank submission [#201, #207]
  • update vadr alert criteria based on NCBI recommendations [#234, #249]
  • add nextclade tree outputs to sarscov2_illumina_full [#233]
  • add sequencing reports via rmarkdown (sarscov2_illumina_full) [#222, #226, #228, #235, #236, #244, #245, #248, #265]
  • ivar trim updates: emit ivar trim stats (assemble_refbased) and compute summary stats (sarscov2_illumina_full) [#237]
  • terra table upload and download [#206, #241]
  • add picard wgs metrics, alignment metrics, and insert size metrics to assemble_refbased and sarscov2_illumina_full [#239, #282]
  • add bucket delivery of data for CDC, SRA, and GP reporting to sarscov2_illumina_full [#258, #263, #278]
  • add tasks and workflows for NCBI BioSample registration and metadata retrieval [#279]
  • automated filtering of libraries from failed NTC controls [#266]

bug fixes:

  • bugfix whitespace handling in gzcat task [#230]
  • deduplicate output rows from sra_meta_prep [#220]
  • GISAID metadata output should be CSV not TSV [#273]
  • derive Illumina run ID from XML instead of tarball filename [#275]
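As an illustration of the row deduplication mentioned for sra_meta_prep above, here is a minimal Python sketch. The function `dedupe_rows` and the sample rows are hypothetical; the real task's column layout and logic are not described in these notes, so this only shows the general order-preserving approach.

```python
def dedupe_rows(rows):
    """Drop exact duplicate rows while preserving first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so compare as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical metadata rows; the third repeats the first.
rows = [["S1", "lib1"], ["S2", "lib2"], ["S1", "lib1"]]
deduped = dedupe_rows(rows)
```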

minor updates to docker images and vm shapes:

build changes:

  • bump cromwell and womtool 54 to 61 [#272]
  • temporarily drop dnanexus builds from Travis until we clean up the dnanexus CI project [#280, #283]

v2.1.19.0

26 Jan 14:58
ff81708

Added new workflow: sarscov2_sra_to_genbank -- this takes sequencing reads from INSDC (via NCBI SRA), assembles, annotates, and QCs genomes, and produces Genbank and GISAID submission bundles based on the metadata in NCBI (SRA and BioSample). The Genbank submission will be tied to the same source BioProject and BioSamples that the reads were linked to in SRA. This workflow is able to merge together multiple read sets (SRA records) from the same BioSample and produce one assembly per BioSample. It will automatically detect sequencing platform (only Illumina and Oxford Nanopore currently supported) as well as amplicon vs metagenomic library designs based on the SRA metadata, and assemble appropriately. This has been tested on Illumina reads, ONT reads, amplicon libraries, metagenomic libraries, reads submitted to NCBI SRA, and reads originally submitted to ENA and synced with NCBI. [#197, #200]

Minor changes and fixes to sarscov2_illumina_full:

  • filter genbank/gisaid submission packages to only sequences present in biosample attributes file [#200]
  • relax minimum genome unambig bp cutoff from 20kb to 15kb [#200]
  • allow for merging multiple biosample attributes tsvs together in sarscov2_illumina_full [#200]
  • add "Sequencing Technology" column to both genbank and gisaid submission packages [#200]
  • greatly simplify the final assembly metrics metadata output from both workflows (single tsv instead of compound array structures) [#200]
  • make filename outputs a bit more organized [#200]
  • expose cleaned_bam_uris text file output for easy SRA submission [#200]
  • replace the first several steps with an invocation of demux_deplete as a subworkflow to reduce code duplication [#197]
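The relaxed unambiguous-base cutoff listed above can be sketched in Python as follows. This assumes "unambiguous" means the bases A, C, G, and T; the helper names and toy sequences are hypothetical, and the workflow's own implementation may differ.

```python
MIN_UNAMBIG = 15000  # relaxed cutoff from the notes above (previously 20000)


def count_unambiguous(seq):
    """Count bases that are exactly A, C, G, or T (case-insensitive)."""
    return sum(1 for base in seq.upper() if base in "ACGT")


def passes_cutoff(seq, min_unambig=MIN_UNAMBIG):
    """Return True if the genome has enough unambiguous bases."""
    return count_unambiguous(seq) >= min_unambig


# Toy genomes: N marks ambiguous positions.
genome_ok = "ACGT" * 4000 + "N" * 13903     # 16000 unambiguous bases
genome_short = "ACGT" * 3000 + "N" * 17903  # 12000 unambiguous bases
```

Under these assumptions, `genome_ok` passes the 15kb cutoff but would have failed the earlier 20kb cutoff, while `genome_short` fails both.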

Other minor changes:

  • sarscov2_lineages and sarscov2_illumina_full: rename output variable pangolin_clade to pango_lineage to stay in line with the nomenclature of the PANGOLIN authors. [#197]
  • increase default RAM for GATK UG consensus calling in assemble_refbased from 7GB to 15GB. [#200]
  • bump nextclade image and pangoLEARN database to latest [#198]. nextclade update improves deletion variant naming. pangolin update keeps up with latest lineage assignments.
  • bump viral-core docker 2.1.18 to 2.1.19 to fix demux scenario with single-index/paired-reads [#199]

v2.1.18.0

17 Jan 00:59
799e054

New general workflows:

  • new workflow demux_deplete. This sits between demux_only (demux and fastqc) and demux_plus (which adds kraken, spades, etc) and just does demux, fastqc, and depletion. If optionally given "augmented" samplesheets and NCBI BioSample mappings, it will produce SRA submission bundles as well. [#191]
  • new workflow mafft_and_snp_annotated, which adds snpEff annotation to the snp-sites output [#194]

New SARS-CoV-2 specific workflows: up until this release, all included workflows were generally applicable to most viral taxa. This release includes a number of single-taxon workflows exclusively for SARS-CoV-2 in order to increase efficiency for high-throughput work on this one virus.

  • new workflow sarscov2_illumina_full is a full end-to-end workflow from Illumina BCL tarball through Genbank, SRA, and GISAID submission bundles. It wraps together demux_deplete, assemble_refbased, sarscov2_lineages, sarscov2_genbank. It requires the user to pre-register NCBI BioSample entries and to provide an "augmented" samplesheet for demux. [#191, #196]

  • new workflow sarscov2_genbank. Prepares single-segmented genome assemblies for submission to NCBI Genbank using their new SARS-CoV-2 submission mechanism (which may become more mainstream for other viruses as well). Incorporates the new VADR (Viral Annotation DefineR) tool from NCBI to annotate (produce tbl files) and QC (flag frameshift and other problems) using the same settings that Genbank uses for QC -- this filters out genomes from submission that fail VADR QC and should result in Genbank submissions with no rejections. [#191]

  • new workflows sarscov2_lineages and sarscov2_nextclade_multi. These run Nextclade and Pangolin to do lineage/clade classification on SARS-CoV-2 genomes. [#184, #185, #186]

  • nextstrain/augur workflow improvements and bugfixes to allow for merging of multiple metadata tsv files. This simplifies the process of regular builds where some data is changing frequently [#189, #181, #191]

  • docker image updates: [#195, #190, #187, #182, #191, #193]

  • VM shape updates [#188]

  • README update with diagram [#183, @llangit-broad]
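The merging of multiple metadata tsv files mentioned in the nextstrain/augur item above can be pictured with a small Python sketch. It assumes rows are keyed on a `strain` column and that values from later files override earlier ones; both are illustrative assumptions, and `merge_metadata` is a hypothetical helper rather than the workflow's actual task.

```python
import csv
import io


def merge_metadata(tsv_texts, key="strain"):
    """Merge TSV tables by key; later tables override earlier values."""
    merged = {}   # key value -> combined row dict, in first-seen order
    columns = []  # union of column names, in first-seen order
    for text in tsv_texts:
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        for col in reader.fieldnames or []:
            if col not in columns:
                columns.append(col)
        for row in reader:
            merged.setdefault(row[key], {}).update(row)
    return columns, list(merged.values())


# A stable base table plus a frequently changing update table.
base = "strain\tdate\tcountry\nA\t2021-01-01\tUSA\nB\t2021-01-02\tUSA\n"
update = "strain\tdate\tlineage\nB\t2021-01-05\tB.1.1.7\nC\t2021-01-06\tB.1\n"
columns, rows = merge_metadata([base, update])
```

Keeping the slowly changing table separate from the frequently updated one means only the small file has to be regenerated for each build.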