Releases: broadinstitute/gatk
4.0.3.0
This release brings a major update to our experimental neural-network-based VariantRecalibrator replacement, initial MAF support in Funcotator, as well as some updates to Mutect2 and the CNV tools.
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/
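For example (the 4.0.3.0 tag is assumed to match this release's published image tag; this is a sketch, not part of the release notes):

```bash
# Pull the release image from Docker Hub (tag assumed to match this release)
docker pull broadinstitute/gatk:4.0.3.0

# Start an interactive container with the GATK tools available
docker run -it broadinstitute/gatk:4.0.3.0
```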
Summary of changes in this release:
- A major update to our experimental neural-network-based suite of variant scoring tools, which will eventually replace the VariantRecalibrator (#4245)
  - The NeuralNetInferenceTool has been renamed to CNNScoreVariants
  - Baseline models are now included in the distribution.
  - Added additional tools to write tensors and to train your own models given a VCF of validated calls, an unfiltered VCF and a confident region: CNNVariantTrain, CNNVariantWriteTensors and FilterVariantTranches
  - Read-level 2D models are now supported via the tensor-type read_tensor argument. 2D models at present are significantly slower than the 1D models. (See the first sketch following this change list.)
- Funcotator:
  - Added prototype support for outputting MAF files (and many bug fixes) (#4472)
- Mutect2:
- CNV tools:
  - Replaced CollectFragmentCounts with CollectReadCounts. (#4564)
  - Allowed use of zero eigensamples in DenoiseReadCounts. (#4411)
  - Changed filtering of normal hets on overlap with copy-ratio intervals in ModelSegments to be consistent with filtering of case hets. (#4510)
  - Updated PostprocessGermlineCNVCalls (segments VCF writing, WDL scripts, unit tests, integration tests) (#4396)
- Miscellaneous changes:
  - Concordance: added option to analyze contributions of different filters (#4520)
  - Exposed the -pairHMM/--pair-hmm-implementation argument in HaplotypeCaller, which was previously hidden (#4494) (see the second sketch following this list)
  - Set the default samjdk.compression_level to 2 (was previously 1) (#4547)
  - Upgraded to Spark 2.2.0 (#4314)
  - Changed Spark sharding of queryname-sorted bams to better handle secondary and supplementary reads (#4473)
  - Added logging output to the bam writing step for spark tools (#4501)
  - git-lfs is now required to compile the GATK
  - Added a registry for deprecated/unported tools. (#4505)
  - Updated the Hadoop GCS connector from 1.6.1 to 1.6.3. (#4590)
  - Added a large runtime resource directory to git-lfs, and exposed it to the Docker build. (#4530)
  - We now include full tool documentation in the GATK binary distribution zip (#4377)
  - Made our maven artifacts much smaller by preventing gradle uploadArchives from including distZip and distTar (#4569)
  - Added chr20 and chr21 alt contigs to the GRCh38 reference snippet used for testing (#4548)
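As a rough usage sketch of the renamed scoring tool: only the CNNScoreVariants name and the tensor-type read_tensor argument come from these notes; the -R/-V/-I/-O arguments and file names are assumptions based on general GATK4 conventions.

```bash
# 1D (annotation-based) scoring of an input VCF with a bundled baseline model
# (sketch; file names are placeholders)
gatk CNNScoreVariants \
    -R reference.fasta \
    -V input.vcf.gz \
    -O scored_1d.vcf.gz

# Read-level 2D scoring: also supply the aligned reads and request read tensors;
# this is significantly slower than the 1D model
gatk CNNScoreVariants \
    -R reference.fasta \
    -V input.vcf.gz \
    -I sample.bam \
    --tensor-type read_tensor \
    -O scored_2d.vcf.gz
```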
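And a sketch of the newly exposed pair-HMM argument together with the compression-level default mentioned above; the AVX_LOGLESS_CACHING value, the --java-options wrapper flag, and the remaining arguments are assumptions, not taken from these notes.

```bash
# Select a pair-HMM implementation explicitly in HaplotypeCaller
# (the value shown here is an assumption)
gatk HaplotypeCaller \
    -R reference.fasta \
    -I sample.bam \
    -O output.vcf.gz \
    --pair-hmm-implementation AVX_LOGLESS_CACHING

# Override the new samjdk.compression_level default (2) via a JVM system property
gatk --java-options "-Dsamjdk.compression_level=5" HaplotypeCaller \
    -R reference.fasta -I sample.bam -O output.vcf.gz
```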
4.0.2.1
This is a small bug fix release containing fixes for the following issues:
- HaplotypeCaller: fix the -contamination/-contamination-file arguments, which were not working properly, and add tests (#4455)
- Fixes/improvements to the GATK configuration file mechanism (#4445)
  - If a Java system property is specified explicitly on the user's command line, allow it to override the corresponding value in the GATK config file
  - Bundle an example GATK configuration file with the GATK binary distribution. This config file can be edited and passed to the GATK via the --gatk-config-file argument. (A usage sketch follows this change list.)
  - There are still some configuration-related TODOs/known issues: in particular, the gatk front-end script currently bakes in some system properties internally, which will always override the corresponding values in the config file. We plan to patch the gatk script to no longer set these system properties internally, and delegate to the config file instead.
- Mutect2: minor bug fixes and improvements (#4466)
  - Fix "FilterMutectCalls trips on non-int value in MFRL tag" (#4363)
  - Fix ordering of allele trimming vs. variant annotation (#4402)
  - Fix "CalculateContamination gives >100% results" (#3889)
  - Disable the MateOnSameContigOrNoMappedMateReadFilter by default (#3514)
  - Make mapping quality threshold in GetPileupSummaries modifiable (#4011)
- SV Tools: Add a scan for intervals of high depth, and exclude reads from those regions from SV evidence (#4438)
- In the GATK docker image, run the GATK using the fully-packaged binary distribution jars, rather than the unpackaged jars (#4476). This fixes a number of minor issues reported by users of the docker image.
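As a rough sketch of the configuration-file mechanism described above: only --gatk-config-file and the command-line-override behavior come from these notes; the file names, the tool shown, and its -R/-I/-O arguments are assumptions.

```bash
# Pass an edited copy of the bundled example config file to a tool
# (file names are placeholders)
gatk HaplotypeCaller \
    --gatk-config-file my-gatk.properties \
    -R reference.fasta -I sample.bam -O output.vcf.gz

# A Java system property given explicitly on the command line overrides the
# corresponding value from the config file
gatk --java-options "-Dsamjdk.compression_level=1" HaplotypeCaller \
    --gatk-config-file my-gatk.properties \
    -R reference.fasta -I sample.bam -O output.vcf.gz
```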
4.0.2.0
This is a small release that includes a new beta tool, a port of VariantAnnotator from GATK3, as well as some bug fixes and other improvements. Mutect2 is no longer beta.
- Mutect2 and FilterMutectCalls are now no longer beta! (#4384)
- New tool VariantAnnotator (#3803):
  - ported tool from GATK3
  - first beta release
- Spark Improvements:
- new CNV Tumor only WDL (#4414)
- Viterbi segmentation and segment quality calculation for gcnvkernel (#4335)
- Other Bug Fixes and Improvements:
  - update to latest GKL, improves performance of GZIP level 2 compression (#4379)
  - CalculateGenotypePosteriors: fixed bug that caused duplicates in the output VCF as well as several other issues (#4352, #4431)
  - Display a more prominent warning message for Beta and Experimental tools. (#4429)
  - non-zero Picard tool exit codes now cause a non-zero exit from gatk (#4437)
  - removed support for deprecated Google Reference API (#4266)
  - Improve evidence info dumps and SV pipeline management (#4385)
  - oncotator docker uses default docker if not specified (#4394)
  - Added check for non-finite copy ratios in ModelSegments pipeline. (#4292)
  - make FASTQ reader remove phred bias from quals (#4415)
4.0.1.2
This is a small bug fix release to fix issues in the WDLs for Mutect2 and the CNV tools. It also includes a newer version of the GKL (Genomics Kernel Library) with some compression-related performance improvements.
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/
4.0.1.1
This is a small bug fix release that fixes the following:
- Fix sorting bug in GatherTranches. Gathered tranches should now be closer to target truth sensitivity in the lower range (~90%).
- Mutect2 WDL: fix memory requests to request MB instead of GB.
- CNV somatic pair workflow WDL: added missing Oncotator optional arguments
- Prevent printing a stack trace when the user specifies the name of a tool that doesn't exist. Instead print suggestions for similar tool names.
4.0.1.0
Highlights of this release include a preview version of a future neural-network-based VQSR replacement, the ability to generate a VCF from the GermlineCNVCaller output, allele-specific annotation support in GenomicsDBImport, as well as a number of important post-4.0 bug fixes. See below for the full list of changes.
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/
Changes in this release:
- New experimental tool NeuralNetInference (#4097)
  - An eventual VQSR replacement.
  - Performs variant score inference with a 1D Convolutional Neural Network with a pre-trained model. This is faster but not as high quality as the 2D model, which is coming along with training and tranche-style filtering in the next GATK release (#4245).
  - Tool name subject to change!
- GenomicsDBImport:
  - Add support for allele-specific annotations (#4261) (#3707)
  - Allow sample names with whitespace in the sample name map file (#3982)
  - Fix segfault crash on long path names (#4160)
  - Allow multiple import commands to be run in the same workspace directory (#4106)
  - Fix segfault crash during import when flag fields are not declared in the VCF header (#3736)
  - Improve warning message when PLs are dropped for records with too many alleles (#3745)
- CNV tools:
- HaplotypeCaller:
  - Fix the --min-base-quality-score/-mbq argument, which previously had no effect (#4128). This fix also affects Mutect2.
  - Fix a "contig must be non-null and not equal to *, and start must be >= 1" error by patching an edge case in the ReadClipper code: when reverting soft-clipped bases of a read at the start of a contig, don't explode if you end up with an empty read (#4203)
- Mutect2:
  - Smarter contamination model (#4195)
  - Removed the --dbsnp and --comp arguments. The best practice now is to pass in gnomAD as the germline-resource.
  - Removed a number of other arguments that were HaplotypeCaller-specific and not appropriate for Mutect2, such as --emit-ref-confidence.
  - Mutect2 WDL: CRAM support (#4297)
  - Mutect2 WDL: Compressed vcf output and Funcotator options (#4271)
  - Miscellaneous WDL cleanup
- HaplotypeCallerSpark:
  - Fixes to the tool that make its output much closer to that of the non-Spark HaplotypeCaller (#4278). Note that this tool (unlike the non-Spark HaplotypeCaller) is still in beta, and should not be used for any real work. There are still major performance issues with the tool that in practice prevent running on certain kinds of large data and in certain modes.
  - Disallow writing a .vcf.gz when in GVCF mode, as this combination currently doesn't work (#4277)
- BwaSpark:
  - set more reasonable default set of read filters (#4286)
- PathSeq:
  - Add WDL for running the PathSeq pipeline with a README and example JSON input. (#4143)
- Fix piping between Picard tools run via the GATK by changing logging output to stderr (#4167)
- Disallow unindexed block-compressed tribble files as input to walkers (#4240) (#4224). This works around a bug in HTSJDK that could cause such files to appear truncated. Until the HTSJDK bug is fixed, block-compressed .vcf.gz files (and similar files) will need to be accompanied by an index, which can be generated using the IndexFeatureFile tool. (A usage sketch follows this list.)
- Restore .list as an allowed extension for files containing multiple values for command-line arguments (#4270). The previous extension .args is also still allowed. This feature allows users to provide a file ending in .list or .args containing all of the values for an argument that accepts multiple values (for example: a list of BAM files), instead of typing all the values individually on the command line.
- Fix conda environment creation to work better with the release distribution. (#4233)
- IndexFeatureFile: more informative error message when trying to index a malformed file (#4187)
- Suggest using BED files as a way to resolve ambiguous interval queries. (#4183)
- Set Spark parameter userClassPathFirst = false #3933 (#3946)
- Update to HTSJDK 2.14.1 (#4210)
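A minimal sketch of the two command-line conveniences noted above: the -F argument name for IndexFeatureFile and the HaplotypeCaller arguments are assumptions based on GATK 4.0-era conventions, and all file names are placeholders.

```bash
# Create an index for a block-compressed VCF so it can be used as walker input
# (the -F argument name is an assumption for this GATK version)
gatk IndexFeatureFile -F calls.vcf.gz

# Supply many values for a multi-value argument from a .list (or .args) file
# instead of typing each one on the command line
cat > bams.list <<EOF
sample1.bam
sample2.bam
sample3.bam
EOF
gatk HaplotypeCaller -R reference.fasta -I bams.list -O cohort.vcf.gz
```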
4.0.0.0
4.beta.6
This release brings a critical bug fix to the GenomicsDBImport tool related to sample ordering, plus a new tool FixCallSetSampleOrdering to repair vcfs generated using the pre-4.beta.6 version of the tool. See the description of the bug in #3682 to determine whether you are affected. Do not run FixCallSetSampleOrdering unless you are sure that you are affected by the bug in #3682.
Other highlights include upgrading to the latest version of the Picard tools, and adding engine support for reading Gencode GTF files.
A docker image for this release can be found in the broadinstitute/gatk repository on dockerhub. Within the image, cd into /gatk, then run gatk-launch commands as usual.
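For example, a minimal sketch of using the image; the image tag and the PrintReads invocation are assumptions for illustration, not taken from these notes.

```bash
# Pull and enter the release image (tag assumed to match this release)
docker run -it broadinstitute/gatk:4.beta.6

# Inside the container:
cd /gatk
./gatk-launch PrintReads -I input.bam -O output.bam
```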
Note: Due to our current dependency on a snapshot of google-cloud-java, this release cannot be published to maven central.
Full list of changes for this release:
- Fixed sample name reordering bug in GenomicsDBImport (#3667)
- New tool FixCallSetSampleOrdering to repair vcfs affected by #3682 (#3675)
- Integrate latest Picard tools via Picard jar. (#3620)
- Adding in codec to read from Gencode GTF files. Fixes #3277 (#3410)
- Upgrade to HTSJDK version 2.12.0 (#3634)
- Upgrade to GKL version 0.7 (#3615)
- Upgrade to GenomicsDB version 0.7.0 (#3575)
- Upgrade Mockito from 1.10.19 -> 2.10.0. (#3581)
- Add GVCF support to VariantsSparkSink (#3450)
- Fix writing variants to GCS buckets (#3485)
- Support unmapped reads in Spark. (#3369)
- Correct gVCF header lines (#3472)
- Dump more evidence info for SV pipeline debugging (#3691)
- Add omitFromCommandLine=true for example tools (#3696)
- Change gatkDoc and gatkTabComplete build tasks to include Picard. (#3683)
- Adding data.table R package. (#3693)
- Added a missing newline in ParamUtils method. (#3685)
- Fix minor HTML issues in ReadFilter documentation (#3654)
- Add CRAM integration tests for HaplotypeCaller. (#3681)
- Fix SamAssertionUtils SortSam call. (#3665)
- Add ExtremeReadsTest (#3070)
- removing required FASTA reference input that was needed before (for its dict) for sorting variants in output VCF, now using header in input SAM/BAM (#3673)
- re-enable snappy use in htsjdk (#3635)
- fix 3612 (#3613)
- pass read metadata to all code that needs to translate contig ids using read metadata (#3671)
- quick fix for broken read (mapped to no ref bases) (#3662)
- Fix log4j logging by removing extra copy from the classpath. #2622 (#3652)
- add suggestion to regularly update gcloud to README (#3663)
- Automatically distribute the BWA-MEM index image file to executors for BwaSpark (#3643)
- Have PSFilter strip mate number from read names (#3640)
- Added the tool PreprocessIntervals that bins the intervals given by the user to be used for coverage collection. (#3597)
- Cpx SV PR series, part-4 (#3464)
- fixed bug in which F1R2 and F2R1 annotation kept discarded alleles (#3636)
- imprecise deletion calling (#3628)
- Significant improvements to CalculateContamination (#3638)
- Adds supplementary alignment info into fastq output, also additional… (#3630)
- Adding tool to annotate with pair orientation info (#3614)
- add elapsed time to assembly info in intervals file (#3629)
- Created a VariantAnnotationArgumentCollection to reduce code duplication and added a StandardM2Annotation group (#3621)
- Docs for turning assembled haplotypes into variant alleles (#3577)
- Simplify spark_eval scripts and improve documentation. (#3580)
- Renames StructuralVariantContext to SVContext. (#3617)
- Added KernelSegmenter. (#3590)
- Fix bug in allele order independent comparison (#3616)
- Docs for local assembly (#3363)
- Added a method to VariantContextUtils which supports alt allele order independent comparison of variant contexts. (#3598)
- Fixed incorrect logger in CollectAllelicCounts and RecalibrationReport. (#3606)
- updating to newer htsjdk snapshot (#3588)
- clear diffuse high frequency kmers (#3604)
- update SmithWatermanAligner in preparation for native optimized aligner (#3600)
- added spark tool for extracting original SAM records based on a file containing read names (#3589)
- update README with correct path to install_R_packages.R #3601 (#3602)
- HostAlignmentReadFilter and PSScorer use only identity scores and exp… (#3537)
- Fixed alt-allele count in AllelicCountCollector and changed unspecified alleles in AllelicCount to N. (#3550)
- Fix bad version check in manage_sv_pipeline.sh (#3595)
- Use a handmade TestReferenceMultiSource in tests instead of a mock. (#3586)
- Repackage ReadFilter plugin tests (#3525)
- BamOut in M2 WDL and unsupported version with NIO for SpecOps Team (#3582)
- Changed the path for posting the test reports
- updates sv manager and cluster creation scripts to utilize dataproc cluster timed self-termination feature (#3579)
- Implemented watershed algorithm for finding local minima in 1D data based on topological persistence. (#3515)
- Reduce number of output partitions in PathSeqPipelineSpark (#3545)
- add gathering of imprecise evidence links and extend evidence intervals to make links coherent in most cases (#3469)
- Refactor PrimaryAlignmentReadFilter to PrimaryLineReadFilter (#3195)
- Update ReadFilters documentation (#3128)
- Changes in BwaMemIntegrationTest to avoid a 3-4 minute runtime. (#3563)
- Make error informative for non-diploid family likelihoods #3320 (#3329)
- TableFeature javadoc and more tests (#3175)
- Re-enable ancient BED test in IndexFeatureFile. (#3507)
- add external evidence stream for CNVs (#3542)
- clip M2 alleles before emitting in case some alleles were dropped (#3509)
- Docs for M2 filtering (#3560)
- Fix static test blocks and @BeforeSuite usages to prevent excessive code execution when tests aren't included in a suite. (#3551)
- hide prototyping tools in sv package from help message (but still runnable if knowing their existence) (#3556)
- Add support for running tools with omitFromCommandLine=true (#3486)
- Adds utility methods to ReadUtils and CigarUtils. (#3531)
- Cpx SV PR series, part-3 (#3457)
4.beta.5
Small release; highlights include an update to our BWA-MEM version, an experimental PythonScriptExecutor, and an important bugfix for ValidateVariants -gvcf mode.
Note: this still includes snapshot dependencies that prevent us from releasing to Maven central.
Complete change list:
- Make directory name unique for BucketUtilsTest#testDirSizeGCS to avoid unwanted test interaction. (#3547)
- Simple PythonScriptExecutor. #3501 (#3536)
- Fix BucketUtils#dirSize on GCS. #3437 (#3539)
- code duplication in read pos rank sum and its allele-specific version #1882 (#2657)
- validatevariants -gvcf fix (#3530)
- Added GetSampleName as stopgap until we have named parameters (#3538)
- Pair HMM docs (#3433)
- Fix MissingReferenceDictFile exception constructor. #3492 #2922 (#3524)
- Extend ReadsPipelineSpark to run HaplotypeCallerSpark (#3452)
- Updates bwamem-jni dependency to 1.0.2 and adds the possibility of aligning singletons to BwaEngine classes. (#3474)
- Structural Variant Context (#3476)
4.beta.4
Highlights of this release include fixes to the GATK4 HaplotypeCaller to bring it closer to the output of the GATK3 HaplotypeCaller (although many of these fixes still need to be applied to HaplotypeCallerSpark), fixes for longstanding indexing and CRAM-related bugs in htsjdk, bash tab completion support for GATK commands, and many improvements to Mutect2 and the SV tools.
A docker image for this release can be found in the broadinstitute/gatk repository on dockerhub. Within the image, cd into /gatk, then run gatk-launch commands as usual.
Note: Due to our current dependency on a snapshot of google-cloud-java, this release cannot be published to maven central.
Changes in this release:
- HaplotypeCaller: a number of important updates and fixes to bring it closer to GATK 3.x's output (most of these fixes apply only to HaplotypeCaller, not HaplotypeCallerSpark) (#3519)
  - reduce memory usage of the AssemblyRegion traversal by an order of magnitude
  - create empty pileup objects for uncovered loci internally (fixes occasional gaps between GVCF blocks as well as some calling artifacts)
  - when determining active regions, only consider loci within the user's intervals
  - port some additional changes to the GATK 3.x HaplotypeCaller to GATK4
  - fix bug with handling of the MQ annotation
- Added bash tab completion support for GATK commands (#3424)
- Updated to Intel GKL 0.5.8, which fixes a bug in AVX detection that caused incorrect behavior on some AMD systems (#3513)
- Upgrade htsjdk to 2.11.0-4-g958dc6e-SNAPSHOT to pick up an important VCF header performance fix. (#3504)
- Updated google-cloud-nio dependency to 0.20.4-alpha-20170727.190814-1:shaded (#3373)
- Fix tabix indexing bugs in htsjdk, and reenable the IndexFeatureFile tool (#3425)
- Fix longstanding issue with CRAM MD5 slice calculation in htsjdk (#3430)
- Started publishing nightly builds
- Performance improvements to allow MD+BQSR+HC Spark pipeline to scale to a full genome (#3106)
- Eliminate expensive toString() call in GenotypeGVCFs (#3478)
- ValidateVariants gvcf memory optimization (#3445)
- Simplified Mutect2 annotations (#3351)
- Fix MuTect2 INFO field types in the VCF header (#3422)
- SV tools: fixed possibility of a negative fragment length that shouldn't have happened (#3463)
- Added command line argument for IntervalMerging based on GATK3 (#3254)
- Added 'nio_max_retries' option as a command line accessible option for GATK tools (#3328)
- Fix aligned PathSeq input getting filtered by WellformedReadFilter (#3453)
- Patch the ReferenceBases annotation to handle the case where no reference is present (#3299)
- Honor index/MD5 creation for HaplotypeCaller/Mutect2 bamouts. (#3374)
- Fix SV pipeline default init script handling (#3467)
- SV tools: improve the test bam (#3455)
- SV tools: improved filtering for smallish indels (#3376)
- Extends BwaMemImageSingleton into a cache, BwaMemImageCache, that can… (#3359)
- Try installing R packages from multiple CRAN repos in case some are down (#3451)
- Run Oncotator (optional) in the CNV case WDL. (#3408)
- Add option to run Spark tests only (#3377)
- Added a .dockerignore file (#3418)
- Code cleanup in the sv discovery package (#3361) and fixes #3224
- Implement PathSeq taxon hit scoring in Spark (#3406)
- Add option to skip pre-Bwa repartitioning in PSFilter (#3405)
- Update the GQ after PLs get subset (#3409)
- Removed the explicit System.exit(0) from Main (#3400)
- build_docker.sh can run tests again #3191 #3160 (#3323)
- Minor doc fixes #3173 (#3332)
- Use ReadClipper in BaseQualityClipReadTransformer (#3388)
- PathSeq adapter trimming and simple repeat masking (#3354)
- Add scripts to manage SV spark jobs and copy result (#3370)
- Output empty VQSLOD tranches in scatterTranches mode if no variant has VQSLOD high enough for requested threshold (#3397)
- Option to filter short pathogen reference contigs (#3355)
- Rewrote hapmap autoval wdl (#3379)
- fixed contamination calculation, added error bars to output (#3385)
- wrote wdl for Mutect panel of normals (#3386)
- Turn off tranches plots if no output Rscript is specified (for annotation plots) (#3383)
- Mutect2 wdls output the contamination (#3375)
- Increased maximum copy-ratio variance slice-sampling bound. (#3378)
- Replace --allowMissingData with --errorIfMissingData (gives opposite default behavior as previously) and print NA for null object in VariantsToTable (#3190)
- docs for proposed tumor-in-normal tool (#3264)
- Fixed the git version for the output jar on docker automatic builds (#3496)
- Use correct logger class in MathUtils (#3479)
- Make ShardBoundaryShard implement Serializable (#3245)