4.0.5.2
Highlights of this release include major Funcotator performance improvements on hg19/b37 inputs, a newly rewritten Java version of FilterVariantTranches, HaplotypeCaller bamout improvements, and improved Python integration by eliminating timeouts.
As usual, a docker image for this release can be downloaded from https://hub.docker.com/r/broadinstitute/gatk/.
Funcotator Improvements
- Improve handling of hg19/b37 references (#4586).
  - Fixed performance bug involving excessive cache misses when querying datasources, resulting in major performance improvements when running on hg19/b37 data (performance increased by approx. 30x with v1.4.20180615 of the standard Funcotator data sources) (#4586).
  - Automatically detect when b37 data is run against an hg19 data source and convert contig names to be hg19 compliant.
    - Assumes all data sources for the hg19 reference are compliant with hg19 contig names. User-created data sources will have to honor this.
  - Perform additional validation on input data to ensure a given reference FASTA has a sequence dictionary that is a superset of the given VCF. This is a more stringent check than is automatically performed by the GATK. Can be disabled with the --disable-sequence-dictionary-validation flag.
  - Released new version of datasources to go with this release (1.4.20180615), necessary because the data sources needed to be made consistent with hg19 (before they were a mix of hg19 and b37 contig names).
  - Updated the minimum required data source version to be the latest release.
  - Updated the getDbSNP.sh and createSqliteCosmicDb.sh data source scripts to preprocess those data sources to have hg19-compliant contig names.
  - Removed the --allow-hg19-gencode-b37-contig-matching flag.
  - Removed the --allow-hg19-gencode-b37-contig-matching-override flag.
- User-defined transcripts were being used as a filter rather than a priority order. The filtering step has been eliminated. Fixes #4918 (#4931)
- Added custom MAF fields to MafOutputRenderer (#4917)
- LocatableXsv data sources now produce at most 1 funcotation per allele pair. (#4936)
- LocatableXsv data sources now provide the correct number of funcotations (#4915)
- Preserve VCF fields in MAF output (#4872)
- Fixing error when spanning deletions overlap coding regions (#4881)
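The b37-to-hg19 contig renaming and the sequence dictionary superset check described above can be illustrated with a minimal Python sketch. This is not Funcotator's implementation: the function names are hypothetical, a sequence dictionary is assumed to be a plain mapping from contig name to length, and only the primary contigs are renamed (unplaced and decoy contigs are left untouched). The contig lengths are for illustration only.

```python
# Hypothetical sketch; none of these names come from the GATK codebase.

# Primary contigs that b37 names without a "chr" prefix.
B37_PRIMARY = {str(n) for n in range(1, 23)} | {"X", "Y"}


def b37_to_hg19_contig(name):
    """Return the hg19-style name for a b37-style contig name."""
    if name == "MT":
        return "chrM"  # the mitochondrial contig is named differently
    if name in B37_PRIMARY:
        return "chr" + name
    return name  # already hg19-style, or a contig this sketch does not handle


def missing_contigs(reference_dict, vcf_dict):
    """List VCF contigs that are absent from the reference or differ in length."""
    return [name for name, length in vcf_dict.items()
            if reference_dict.get(name) != length]


def is_superset(reference_dict, vcf_dict):
    """True if every VCF contig appears in the reference with the same length."""
    return not missing_contigs(reference_dict, vcf_dict)


if __name__ == "__main__":
    reference = {"chr1": 249250621, "chr2": 243199373, "chrX": 155270560}
    b37_vcf = {"1": 249250621, "2": 243199373}

    # Without renaming, none of the b37 contigs are found in the hg19 reference.
    print(missing_contigs(reference, b37_vcf))

    # After renaming, the reference dictionary is a superset of the VCF's.
    converted = {b37_to_hg19_contig(n): length for n, length in b37_vcf.items()}
    print(is_superset(reference, converted))
```

The sketch compares lengths as well as names because a name match alone would not catch a VCF built against a different assembly that happens to share contig names.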
HaplotypeCaller/Mutect2
- Improvements to FilterMutectCalls. Eliminates about 3% of all false positives in DREAM while reducing sensitivity by about 0.1%
- Fix many questionable -bamout alignments where, because of a bad choice of Smith-Waterman parameters, deletions were preferred over single-base substitutions (#4858). Result is many fewer spurious indels in the -bamout output.
  - Introduced new SmithWaterman parameters affecting realignment of the reads to their best haplotype. This also changes some annotations that depend on the alignment, such as BaseQualityRankSum and ReadPositionRankSum. The changes are slight and make things more correct.
- Modify the behavior of (BaseGraph) getNextReferenceVertex for non-ref paths (#4889)
FilterVariantTranches
- Rewrite VCF Tranche filtering in Java, with tests (#4800)
Engine
- StreamingPythonExecutor no longer uses timeouts or relies on prompt synchronization. (#4757)
- Allow concordance tools (AbstractConcordanceWalker) to use NIO for truth call set (#4905)
- Add pre- and post- apply variant transformer to VariantWalkerBase
MarkDuplicatesSpark
- Fixed a missing special case in MarkDuplicates ReadsKey code to better match current picard results (#4899)
- Reworked the keys for MarkDuplicatesSpark to be sufficient for grouping on their own. (#4878)
- Improve error message for MarkDuplicates duplicate read name issues (#4879)
Structural Variants
- Add tests for AssemblyContigWithFineTunedAlignments (#4961)
- Fix no index output for assembly bam file (#4945)
- Overhaul tests on assembly-based non-complex breakpoint and type inference code (#4835)
- Simple fix to remove trailing slash in GCS_SAVE_PATH to avoid double slashes in GCS_RESULTS_DIR (#4873)