nphase takes a long time at GATK step #26

@mdondrup

Description

I am now running nPhase on my real data and noticed it takes a very long time to finish. My data is yeast ONT reads with ~120X coverage and Illumina reads with ~100X coverage.

nPhase appears to be stuck in the GATK HaplotypeCaller step for the short reads: it is the only process still running, and it uses only ~100% CPU (a single core):

michaeld  40243  40242 98 Jan02 ?        9-21:53:35 java -Dsamjdk.use_async_io_read_samtools=false 
-Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 
-jar /Home/ii/michaeld/miniconda2/envs/polyploidPhasing/share/gatk4-4.3.0.0-0/gatk-package-4.3.0.0-local.jar HaplotypeCaller 
-R ../../reference_genome/GCF_000146045.2/GCF_000146045.2_R64_genomic.fna -ploidy 2 -I ./nphase_out/Kveik_sample_6/Mapped/shortReads/Kveik_sample_6.final.bam -O ./nphase_out/Kveik_sample_6/VariantCalls/shortReads/Kveik_sample_6.vcf

This is the full command line I am using to run it:

nohup nice -2 nphase pipeline --sampleName Kveik_sample_6 --longReads sample6_cleaned_trimmed.fastq.gz \
--longReadPlatform ont --R1 ../../Illumina_seq/Fastq/6-1_S8_R1_001.fastq.gz --R2 ../../Illumina_seq/Fastq/6-1_S8_R2_001.fastq.gz \
--reference ../../reference_genome/GCF_000146045.2/GCF_000146045.2_R64_genomic.fna --output ./nphase_out --threads 120 >nphase_nohup.out &

The machine has 144 threads and 2 TB of RAM, so resources should not be the bottleneck. I also remember that variant calling on the short reads did not take this long when I ran GATK directly. Would it be possible to set some Java/GATK performance options to increase the speed?
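
For context, HaplotypeCaller itself runs largely single-threaded as far as I know, which would match the ~100% CPU seen above. A common workaround outside of nPhase is to scatter the call over contigs with `-L` and run the pieces in parallel. The sketch below only prints one command per contig; `hc_commands` and the output directory are hypothetical names, not part of nPhase or GATK, and it assumes the reference has been indexed with `samtools faidx` so that a `.fai` file sits next to it.

```shell
# Sketch: print one HaplotypeCaller command per contig (dry run, nothing is executed).
# hc_commands REF BAM OUTDIR  -- hypothetical helper, not part of nPhase or GATK.
# Assumes REF.fai exists (e.g. created by `samtools faidx REF`) and that
# paths contain no spaces.
hc_commands() {
    local ref=$1 bam=$2 outdir=$3 contig
    # The first tab-separated column of the .fai index is the contig name.
    while IFS=$'\t' read -r contig _; do
        printf 'gatk HaplotypeCaller -R %s -ploidy 2 -I %s -L %s -O %s/%s.vcf.gz\n' \
            "$ref" "$bam" "$contig" "$outdir" "$contig"
    done < "${ref}.fai"
}
```

The printed lines could then be fed to a job runner such as GNU parallel or `xargs -P`, and the per-contig VCFs combined afterwards (for example with `gatk MergeVcfs`). Whether nPhase can pick up a VCF produced this way is something I have not checked.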
