Bamsurgeon addsnv.py takes 4 hours to add one SNP to a bam file of size 3.1MB ?

Hi, 
I recently used Bamsurgeon to add an SNP at a specific location for a specific region of the bam file. I processed the bam file and extracted a smaller bam file that contains only a small region around the SNP that I'm interested in. However, upon running the script, bamsurgeon addsnp.py takes around 4 hours for one last step: "merging reads".
My question is: What does this merging step do, and is this step important for the final modified bam file? 

For clarification, here was the script that I used: 
singularity exec \
        --bind $extracted_reads_dir:/extracted_reads \
        --bind $ref_fasta_dir:/ref_Macaq \
        --bind $input_snp_dir:/input_snps \
        --bind $output_snp_dir:/output_snp \
        --bind $tmp_dir:/tmp \
        bamsurgeon.sif python3 \
        -O bamsurgeon/bin/addsnv.py \
        --procs 1 \
        -v /input_snps/$test_snv \
        -f /extracted_reads/$filename \
        -r /ref_Macaq/modif_ref-relabeled.fna \
        --vcf /tmp/${sample}_${chromosome}_${start_pos}-${end_pos}_modif.vcf \
        -o /output_snp/${sample}_${chromosome}_${start_pos}-${end_pos}_modif.bam \
        --tmpdir /tmp \
        --aligner mem \
        --seed 1234 \
        --mindepth 3 \
        --coverdiff 0.1 \
        --ignoreref \
        --picardjar /picard.jar 


The messages given by bamsurgeon:
16:25:16.194 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Sat May 11 16:25:16 EDT 2024] SamToFastq INPUT=/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.bam FASTQ=/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.1.fastq SECOND_END_FASTQ=/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.2.fastq INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true INTERLEAVE=false INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Sat May 11 16:25:16 EDT 2024] Executing as ****** on Linux 4.18.0-513.9.1.el8_9.x86_64 amd64; OpenJDK 64-Bit Server VM 11.0.22+7-post-Ubuntu-0ubuntu220.04.1; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.3
[Sat May 11 16:25:17 EDT 2024] picard.sam.SamToFastq done. Elapsed time: 0.03 minutes.
Runtime.totalMemory()=5821693952
INFO 2024-05-11 16:25:17,766 haplo_chr10_10571334_10571334 aligning /tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.1.fastq,/tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.2.fastq with mem
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 76 sequences (11476 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (0, 38, 0, 0)
[M::mem_pestat] skip orientation FF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (380, 434, 479)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (182, 677)
[M::mem_pestat] mean and std.dev: (432.43, 74.17)
[M::mem_pestat] low and high boundaries for proper pairs: (83, 776)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_process_seqs] Processed 76 reads in 0.039 CPU sec, 0.041 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 1 -M -Y /ref_Macaq/modif_ref-relabeled.fna /tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.1.fastq /tmp/haplo_chr10_10571334_10571334.tmpbam.49293d2d-1fcf-49cb-8767-f632f79c7457.2.fastq
[main] Real time: 5.075 sec; CPU: 3.588 sec
INFO 2024-05-11 16:25:23,580 haplo_chr10_10571334_10571334 avgincover: 38.000000, avgoutcover: 38.000000
**INFO 2024-05-11 16:25:23,753** done making mutations, merging mutations into /extracted_reads/39237_chr10_10566334-10576334_sampled.bam --> /output_snp/39237_chr10_10566334-10576334_modif.bam
**INFO 2024-05-11 20:30:28,614** loading donor reads into dictionary...
INFO 2024-05-11 20:30:28,652 secondary reads count:0
INFO 2024-05-11 20:30:28,652 supplementary reads count:0
INFO 2024-05-11 20:30:28,653 loaded 76 reads, (0 excluded, 0 null or secondary or supplementary--> ignored)

INFO 2024-05-11 20:30:28,746 replaced 76 reads (0 excluded )

INFO 2024-05-11 20:30:28,746 kept 0 secondary reads.

INFO 2024-05-11 20:30:28,747 kept 0 supplementary reads.

INFO 2024-05-11 20:30:28,747 ignored 0 non-primary reads in target BAM.

INFO 2024-05-11 20:30:29,559 vcf output written to /tmp/39237_chr10_10566334-10576334_modif.vcf39237_chr10_10566334-10576334_modif.addsnv.snp_chr10_10571334.vcf


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bamsurgeon addsnv.py takes 4 hours to add one SNP to a bam file of size 3.1MB ? #225

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Bamsurgeon addsnv.py takes 4 hours to add one SNP to a bam file of size 3.1MB ? #225

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions