Detailed Description
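As indicated in the wiki: https://wiki.oicr.on.ca/display/icgcargotech/Data+Management+Tasks
CRAM files can be written in a way that reduces compute overhead for later reads. When generating the CRAM with samtools, set the output option seqs_per_slice=1000 instead of the default (10000 sequences per slice). The increase in file size is negligible, but smaller slices make random access considerably faster, because less data has to be decoded to reach a given region.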
Possible Implementation
Add seqs_per_slice=1000 to the CRAM output options at:
https://github.com/icgc-argo/dna-seq-processing-tools/blob/master/tools/bam-merge-sort-markdup/bam-merge-sort-markdup.py#L62
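A minimal sketch of what the change could look like, assuming the tool builds a samtools command line in Python. This is not the actual code at the linked line: the function name `build_cram_command`, its parameters, and the file names are hypothetical. Only the samtools flags (`view -C`, `-T`, `-@`, `-o`, `--output-fmt-option`) are real.

```python
import shlex


def build_cram_command(input_bam, reference_fasta, output_cram,
                       threads=4, seqs_per_slice=1000):
    """Return a samtools argument list that writes CRAM with smaller slices.

    Hypothetical helper for illustration; the real tool may assemble
    its command differently.
    """
    return [
        "samtools", "view",
        "-C",                                   # write CRAM output
        "-T", reference_fasta,                  # reference FASTA used for encoding
        "-@", str(threads),                     # additional compression threads
        # Fewer sequences per slice means less data to decode per random access.
        "--output-fmt-option", f"seqs_per_slice={seqs_per_slice}",
        "-o", output_cram,
        input_bam,
    ]


if __name__ == "__main__":
    # File names below are placeholders, not paths from the pipeline.
    cmd = build_cram_command("merged.markdup.bam", "reference.fa",
                             "merged.markdup.cram")
    print(shlex.join(cmd))
```

The same option can also be given in the compact form `-O cram,seqs_per_slice=1000` if the existing command already uses `-O`/`--output-fmt`.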