Feature Request: giraffe mapping of CRAMs with multiple RGs

The current WDL workflow supports only one read group (RG) per CRAM. When the workflow creates paired-end FASTQ files, the RG is not preserved, and insert size estimates are based on reads from all RGs in the CRAM. This is a problem because each RG may have a different insert size. Unfortunately, if the RG is added to the paired-end FASTQ files, the KMC step breaks and no longer recognizes the read pairs.
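Before changing the workflow, it can help to check whether a given CRAM actually carries more than one RG. The sketch below is a minimal bash example that assumes the header comes from `samtools view -H`; `list_read_groups` and `has_multiple_read_groups` are hypothetical helpers, not part of the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: inspect a SAM header (read on stdin) for @RG lines.

# Print the ID value of every @RG header line, one per line.
list_read_groups() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++)
      if (substr($i, 1, 3) == "ID:") print substr($i, 4)
  }'
}

# Exit status 0 if the header declares more than one read group.
has_multiple_read_groups() {
  list_read_groups | awk 'END { exit !(NR > 1) }'
}
```

For example, `samtools view -H sample.cram | has_multiple_read_groups && echo "multiple RGs"` (the file name is illustrative).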

If a CRAM has multiple RGs, it should first be split into one BAM file per RG (https://www.htslib.org/doc/samtools-split.html). The paired-end FASTQ files collated from each BAM would then contain reads from a single RG, so each giraffe run would estimate the insert size of that RG alone. The per-RG giraffe BAMs can then be merged into one BAM that lists every RG in its header, with each read tagged with its original RG.
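The proposed per-RG flow could look roughly like the bash sketch below. It is not the workflow's implementation. All function names, file names, and the `XG`/`GBWT`/`GG`/`DIST`/`MIN`/`REF_DICT`/`THREADS` environment variables are hypothetical, and the giraffe flags simply mirror the existing task. It also assumes giraffe writes an `@RG` header line for `--read-group`; if it does not, the header would need fixing before the merge (for example with `samtools addreplacerg`). Because each FASTQ pair comes from a single RG, the RG never has to be written into the FASTQ records, which avoids the KMC problem.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-read-group flow; assumes samtools and vg on PATH.
set -euo pipefail

# Split a CRAM into OUTDIR/rg_<RG ID>.bam (%! = RG ID, %. = format extension).
# Reads with no or unknown RG go to OUTDIR/no_rg.bam, which should be checked.
split_cram_by_rg() {
  local cram=$1 ref=$2 outdir=$3
  mkdir -p "$outdir"
  samtools split --reference "$ref" --output-fmt bam \
    -u "${outdir}/no_rg.bam" -f "${outdir}/rg_%!.%." "$cram"
}

# Group mates by name and write one paired FASTQ set; singletons and reads
# without a READ1/READ2 flag are discarded here for brevity.
bam_to_fastq_pair() {
  local bam=$1 r1=$2 r2=$3
  samtools collate -u -O "$bam" \
    | samtools fastq -n -1 "$r1" -2 "$r2" -0 /dev/null -s /dev/null -
}

# Map one RG's reads so giraffe's fragment-length estimate sees only that RG.
# XG, GBWT, GG, DIST, MIN and REF_DICT are hypothetical index/dict paths.
map_rg() {
  local rg=$1 sample=$2 r1=$3 r2=$4 out=$5
  vg giraffe \
    --read-group "$rg" \
    --sample "$sample" \
    --output-format BAM \
    --ref-paths "$REF_DICT" \
    -f "$r1" -f "$r2" \
    -x "$XG" -H "$GBWT" -g "$GG" -d "$DIST" -m "$MIN" \
    -t "${THREADS:-8}" > "$out"
}

# Split, convert, map and sort each RG, then merge everything into one BAM.
map_cram_by_read_group() {
  local cram=$1 ref=$2 sample=$3 work=$4 out=$5
  local bam id
  split_cram_by_rg "$cram" "$ref" "$work"
  for bam in "$work"/rg_*.bam; do
    id=${bam##*/rg_}   # strip directory and "rg_" prefix
    id=${id%.bam}      # strip extension, leaving the RG ID
    bam_to_fastq_pair "$bam" "$work/${id}_R1.fastq.gz" "$work/${id}_R2.fastq.gz"
    # Only ID and SM here; LB/PL/PU from the original @RG line could be added.
    map_rg "ID:${id} SM:${sample}" "$sample" \
      "$work/${id}_R1.fastq.gz" "$work/${id}_R2.fastq.gz" "$work/${id}.giraffe.bam"
    samtools sort -o "$work/${id}.sorted.bam" "$work/${id}.giraffe.bam"
  done
  # samtools merge expects sorted inputs and combines their @RG header lines.
  samtools merge -f "$out" "$work"/*.sorted.bam
}
```

A call might look like `map_cram_by_read_group sample.cram ref.fa SAMPLE1 work SAMPLE1.bam`, with every argument illustrative. Sorting each per-RG BAM before merging is a deliberate choice: `samtools merge` interleaves sorted inputs, so unsorted giraffe output would not produce a correctly ordered result.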

This is how the RG is currently specified when the WDL tasks run giraffe. The read group ID is hardcoded as "1" (along with `LB:lib1` and `PU:unit1`), so the RGs in the original CRAM are lost.

```
        vg giraffe \
          --progress \
          --read-group "ID:1 LB:lib1 SM:~{in_sample_name} PL:illumina PU:unit1" \
          --sample "~{in_sample_name}" \
          --output-format BAM \
          ~{in_giraffe_options} \
          --ref-paths ~{in_ref_dict} \
          -f ~{in_left_read_pair_chunk_file} -f ~{in_right_read_pair_chunk_file} \
          -x ~{in_xg_file} \
          -H ~{in_gbwt_file} \
          -g ~{in_ggbwt_file} \
          -d ~{in_dist_file} \
          -m ~{in_min_file} \
          -t ~{in_map_cores} > ~{in_sample_name}.${READ_CHUNK_ID}.bam
    >>>
```
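Rather than hardcoding `ID:1 LB:lib1 ... PU:unit1`, each per-RG giraffe run could take its read-group string from the matching `@RG` line in the original CRAM header. The bash sketch below assumes giraffe accepts the same space-separated `TAG:VALUE` string the task already passes to `--read-group`; `rg_line_for_id` and `rg_line_to_giraffe_arg` are hypothetical helpers. Values that themselves contain spaces (for example a `DS:` description) would not survive this space-joined form.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for carrying original @RG metadata into giraffe.

# Print the @RG header line (from stdin) whose ID field equals $1.
rg_line_for_id() {
  awk -F'\t' -v id="$1" '$1 == "@RG" {
    for (i = 2; i <= NF; i++) if ($i == "ID:" id) { print; exit }
  }'
}

# Turn a tab-separated @RG line into a space-separated "TAG:VALUE ..." string,
# dropping the leading "@RG" field, in the style used by the task above.
rg_line_to_giraffe_arg() {
  printf '%s\n' "$1" | awk -F'\t' '{
    out = ""
    for (i = 2; i <= NF; i++) out = out (i > 2 ? " " : "") $i
    print out
  }'
}
```

For example, `line=$(samtools view -H sample.cram | rg_line_for_id rg2)` followed by `--read-group "$(rg_line_to_giraffe_arg "$line")"` in the giraffe call (the file name and RG ID are illustrative).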

Feature Request: giraffe mapping of CRAMs with multiple RGs #151

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Feature Request: giraffe mapping of CRAMs with multiple RGs #151

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions