Skip to content

modkit summary shows unexpected proportion of methylation - is my methylation information being lost #417

Open
@eesiribloom

Description

@eesiribloom

When I run modkit summary on my samples I am getting a strange batch effect.
I have two batches of samples:

1st batch re-basecalled with dorado standalone software v0.6.1
model [email protected]
and parameter --modified-bases 5mC_5hmC

2nd batch basecalled from dorado server v7.3.9
basecall config dna_r10.4.1_e8.2_400bps_5khz_modbases_5hmc_5mc_cg_hac_prom.cfg

so the model should be essentially the same between the two batches.

I converted uBAMs to fastq with samtools fastq -T "*"
I aligned to GRCh38 with minimap2 -t 32 -ax map-ont -y -Y --MD

When I run modkit summary on samples from the 1st batch I get an example output:

# bases             C 
# total_reads_used  10042 
# count_reads_C     10042 
# pass_threshold_C  0.7636719 
 base  code  pass_count  pass_frac    all_count  all_frac 
 C     -     354650      0.3519645    372751     0.33318585 
 C     h     26080       0.025882516  55114      0.049264 
 C     m     626900      0.622153     690883     0.61755013 

When I run modkit summary on samples from the 2nd batch I get an example output:

# bases             C 
# total_reads_used  10042 
# count_reads_C     10042 
# pass_threshold_C  0.73828125 
 base  code  pass_count  pass_frac     all_count  all_frac 
 C     -     11116207    0.95703727    12023879   0.9322321 
 C     h     56736       0.0048846216  203618     0.015786855 
 C     m     442286      0.038078114   670449     0.05198107 

I can see that all the 2nd batch of samples have virtually no 5mC being detected at all whereas it is the majority in the 1st batch. I would not expect global methylation patterns to differ on these scale (these are cancer samples from the same cohort and cancer type). When I investigate the fastq (grep ^@) and bam files (samtools view BAM_FILE | less) for the 2nd batch of samples, there are MM and ML tags present.

How can I investigate this further or work out where the issue lies?

Metadata

Metadata

Assignees

No one assigned

    Labels

    questionLooking for clarification on inputs and/or outputs

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions