Description
Hello. I was reviewing some of my recent RNA004 direct RNA seq data to see what modifications I could call from them. I have six replicates here, and I called m6A, m5C, and pseudouridine for all of them. Then I ran modkit to generate a bedgraph for each, and compared them in IGV. I also remove samples from my bedgraph that have less than 10 total reads, to remove noise.
So for m6A, it would be:
dorado basecaller sup,m5C,inosine_m6A,pseU ./pod5/ -r --min-qscore 8 --device cuda:0 --reference reference.fasta | samtools sort > sample.bam
modkit pileup sample.bam --reference reference.fasta -t 18 --region REGION --max-depth 1000000 --bedgraph ./ --filter-theshold 0.9 --motif DRACH 2
awk '$5>9' sample_a_DRACH2_positive.bedgraph > sample_m6A.bedgraph
and for pseudouridine, it would be:
modkit pileup sample.bam --reference reference.fasta -t 18 --region REGION --max-depth 1000000 --bedgraph ./ --filter-theshold 0.9 --motif T 0
awk '$5>9' sample_17802_T0_positive.bedgraph > sample_pseudouridine.bedgraph
This works well, but I noticed that on a couple of my DRACH motif sites, I see a pattern like this:
I've marked in black three DRACH sites at the top. This is followed by m6A calls in all 6 replicates, then m5C calls in all 6 replicates, and then pseudouridine calls in all 6 replicates.
In two of these three shown DRACH sites, the A, C, and U of the motif are being called with fairly high confidence as m6A, m5C, and pseudouridine, all in a row. This doesn't occur at every high-probability m6A DRACH site, but it occurs a few of them.
I guess what I'm wondering is if this is truly three modifications, right next to each other, or if the basecaller is detecting a current resistance difference in this area, and each of the individual modification-finding algorithms is believing that this modification is the one it is checking for, if that makes sense?
If it's the latter, is there something I can change in dorado or modkit to screen for this? I'd like to get as accurate a modification profile as possible.