Description
Hello,
I am currently analyzing differential methylation using MODKIT and comparing results with call file-based analysis. My general strategy is as follows:
For each condition, I calculate:
(number of detected m6A modifications at a site) / (number of transcripts for the corresponding gene)
I then compute differential methylation by taking the ratio of Treatment/CTR.
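To make the calculation concrete, here is a minimal sketch of what I am computing. The function names and the counts are made up for illustration; they are not MODKIT or BAMBU output.

```python
import math


def site_ratio(mod_count, transcript_count):
    """Detected m6A calls at a site divided by the transcript count of its gene."""
    if transcript_count == 0:
        return float("nan")  # no transcripts: the ratio is undefined
    return mod_count / transcript_count


def differential_ratio(treatment, ctr):
    """Treatment/CTR ratio; NaN when the CTR value is zero or undefined."""
    if math.isnan(treatment) or math.isnan(ctr) or ctr == 0:
        return float("nan")
    return treatment / ctr


# Made-up counts, for illustration only
trt = site_ratio(mod_count=30, transcript_count=100)
ctr = site_ratio(mod_count=10, transcript_count=100)
fold = differential_ratio(trt, ctr)
print(f"Treatment/CTR = {fold:.2f}")
```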
To estimate transcript abundance, I used BAMBU, and I am comparing results based on both transcript counts and CPM normalization.
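For reference, this is how I understand the CPM normalization of the transcript counts. It is only a sketch: the `cpm` helper and the transcript names are hypothetical, and the counts are invented rather than taken from BAMBU.

```python
def cpm(counts):
    """Counts per million: scale each count by the library total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counts")
    return {name: c * 1_000_000 / total for name, c in counts.items()}


# Invented transcript counts for one library
counts = {"tx_A": 150, "tx_B": 50, "tx_C": 800}
cpm_values = cpm(counts)
for name, value in cpm_values.items():
    print(name, value)
```

The CPM values would then replace the raw transcript counts in the denominator of the ratio described above.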
I have a few questions regarding the appropriateness of this approach:
- Is it statistically appropriate to use CPM normalization for differential methylation analysis?
Since CPM is commonly used for RNA-seq normalization, I am wondering if applying CPM in the context of m6A modification quantification is valid.
- Handling missing modification calls between conditions
Due to sequencing depth limitations, some m6A sites may be detected in Treatment but not in CTR, or vice versa. When a site has no calls in CTR, the denominator of the Treatment/CTR ratio becomes zero, leading to NA values; when it has no calls in Treatment, the ratio collapses to zero.
In RNA-seq differential expression analysis, pseudo counts are often added to handle this issue.
However, in MODKIT, the output consists of discrete m6A modification counts rather than continuous expression levels.
What would be an appropriate way to introduce pseudo counts in this context?
- Should I discard transcripts with missing data?
If pseudo counts are not suitable, analyzing differential m6A methylation becomes challenging due to data loss.
Would it be appropriate to filter out transcripts that lack modification calls in one of the conditions and analyze only transcripts with detected m6A modifications in both conditions?
Or is there a better approach to retain more data without introducing bias?
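To illustrate the two options I am weighing, here is a rough sketch. The pseudocount value, the choice to add it only to the modification counts, and the function names are my own assumptions, and I am not sure either option is statistically sound. The counts are invented.

```python
import math


def log2_fold_change(trt_mod, trt_tx, ctr_mod, ctr_tx, pseudo=1.0):
    """log2(Treatment/CTR) with a pseudocount added to the modification counts.

    Assumption: the pseudocount is added to the numerators only, so a site
    with zero calls in one condition still gives a finite value.
    """
    if trt_tx <= 0 or ctr_tx <= 0:
        return float("nan")  # no transcripts: the ratio stays undefined
    trt = (trt_mod + pseudo) / trt_tx
    ctr = (ctr_mod + pseudo) / ctr_tx
    return math.log2(trt / ctr)


def detected_in_both(trt_mod, ctr_mod, min_calls=1):
    """Alternative: keep a site only if it has calls in both conditions."""
    return trt_mod >= min_calls and ctr_mod >= min_calls


# Invented sites: (trt_mod, trt_tx, ctr_mod, ctr_tx)
sites = {
    "site_1": (30, 100, 10, 100),
    "site_2": (12, 80, 0, 90),  # no calls in CTR
}
for name, (trt_mod, trt_tx, ctr_mod, ctr_tx) in sites.items():
    lfc = log2_fold_change(trt_mod, trt_tx, ctr_mod, ctr_tx)
    kept = detected_in_both(trt_mod, ctr_mod)
    print(name, round(lfc, 3), "kept by filter:", kept)
```

With the filter, site_2 would be dropped entirely; with the pseudocount, it is retained but its value depends on the pseudocount I choose, which is the part I am unsure about.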
I am still learning bioinformatics, so I would greatly appreciate any insights or guidance on these issues.
Thank you in advance for your help!