Description
Hello!
I am trying to compare DNA adenine methylation at the same loci across two datasets, both from R10.4 data that we generated under two experimental conditions. We expect methylation levels to differ substantially between the two datasets, but we want a reasonably accurate quantitative estimate of the difference. In our analysis we apply an automatic threshold to determine "true" methylation calls.
I have been told that determining the optimal threshold is not trivial and is highly sensitive to sequencing run quality, and I have been recommended to use modkit's auto threshold function. However, I am worried that this thresholding may be sensitive to the total signal in each dataset and could therefore distort comparisons between the datasets. Would it be more appropriate to threshold the data with a fixed threshold or with a data-informed threshold (specifically modkit's function), especially given that we expect large differences between the datasets?
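
To make the worry concrete, here is a minimal sketch on purely synthetic data. It assumes the automatic threshold behaves like a low-percentile cutoff on per-call confidence (my understanding of the general idea, not a description of modkit's actual implementation), and that modified calls are on average less confident than canonical ones. The function names, the Beta distributions, the 10% percentile, and the fixed cutoff of 0.8 are all hypothetical choices made for illustration.

```python
import random


def simulate_mod_probs(n_calls, frac_methylated, seed):
    """Return synthetic per-call modification probabilities (toy model only)."""
    rng = random.Random(seed)
    probs = []
    for _ in range(n_calls):
        if rng.random() < frac_methylated:
            # Assumption: modified calls are skewed high but fairly spread out.
            probs.append(rng.betavariate(4.0, 1.5))
        else:
            # Assumption: canonical calls are tightly concentrated near 0.
            probs.append(rng.betavariate(1.0, 10.0))
    return probs


def percentile_threshold(probs, percentile=0.1):
    """Data-informed cutoff: the confidence value at the given percentile."""
    confidences = sorted(max(p, 1.0 - p) for p in probs)
    return confidences[int(percentile * (len(confidences) - 1))]


def methylated_fraction(probs, threshold):
    """Fraction of calls labelled modified among calls passing the cutoff."""
    kept = [p for p in probs if max(p, 1.0 - p) >= threshold]
    if not kept:
        return float("nan")
    return sum(p > 0.5 for p in kept) / len(kept)


if __name__ == "__main__":
    FIXED_THRESHOLD = 0.8  # hypothetical fixed cutoff applied to both datasets
    datasets = {
        "low methylation": simulate_mod_probs(20000, 0.05, seed=1),
        "high methylation": simulate_mod_probs(20000, 0.60, seed=2),
    }
    for name, probs in datasets.items():
        auto = percentile_threshold(probs)
        print(
            f"{name}: auto threshold={auto:.3f}, "
            f"fraction (auto)={methylated_fraction(probs, auto):.3f}, "
            f"fraction (fixed)={methylated_fraction(probs, FIXED_THRESHOLD):.3f}"
        )
```

In this toy model the percentile-based cutoff depends on how much methylation the dataset contains, so the two conditions end up filtered at different stringencies, whereas the fixed cutoff treats both the same. Whether real data behaves this way is exactly what I am unsure about.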