Determining ML score threshold for quantitative comparison of methylation levels across two datasets at the same loci #372

Open
@ngamarra

Hello!

I am attempting to compare DNA adenine methylation at the same loci across two datasets of R10.4 data that we generated under two experimental conditions. We expect the methylation levels to differ substantially between the two datasets, but we want a reasonably accurate quantitative estimate of the difference. In our analysis we apply an automatic threshold to determine "true" methylation calls.

I have been told that determining the optimal threshold is not trivial and is highly sensitive to sequencing run quality. I have been advised to use modkit's automatic threshold function. However, I am worried that this thresholding may be sensitive to the total signal in each dataset and could therefore distort comparisons between the datasets. Would it be more appropriate to threshold the data with a fixed threshold or with a data-informed threshold (specifically modkit's function), especially given that we expect large differences between the datasets?
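
To make the concern concrete, here is a toy Python sketch. It is not modkit's code: it assumes the data-informed threshold is a low percentile (10% here) of per-call confidence values, and the function names, the simulated confidence distributions, and the fixed cutoff of 0.8 are all made up for illustration.

```python
import random


def percentile_threshold(confidences, percentile=0.1):
    """Return the confidence value at the given percentile (hypothetical helper)."""
    ordered = sorted(confidences)
    index = min(int(percentile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def simulate_confidences(n_calls, frac_ambiguous, rng):
    """Toy model: confident calls fall in [0.95, 1.0], ambiguous ones in [0.5, 1.0]."""
    confs = []
    for _ in range(n_calls):
        if rng.random() < frac_ambiguous:
            confs.append(rng.uniform(0.5, 1.0))
        else:
            confs.append(rng.uniform(0.95, 1.0))
    return confs


def fraction_kept(confidences, threshold):
    """Fraction of calls whose confidence passes the threshold."""
    return sum(c >= threshold for c in confidences) / len(confidences)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Two simulated conditions with different shares of low-confidence calls (assumed values).
dataset_a = simulate_confidences(10_000, frac_ambiguous=0.05, rng=rng)
dataset_b = simulate_confidences(10_000, frac_ambiguous=0.40, rng=rng)

# Data-informed threshold: estimated separately per dataset, so it moves with the data.
thr_a = percentile_threshold(dataset_a)
thr_b = percentile_threshold(dataset_b)

# Fixed threshold: identical for both datasets, but it discards different fractions of calls.
FIXED = 0.8
kept_fixed_a = fraction_kept(dataset_a, FIXED)
kept_fixed_b = fraction_kept(dataset_b, FIXED)

print(f"percentile thresholds: A={thr_a:.3f}, B={thr_b:.3f}")
print(f"kept at fixed {FIXED}: A={kept_fixed_a:.3f}, B={kept_fixed_b:.3f}")
```

In this toy model the per-dataset percentile threshold is stricter for the dataset with fewer ambiguous calls, while the fixed threshold applies the same cutoff to both but discards a larger share of calls from the noisier dataset. Which of these two behaviours is less distorting for a quantitative comparison is exactly what I am unsure about.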

Labels: question (Looking for clarification on inputs and/or outputs)
