Strand count asymmetry in modkit motif compared to find-motifs #530

@angelluigi

Hi,

I’ve observed a possible inconsistency between modkit motif and modkit find-motifs regarding how strand orientation is handled.

When using find-motifs, the motif counts correctly reflect the total number of genomic sites (combining both forward and reverse-complement strands). However, when I use modkit motif with a known sequence, the total number of detected sites across both strands is approximately correct, but the distribution between the forward and reverse strands is uneven.

For example, if the motif occurs ~300 times on the forward strand and ~300 on the reverse complement in the reference, the output from modkit motif might report something like 120 sites on the “+” strand and 180 on the “−” strand, instead of the expected 300 + 300.

This pattern is reproducible across samples and references. The total number of motif hits is close to what I expect, but the strand-level split is systematically unbalanced.
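For reference, the expected per-strand counts can be checked independently of modkit with a short script. The sketch below is a minimal illustration, not modkit's own logic: the function names (`reverse_complement`, `motif_regex`, `count_motif_by_strand`) are hypothetical, the reference is assumed to be a single already-loaded sequence string, and a motif is counted on the "-" strand wherever its reverse complement appears in the forward sequence.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
# Complement table that also covers the ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of an (IUPAC) sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif):
    """Compile a motif into a regex; the lookahead lets overlapping hits count."""
    return re.compile("(?=" + "".join(IUPAC[b] for b in motif.upper()) + ")")


def count_motif_by_strand(reference, motif):
    """Count motif occurrences on the + strand and on the - strand.

    A - strand hit is an occurrence of the motif's reverse complement
    in the forward reference sequence.
    """
    reference = reference.upper()
    plus = sum(1 for _ in motif_regex(motif).finditer(reference))
    minus = sum(
        1 for _ in motif_regex(reverse_complement(motif)).finditer(reference)
    )
    return {"+": plus, "-": minus}


if __name__ == "__main__":
    # Toy sequence: GAAC occurs once, and its reverse complement GTTC once.
    print(count_motif_by_strand("GAACGTTC", "GAAC"))
```

Running this over each contig of the reference gives the number of motif sites per strand that exist in the sequence itself, which is the baseline I am comparing the modkit motif strand split against. Note that for a palindromic motif the two counts are equal by construction.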

Could you clarify whether modkit motif applies any strand filtering or alignment-based orientation rule when assigning strand labels to motif hits?

It might help to document this strand-handling behavior, or to provide a flag that ensures symmetric motif detection and reporting across both strands.

Thanks again for your time and for developing such a powerful tool. Modkit has been extremely helpful for methylation motif analysis.

Labels: question (Looking for clarification on inputs and/or outputs)
