Question regarding the calculation of effective genome size for GRCh38 #1426

@jqming123

Hi deepTools team,

I have recently been using MACS3 for peak calling analysis, which requires the effective genome size parameter. The official MACS3 manual mentions that their default values are taken from your documentation page: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html.

The effective genome size values on your page, calculated using faCount, are consistent with those cited in the MACS3 manual. However, when I tried to calculate the effective genome size for GRCh38 using faCount myself, I noticed a discrepancy. The value given in the deepTools documentation is 2,913,022,398, but my calculated value (total number of bases minus the number of bases marked as 'N') is 3,291,585,349 - 161,334,594 = 3,130,250,755.

$ faCount -summary Homo_sapiens.GRCh38.dna_sm.toplevel.fa
#seq    len     A       C       G       T       N       cpg
total   3291585349      921201511       641188132       643863766       923997346       161334594       32009347
prcnt   1.0     0.2799  0.1948  0.1956  0.2807  0.0490  0.0097

Initially, I thought I might have chosen the wrong file or misunderstood the calculation method. However, when I applied exactly the same method to the same type of FASTA file (dna_sm.toplevel.fa) for GRCm39, my calculated effective genome size matched the value in the deepTools documentation exactly (2,728,222,451 - 73,600,668 = 2,654,621,783):

$ faCount -summary Mus_musculus.GRCm39.dna_sm.toplevel.fa
#seq    len     A       C       G       T       N       cpg
total   2728222451      773810649       553008665       553055957       774746512       73600668        21922699
prcnt   1.0     0.2836  0.2027  0.2027  0.2840  0.0270  0.0080
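
For reference, the calculation I applied to both files (total length minus the N count) can be sketched as a small Python helper. This is only an illustration of my own arithmetic, not how deepTools or MACS3 derive their values; the function name `effective_genome_size` is hypothetical, and it assumes the tabular layout of the `faCount -summary` output shown above.

```python
def effective_genome_size(facount_summary: str) -> int:
    """Return len - N from the 'total' row of `faCount -summary` output.

    Hypothetical helper: assumes a '#seq' header row followed by a
    'total' row, as in the output pasted in this issue.
    """
    header = None
    for line in facount_summary.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#seq":
            header = fields
        elif fields[0] == "total":
            if header is None:
                raise ValueError("no '#seq' header before the 'total' row")
            # Map column names (len, A, C, G, T, N, cpg) to integer counts.
            row = dict(zip(header[1:], (int(x) for x in fields[1:])))
            return row["len"] - row["N"]
    raise ValueError("no 'total' row found")


# The two summaries from this issue, copied verbatim.
GRCH38 = """#seq    len     A       C       G       T       N       cpg
total   3291585349      921201511       641188132       643863766       923997346       161334594       32009347
prcnt   1.0     0.2799  0.1948  0.1956  0.2807  0.0490  0.0097"""

GRCM39 = """#seq    len     A       C       G       T       N       cpg
total   2728222451      773810649       553008665       553055957       774746512       73600668        21922699
prcnt   1.0     0.2836  0.2027  0.2027  0.2840  0.0270  0.0080"""

print(effective_genome_size(GRCH38))  # 3130250755
print(effective_genome_size(GRCM39))  # 2654621783
```

In both cases the result also equals A + C + G + T from the same row, so the subtraction itself looks right to me.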

Therefore, I am a bit confused about how the effective genome size for GRCh38 in the documentation was actually calculated. Could someone please help clarify this for me?

Thank you very much for your time and help!
