Description
Hi deepTools team,
I have recently been using MACS3 for peak calling analysis, which requires the effective genome size parameter. The official MACS3 manual mentions that their default values are taken from your documentation page: https://deeptools.readthedocs.io/en/develop/content/feature/effectiveGenomeSize.html.
The effective genome size values calculated using faCount on your page are consistent with those cited in the MACS3 manual. However, when I tried to calculate the effective genome size for GRCh38 using faCount myself, I noticed a discrepancy. The value given in the deepTools documentation is 2,913,022,398, but my calculated value (total number of bases minus the number of bases marked as 'N') is 3,291,585,349 - 161,334,594 = 3,130,250,755.
$ faCount -summary Homo_sapiens.GRCh38.dna_sm.toplevel.fa
#seq len A C G T N cpg
total 3291585349 921201511 641188132 643863766 923997346 161334594 32009347
prcnt 1.0 0.2799 0.1948 0.1956 0.2807 0.0490 0.0097
Initially, I thought I might have chosen the wrong file or misunderstood the calculation method. However, when I applied exactly the same method to the same type of FASTA file (dna_sm.toplevel.fa) for GRCm39, my calculated effective genome size matched the value in the deepTools documentation exactly (2,728,222,451 - 73,600,668 = 2,654,621,783):
$ faCount -summary Mus_musculus.GRCm39.dna_sm.toplevel.fa
#seq len A C G T N cpg
total 2728222451 773810649 553008665 553055957 774746512 73600668 21922699
prcnt 1.0 0.2836 0.2027 0.2027 0.2840 0.0270 0.0080
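For reference, the arithmetic for both genomes can be reproduced with a short Python sketch. It assumes the effective genome size is simply the total length minus the N count, which is the method described in this issue; whether the documented GRCh38 value was derived this way is exactly what is unclear. The function and variable names are hypothetical, and the only inputs are the `total` rows and the documented values quoted above.

```python
# Sketch: recompute "total length minus N" from the `total` row of
# `faCount -summary` and compare it with the value quoted from the
# deepTools documentation. All names here are illustrative.

def effective_size_from_facount(total_row: str) -> int:
    """Parse a faCount summary row laid out as: total len A C G T N cpg."""
    fields = total_row.split()
    length, a, c, g, t, n = (int(x) for x in fields[1:7])
    # Sanity check: the per-base counts should add up to the total length.
    if a + c + g + t + n != length:
        raise ValueError("base counts do not sum to the total length")
    return length - n

# `total` rows copied from the faCount output in this issue.
facount_rows = {
    "GRCh38": "total 3291585349 921201511 641188132 643863766 923997346 161334594 32009347",
    "GRCm39": "total 2728222451 773810649 553008665 553055957 774746512 73600668 21922699",
}

# Values quoted in this issue from the deepTools documentation.
documented = {
    "GRCh38": 2913022398,
    "GRCm39": 2654621783,
}

results = {}
for genome, row in facount_rows.items():
    computed = effective_size_from_facount(row)
    results[genome] = (computed, computed - documented[genome])
    print(f"{genome}: computed={computed:,} documented={documented[genome]:,} "
          f"difference={computed - documented[genome]:,}")
```

With these inputs the GRCm39 difference is 0, while the GRCh38 value computed from the toplevel file exceeds the documented one by 217,228,357 bases.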
Therefore, I am a bit confused about how the effective genome size for GRCh38 in the documentation was actually calculated. Could someone please help clarify this for me?
Thank you very much for your time and help!