@timothymillar
Contributor

Best linear unbiased estimator (BLUE) of expected heterozygosity, following eq. 8 in Harris and DeGiorgio (2017).

`heterozygosity_blue` is a variant of `heterozygosity_expected` that corrects for the bias caused by related individuals, using a kinship matrix.
In this implementation the kinship matrix is an optional argument.
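
For reference, here is a sketch of the estimator in generalized-least-squares form. It is derived here rather than transcribed from eq. 8, so the notation (and possibly the form) may differ from the paper. It assumes that $x_{ja}$ is the proportion of allele $a$ within individual $j$, that $\Phi$ is the kinship matrix, and that $\operatorname{Cov}(\mathbf{x}_a) = p_a(1 - p_a)\Phi$ (IBD alleles are identical in state; non-IBD alleles are independent draws):

```math
c = \mathbf{1}^\top \Phi^{-1} \mathbf{1}, \qquad
\hat{p}_a = \frac{\mathbf{1}^\top \Phi^{-1} \mathbf{x}_a}{c}, \qquad
\hat{H}_{\mathrm{BLUE}} = \frac{c}{c - 1}\left(1 - \sum_a \hat{p}_a^2\right)
```

Because $E[\hat{p}_a^2] = p_a^2 + p_a(1 - p_a)/c$, summing over alleles gives $E[1 - \sum_a \hat{p}_a^2] = H(c - 1)/c$, and the $c/(c - 1)$ factor removes that bias. For unrelated, non-inbred individuals $c$ is the total number of allele copies.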

If `kinship` is not specified, then the "regular" `heterozygosity_expected` is calculated using the `n / (n - 1)` correction
described in #145 (resolving that issue), where n is the number of called allele copies.
This is a sensible default because `heterozygosity_blue` is equivalent to `heterozygosity_expected` when the kinship matrix indicates unrelated, non-inbred individuals (a matrix of zeros with `1 / ploidy` on the diagonal).
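
Here is a minimal NumPy sketch of this equivalence. It assumes genotypes are supplied as within-sample allele proportions. It uses the BLUE form p̂ = 1ᵀΦ⁻¹x / c with correction factor c / (c - 1), where c = 1ᵀΦ⁻¹1. Both function names are hypothetical and are not the PR's API.

```python
import numpy as np


def heterozygosity_blue_sketch(x, kinship):
    """Hypothetical helper, not the PR's API.

    x: (n_samples, n_alleles) allele proportions within each sample.
    kinship: (n_samples, n_samples) symmetric kinship matrix.
    """
    w = np.linalg.inv(kinship).sum(axis=1)  # Phi^-1 @ 1
    c = w.sum()  # 1^T Phi^-1 1
    p_hat = w @ x / c  # BLUE allele frequencies (weights w / c sum to 1)
    return c / (c - 1) * (1.0 - np.sum(p_hat**2))


def heterozygosity_expected_corrected(x, ploidy):
    """Hypothetical helper: expected heterozygosity with the n / (n - 1)
    correction, where n is the total number of allele copies."""
    counts = (x * ploidy[:, None]).sum(axis=0)  # allele copy counts
    n = counts.sum()
    p = counts / n
    return n / (n - 1) * (1.0 - np.sum(p**2))


# Four unrelated, non-inbred diploids at a biallelic site
# (alt-allele dosages 0, 1, 2 and 1).
alt = np.array([0.0, 0.5, 1.0, 0.5])
x = np.stack([1 - alt, alt], axis=1)
ploidy = np.full(4, 2)
kinship = np.diag(1.0 / ploidy)  # zeros off the diagonal, 1/ploidy on it

print(heterozygosity_blue_sketch(x, kinship))  # ≈ 0.5714 (4/7)
print(heterozygosity_expected_corrected(x, ploidy))  # ≈ 0.5714 (4/7)
```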

The implementation is a little more complex than ideal because, at each variant, the kinship matrix must be inverted using only the called samples (the inverse of a sub-matrix is not the corresponding sub-matrix of the full inverse).
I have tried to do this in a way that minimizes recalculation of the inverse matrix.
Additionally, NaN is returned if the (sub-)kinship matrix has no inverse.
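
One way the per-variant handling could look is sketched below. The names are hypothetical and are not the PR's API. The sketch uses the BLUE form with c = 1ᵀΦ⁻¹1 and caches inverses by the pattern of called samples, so each distinct pattern is inverted only once. An exactly singular sub-matrix yields NaN.

```python
import numpy as np


def heterozygosity_blue_per_variant(x, called, kinship):
    """Hypothetical helper, not the PR's API.

    x: (n_variants, n_samples, n_alleles) within-sample allele proportions.
    called: (n_variants, n_samples) boolean mask of called samples.
    kinship: (n_samples, n_samples) kinship matrix for all samples.
    """
    out = np.full(x.shape[0], np.nan)
    cache = {}  # called-sample pattern -> (w, c), or None if no inverse
    for v in range(x.shape[0]):
        mask = called[v]
        key = mask.tobytes()
        if key not in cache:
            if not mask.any():
                cache[key] = None  # nothing called: leave NaN
            else:
                # Invert the sub-matrix of called samples; this differs from
                # taking a sub-matrix of the full inverse.
                sub = kinship[np.ix_(mask, mask)]
                try:
                    w = np.linalg.inv(sub).sum(axis=1)  # Phi_sub^-1 @ 1
                    cache[key] = (w, w.sum())
                except np.linalg.LinAlgError:
                    cache[key] = None  # exactly singular: leave NaN
        if cache[key] is None:
            continue
        w, c = cache[key]
        p_hat = w @ x[v, mask] / c
        out[v] = c / (c - 1) * (1.0 - np.sum(p_hat**2))
    return out


# Hypothetical data: samples 0 and 1 are clones, so any kinship
# sub-matrix containing both of them is singular.
kinship = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.5]])
x = np.array([
    [[1.0, 0.0], [1.0, 0.0], [0.5, 0.5]],
    [[1.0, 0.0], [0.0, 0.0], [0.5, 0.5]],  # sample 1 missing
    [[0.5, 0.5], [0.0, 0.0], [0.5, 0.5]],  # sample 1 missing (cache reused)
])
called = np.array([[True, True, True], [True, False, True], [True, False, True]])
result = heterozygosity_blue_per_variant(x, called, kinship)
print(result)  # ≈ [nan, 0.5, 0.667]
```

Keying the cache on the missingness pattern helps when many variants share the same set of called samples. With highly variable missingness, it degrades to one inversion per variant.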

It's worth noting that `heterozygosity_blue` can return values outside the interval [0, 1] for some combinations of kinship matrix and genotypes.
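
The following is a small illustration of a value above 1. It uses a hypothetical BLUE-form helper (not the PR's API) and the standard kinship of 1/4 for two non-inbred diploid full sibs. The genotypes are made up for illustration.

```python
import numpy as np


def heterozygosity_blue_sketch(x, kinship):
    """Hypothetical helper, not the PR's API (BLUE form, c = 1^T Phi^-1 1)."""
    w = np.linalg.inv(kinship).sum(axis=1)  # Phi^-1 @ 1
    c = w.sum()
    p_hat = w @ x / c  # BLUE allele frequencies
    return c / (c - 1) * (1.0 - np.sum(p_hat**2))


# Two non-inbred diploid full sibs (kinship 1/4, self-kinship 1/2) that share
# no alleles at a four-allele site: sib 1 is A/B and sib 2 is C/D.
kinship = np.array([[0.5, 0.25], [0.25, 0.5]])
x = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]])

h = heterozygosity_blue_sketch(x, kinship)
print(h)  # ≈ 1.2, above the upper bound of 1
```

Here c = 8/3 rather than the 4 allele copies, so the correction factor is 1.6 instead of 4/3. The plain `n / (n - 1)` estimate for the same genotypes is exactly 1.0. Negative values can arise similarly when some entries of Φ⁻¹1 are negative, because those weights can push p̂ outside [0, 1].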

Related to #287, there is also an optional argument for specifying mixed-ploidy data.
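
Here is a sketch of the default (unrelated, non-inbred) kinship matrix for mixed ploidy, assuming the optional argument supplies one ploidy per sample. With `1 / ploidy` on the diagonal, the BLUE weights reduce to each sample's number of allele copies. The estimate therefore again matches the `n / (n - 1)` correction, with n as the total number of allele copies. The sample data are hypothetical.

```python
import numpy as np

# Hypothetical mixed-ploidy sample: one diploid and two tetraploids.
ploidy = np.array([2, 4, 4])
alt_dosage = np.array([1, 2, 4])  # alt-allele copies per sample
alt = alt_dosage / ploidy  # within-sample alt proportions: 0.5, 0.5, 1.0
x = np.stack([1 - alt, alt], axis=1)

# Unrelated, non-inbred default: self-kinship 1/ploidy for each sample.
kinship = np.diag(1.0 / ploidy)
w = np.linalg.inv(kinship).sum(axis=1)  # equals ploidy: weight = allele copies
c = w.sum()  # total allele copies (10)
p_hat = w @ x / c  # pooled allele frequencies (0.3, 0.7)
h_blue = c / (c - 1) * (1.0 - np.sum(p_hat**2))

# The n / (n - 1) correction with n = total allele copies gives the same value.
n = ploidy.sum()
p = np.array([n - alt_dosage.sum(), alt_dosage.sum()]) / n
h_expected = n / (n - 1) * (1.0 - np.sum(p**2))
print(h_blue, h_expected)  # both ≈ 0.4667 (7/15)
```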

This function calculates expected heterozygosity with a correction
for the relatedness of individuals based on a kinship matrix.
In this implementation the kinship matrix is an optional argument,
and if it is not present then the 'typical' expected heterozygosity
is calculated, which is equivalent to heterozygosity_blue with
unrelated, non-inbred individuals.
This function is also suitable for mixed-ploidy data.