Review on using correlation as standard metric #253

XingerTang · 2026-04-23T15:54:17Z

XingerTang
Apr 23, 2026
Maintainer

Currently, we use correlation as the standard metric for evaluating the accuracy of the output in all accuracy tests and some examples. However, it does not always describe the accuracy of the result well, especially for discrete outputs.

So far, the correlation is calculated via

np.corrcoef(real[:], output[:])[0, 1]

1. A more accurate prediction does not necessarily give a higher correlation value

Case 1: With real = [0, 1, 1, 0], output = [0, 0, 1, 1], we have two loci correct (loci 1, 3), two loci wrong (loci 2, 4), the corresponding corr value is 0.
Case 2: With real = [0, 1, 1, 0], output = [0, 9, 1, 1], we have two loci correct (loci 1, 3), one locus wrong (locus 4), and one locus uncalled (locus 2), the corresponding corr value is 0.61958555.
Case 3: With real = [0, 1, 1, 0], output = [0, 1, 1, 1], we have three loci correct (loci 1, 2, 3), one locus wrong (locus 4), the corresponding corr value is 0.57735027.
Case 4: With real = [0, 1, 1, 0], output = [9, 1, 1, 1], we have two loci correct (loci 2, 3), one locus wrong (locus 4), and one locus uncalled (locus 1), the corresponding corr value is -0.57735027.
Case 5: With real = [0, 1, 1, 0], output = [0, 1, 1, 9], we have three loci correct (loci 1, 2, 3), one locus uncalled (locus 4), the corresponding corr value is -0.48189987.
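The five correlation values above can be reproduced with a short script. This is a minimal sketch that assumes the same `np.corrcoef` call as in the accuracy tests and, like the examples, leaves the uncalled code 9 in the vector as if it were a real value. The names `corr`, `outputs`, and `corrs` are only illustrative.

```python
import numpy as np

real = np.array([0, 1, 1, 0])

# Outputs of the five cases; 9 marks an uncalled locus and is NOT masked here.
outputs = {
    "Case 1": [0, 0, 1, 1],
    "Case 2": [0, 9, 1, 1],
    "Case 3": [0, 1, 1, 1],
    "Case 4": [9, 1, 1, 1],
    "Case 5": [0, 1, 1, 9],
}


def corr(real, output):
    """Pearson correlation, computed the same way as in the accuracy tests."""
    return np.corrcoef(real, output)[0, 1]


corrs = {name: corr(real, np.array(out)) for name, out in outputs.items()}

for name, value in corrs.items():
    print(f"{name}: {value:.8f}")
```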

I'm going to rank the 5 cases using the following rule:

each correct locus scores +1,
each uncalled locus scores 0,
each wrong locus scores -1.

From highest to lowest, we have:

Case 5
Case 3
Case 2, Case 4
Case 1
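The scoring rule and the resulting ranking can be sketched as follows. The function name `locus_score` and the constant `MISSING` are hypothetical, introduced only for this illustration; the sketch assumes that 9 is the only code for an uncalled locus.

```python
MISSING = 9  # assumed code for an uncalled locus, as in the examples above


def locus_score(real, output, missing=MISSING):
    """Sum of per-locus scores: +1 correct, 0 uncalled, -1 wrong."""
    score = 0
    for r, o in zip(real, output):
        if o == missing:
            continue  # uncalled locus contributes 0
        score += 1 if r == o else -1
    return score


real = [0, 1, 1, 0]
outputs = {
    "Case 1": [0, 0, 1, 1],
    "Case 2": [0, 9, 1, 1],
    "Case 3": [0, 1, 1, 1],
    "Case 4": [9, 1, 1, 1],
    "Case 5": [0, 1, 1, 9],
}

scores = {name: locus_score(real, out) for name, out in outputs.items()}

# Rank from highest to lowest score; ties keep their original order.
ranking = sorted(scores, key=lambda name: -scores[name])
for name in ranking:
    print(name, scores[name])
```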

But if we rank them by the corr value, we have:

Case 2
Case 3
Case 1
Case 5
Case 4

We can see that:

Case 5 ranks highest by score, but second lowest by corr value.
Case 2 and Case 4 have the same score, but one has the highest corr value and the other has the lowest.

The situation does not change with longer arrays.

Case 1: With real = [0, 1, 1, 0, 1, 1], output = [0, 1, 1, 9, 1, 1], we have five loci correct (loci 1, 2, 3, 5, 6), one locus uncalled (locus 4), the corresponding corr value is -0.53608771.
Case 2: With real = [0, 1, 1, 0, 1, 1], output = [0, 1, 1, 1, 1, 1], we have five loci correct (loci 1, 2, 3, 5, 6), one locus wrong (locus 4), the corresponding corr value is 0.63245553.
Case 3: With real = [0, 1, 1, 0, 1, 1], output = [0, 1, 1, 0, 1, 0], we have five loci correct (loci 1, 2, 3, 4, 5), one locus wrong (locus 6), the corresponding corr value is 0.70710678.

If we rank by the score defined above, we have

Case 1
Case 2, Case 3

If we rank by the corr value,

Case 3
Case 2
Case 1

2. For very large datasets, differences in correlation do not reflect the size of the differences in the actual output

For one example dataset, we have two outputs, each evaluated using both the absolute differences between the output and the true values and the correlation value.

Output 1:

number of abs diff: 13059
average correlation: 0.8682583309448004

Output 2:

number of abs diff: 143421
average correlation: 0.8624665195036386

The difference in the average correlation (about 0.006) is much smaller than the difference in the abs diff, which is roughly 11 times larger for Output 2.

Conclusion

We may need to change our accuracy evaluation metric.

gregorgorjanc · 2026-04-23T16:22:40Z

gregorgorjanc
Apr 23, 2026
Maintainer

@XingerTang good that you are questioning the metrics! It depends on the downstream use cases, so it depends what metric is “best”.

I am not at a computer so can’t check, but are you keeping 9 in the vectors when calculating correlations? They should be omitted because 9 denotes a missing value, not an actual value.
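Omitting the 9s before computing the correlation could look like the sketch below. The helper name `masked_corr` is hypothetical, and returning NaN when fewer than two called loci remain or when either vector is constant is an assumption about how such cases should be handled, not existing behaviour.

```python
import numpy as np


def masked_corr(real, output, missing=9):
    """Pearson correlation over called loci only (entries equal to `missing` are dropped)."""
    real = np.asarray(real, dtype=float)
    output = np.asarray(output, dtype=float)
    called = output != missing
    r, o = real[called], output[called]
    # Correlation is undefined with fewer than two points or zero variance.
    if r.size < 2 or r.std() == 0 or o.std() == 0:
        return float("nan")
    return float(np.corrcoef(r, o)[0, 1])


real = [0, 1, 1, 0]
print(masked_corr(real, [0, 9, 1, 1]))  # Case 2
print(masked_corr(real, [9, 1, 1, 1]))  # Case 4
print(masked_corr(real, [0, 1, 1, 9]))  # Case 5
```

Under this sketch, Case 5 becomes a perfect correlation, Case 2 becomes 0.5, and Case 4 is undefined because all of its called loci have the same output value.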

One typical downstream use case of imputed genotypes is genomic prediction, where we are aiming to maximise cor(est, true), with est a vector of estimated genetic/breeding values and true a vector of true genetic/breeding values. Here we are aiming to rank individuals as accurately as possible so that follow-up decisions will be as correct as possible. Assume that est = Ma and true = Qb, with M a matrix of marker genotypes and a their effects, and Q a matrix of QTL genotypes and b their effects. We don’t know Q or b in real life, but the idea is that there is some correlation between M and Q, due to shared haplotype structure, giving cor(Ma, Qb) > 0. We often have incomplete M, which is where M_imp comes in, so we are actually looking at cor(M_imp a, Qb). Given this setting, which metric would you recommend?
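To make the setting concrete, here is a toy simulation of cor(M_imp a, Qb). Everything in it is an assumption made for illustration: the genotypes and effects are randomly generated, Q = M and b = a are used as a simplification (in real life Q and b are unknown), and imputation errors are modelled by replacing a random fraction of genotypes with random values. The helper names `impute_with_errors` and `prediction_accuracy` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ind, n_loci = 200, 500

# Simulated marker genotypes coded 0/1/2 and simulated marker effects.
M = rng.integers(0, 3, size=(n_ind, n_loci))
a = rng.normal(size=n_loci)

# Simplification for this sketch only: Q = M and b = a.
true = M @ a


def impute_with_errors(M, error_rate, rng):
    """Return a copy of M in which a random fraction of entries is redrawn at random."""
    M_imp = M.copy()
    wrong = rng.random(M.shape) < error_rate
    M_imp[wrong] = rng.integers(0, 3, size=int(wrong.sum()))
    return M_imp


def prediction_accuracy(M_imp, a, true):
    """cor(M_imp a, true): how well the estimated values track the true values."""
    return float(np.corrcoef(M_imp @ a, true)[0, 1])


acc_perfect = prediction_accuracy(M, a, true)
acc_noisy = prediction_accuracy(impute_with_errors(M, 0.2, rng), a, true)
print(acc_perfect, acc_noisy)
```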

There are a couple of publications that have looked into this:

1 reply

XingerTang Apr 23, 2026
Maintainer Author

Thank you for sharing the publications. They are going to be very useful.

Missing values are indeed a major cause of the ineffectiveness of the correlation. On the other hand, consider these examples:

Case 2: With real = [0, 1, 1, 0, 1, 1], output = [0, 1, 1, 1, 1, 1], we have five loci correct (loci 1, 2, 3, 5, 6), one locus wrong (locus 4), the corresponding corr value is 0.63245553.
Case 3: With real = [0, 1, 1, 0, 1, 1], output = [0, 1, 1, 0, 1, 0], we have five loci correct (loci 1, 2, 3, 4, 5), one locus wrong (locus 6), the corresponding corr value is 0.70710678.

Neither example includes a missing value. However, with the same number of incorrect loci, correlation still shows a preference for one over the other.

For vectors, I have usually used the squared norm of the differences as the cost function for optimisation. But it would be valuable to go through the publications first.
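As a rough sketch, the squared norm of the differences for the two six-locus cases quoted above can be computed like this. The function name `squared_error` is hypothetical, and the sketch assumes there are no uncalled (9) entries in the vectors.

```python
import numpy as np


def squared_error(real, output):
    """Squared Euclidean norm of (real - output); assumes no missing codes."""
    diff = np.asarray(real, dtype=float) - np.asarray(output, dtype=float)
    return float(diff @ diff)


real = [0, 1, 1, 0, 1, 1]
case_2 = [0, 1, 1, 1, 1, 1]
case_3 = [0, 1, 1, 0, 1, 0]

err_2 = squared_error(real, case_2)
err_3 = squared_error(real, case_3)
print(err_2, err_3)
```

Both cases have one wrong locus and get the same squared error, whereas their correlations differ.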
