Description
Hi,
I noticed several inconsistencies in how the costs for the linear and quadratic terms of FGW are computed.
The metric for the linear cost differs from both the signature and the metric hard-coded for the quadratic term
The inconsistency lies here:
Line 113 in 6b896b0
M = ot.dist(A_X,B_X)
While the signature reports "euclidean", the default metric of ot.dist is "sqeuclidean":
https://pythonot.github.io/all.html#ot.dist
For the quadratic term, instead, the metric is forced to be "euclidean". This is problematic because, even when both feature spaces (used to compute the linear and quadratic costs) have equal variance, the two cost matrices have different magnitudes (as far as I know, there is no scaling). Besides resolving the inconsistencies, exposing the metric choice in the signature would be helpful.
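To illustrate the magnitude mismatch, here is a minimal sketch using plain NumPy rather than the library code. The helper `pairwise_cost` and its `metric` argument are hypothetical names, only meant to show what an exposed metric choice could look like; the data are random and do not come from the package.

```python
import numpy as np


def pairwise_cost(X, Y, metric="euclidean"):
    """Hypothetical helper: pairwise cost matrix with an explicit metric."""
    # Squared Euclidean distances via broadcasting, shape (n_X, n_Y).
    diff = X[:, None, :] - Y[None, :, :]
    sq = np.sum(diff ** 2, axis=-1)
    if metric == "sqeuclidean":
        return sq
    if metric == "euclidean":
        return np.sqrt(sq)
    raise ValueError(f"unsupported metric: {metric!r}")


# Two feature tables with the same (unit) variance per feature.
rng = np.random.default_rng(0)
A_X = rng.normal(size=(5, 50))
B_X = rng.normal(size=(6, 50))

M_sq = pairwise_cost(A_X, B_X, metric="sqeuclidean")
M_eu = pairwise_cost(A_X, B_X, metric="euclidean")

# Same data, very different cost scales depending on the metric.
print("mean sqeuclidean cost:", M_sq.mean())
print("mean euclidean cost:  ", M_eu.mean())
```

With a single metric argument applied to both terms, the linear and quadratic costs would at least live on the same scale by construction.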
KL divergence assumes positivity of the feature space, but the code neither asserts it nor transforms the input
The problem lies here:
Lines 115 to 117 in 6b896b0
s_A = A_X + 0.01
s_B = B_X + 0.01
M = kl_divergence_backend(s_A, s_B)
When a "standardized" or "scaled" gene table is passed, the resulting transport matrix is invalid. To be fair, ot complains as well, but in a cryptic way:
UserWarning: Problem unbounded
result_code_string = check_result(result_code)
"Scaled" gene tables are not uncommon; they are the default output of normalization pipelines when:
- sc.pp.scale is used
- sctransform is used
- Pearson residuals are used
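A possible guard could look like the sketch below. It uses plain NumPy; `check_nonnegative` is a hypothetical name, not an existing function of the package, and the toy tables only stand in for raw and scaled gene tables. It also shows that adding 0.01 does not rescue standardized data, since the logarithm inside a KL-type cost is undefined for negative entries.

```python
import numpy as np


def check_nonnegative(X, name="X"):
    """Hypothetical guard: fail early if a feature table has negative entries."""
    if np.any(X < 0):
        raise ValueError(
            f"{name} contains negative values; the KL cost needs non-negative "
            "features (e.g. counts or normalized counts, not scaled data)."
        )
    return X


# Toy stand-in for a raw count table: non-negative by construction.
counts = np.arange(12, dtype=float).reshape(4, 3)

# Column-wise standardization, similar in spirit to scaling a gene table.
scaled = (counts - counts.mean(axis=0)) / counts.std(axis=0)

check_nonnegative(counts + 0.01, name="A_X")  # passes silently

try:
    check_nonnegative(scaled + 0.01, name="A_X")
except ValueError as err:
    print("rejected:", err)

# Without the guard, the log of the shifted scaled table already contains NaN.
with np.errstate(invalid="ignore"):
    print("NaN in log:", bool(np.isnan(np.log(scaled + 0.01)).any()))
```

An explicit error of this kind would be easier to act on than the "Problem unbounded" warning.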