inconsistencies in the computation of cost matrices #28

@giovp

Hi,

I noticed several inconsistencies in how the costs for the linear and quadratic terms of FGW are computed.

Metric for the linear cost differs from both the signature and the metric pre-defined for the quadratic term

The inconsistency lies here:

M = ot.dist(A_X,B_X)

While the signature reports "euclidean", the default metric of ot.dist is "sqeuclidean": https://pythonot.github.io/all.html#ot.dist
For the quadratic term, instead, the metric is enforced to be "euclidean". This is problematic because, even when both feature spaces (used to compute the linear and quadratic costs) have equal variance, the two cost matrices would have different magnitudes (since, afaik, there is no scaling). Besides resolving the inconsistencies, exposing the metric choice in the signature would be helpful.
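
To make the magnitude gap concrete, here is a minimal NumPy-only sketch. `pairwise_cost` and its `metric` argument are hypothetical names used for illustration; this is not PASTE's API and it does not call `ot.dist`. It only mimics the two metrics discussed above on random data.

```python
import numpy as np

def pairwise_cost(X, Y, metric="sqeuclidean"):
    """Hypothetical helper: pairwise cost between rows of X and rows of Y."""
    diff = X[:, None, :] - Y[None, :, :]      # shape (n, m, d)
    sq = np.sum(diff ** 2, axis=-1)           # squared Euclidean distances
    if metric == "sqeuclidean":
        return sq
    if metric == "euclidean":
        return np.sqrt(sq)
    raise ValueError(f"unsupported metric: {metric!r}")

rng = np.random.default_rng(0)
A_X = rng.normal(size=(5, 50))                # toy feature matrices (assumed shapes)
B_X = rng.normal(size=(4, 50))

M_sq = pairwise_cost(A_X, B_X, metric="sqeuclidean")
M_eu = pairwise_cost(A_X, B_X, metric="euclidean")

# Same data, same variance, but the two costs live on different scales.
print("mean sqeuclidean cost:", M_sq.mean())
print("mean euclidean cost:  ", M_eu.mean())
```

Passing the metric explicitly, as in this sketch, is one way the linear and quadratic costs could be kept on the same scale.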

KL divergence assumes positivity of the feature space but neither asserts it nor transforms the data

The problem lies here:

paste/src/paste/PASTE.py

Lines 115 to 117 in 6b896b0

s_A = A_X + 0.01
s_B = B_X + 0.01
M = kl_divergence_backend(s_A, s_B)

When a "standardized" or "scaled" gene table is passed, it contains negative values, so the resulting transport matrix is invalid. To be fair, ot complains as well, yet in a cryptic way:

UserWarning: Problem unbounded
  result_code_string = check_result(result_code)

"Scaled" gene tables are not uncommon; they are the default output of normalization pipelines that use:

  • sc.pp.scale
  • sctransform
  • Pearson residuals
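
A small sketch of the failure mode, using NumPy only. `kl_cost` is an illustrative stand-in written for this example, not the actual `kl_divergence_backend`, and `require_positive` is a hypothetical guard showing what an explicit check could look like.

```python
import numpy as np

def kl_cost(X, Y):
    """Illustrative pairwise KL divergence between row-normalized X and Y."""
    X = X / X.sum(axis=1, keepdims=True)
    Y = Y / Y.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        log_X = np.log(X)                     # NaN wherever an entry is negative
        log_Y = np.log(Y)
    x_log_x = np.sum(X * log_X, axis=1, keepdims=True)   # shape (n, 1)
    return x_log_x - X @ log_Y.T              # shape (n, m)

def require_positive(X, name="X"):
    """Hypothetical guard: fail loudly instead of returning an invalid cost."""
    if np.any(X <= 0):
        raise ValueError(f"{name} must be strictly positive for a KL cost")
    return X

rng = np.random.default_rng(0)
counts = rng.poisson(5, size=(6, 20)).astype(float)      # toy count table
scaled = (counts - counts.mean(axis=0)) / (counts.std(axis=0) + 1e-8)

# Counts plus the 0.01 offset stay positive, so the cost is finite.
M_counts = kl_cost(counts[:3] + 0.01, counts[3:] + 0.01)
# Scaled data stays negative even after the offset, so the cost contains NaN.
M_scaled = kl_cost(scaled[:3] + 0.01, scaled[3:] + 0.01)

print("counts cost finite:", np.isfinite(M_counts).all())
print("scaled cost has NaN:", np.isnan(M_scaled).any())
```

Calling `require_positive(scaled + 0.01)` on the toy data raises a `ValueError`, which would be easier to act on than the "Problem unbounded" warning.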
