sq.gr.co_occurrence make n_splits not affect the outputs #1048

@MathysHub

Description

Hello, while working with co-occurrence on large Xenium data, I noticed that setting n_splits > 1 can give very different outputs from n_splits=1. This was already reported in another issue on the function: "sq.gr.co_occurrence gives different results for n_splits=1 and n_splits>1 #755".

This makes the results hard to interpret, especially for a large tissue like this one.

Looking through the code, I found a simple way to make the results always match those of n_splits=1.

I would just like to know whether there is a reason to compute it the current way (calculating co-occurrence on small splits, then averaging over the splits), given that it produces different results for different numbers of splits?

Modifications

The original function computes the pairwise distance matrix for each split, derives the count matrix of how many cells fall below the radius distance, computes the co-occurrence from it, and then averages all the co-occurrence matrices together.
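To illustrate why averaging per-split ratios can diverge from a single-pass result, here is a toy sketch with two made-up count matrices. It assumes the co-occurrence score is p(exp | cond) / p(exp) computed from a label-by-label count matrix; `ratio_from_counts` is a hypothetical helper written for this example, not a squidpy function, and the numbers are invented.

```python
import numpy as np


def ratio_from_counts(counts):
    """Hypothetical helper: p(exp | cond) / p(exp) from a count matrix.

    Rows index the conditioning label, columns the label whose
    probability is being evaluated (an assumption of this sketch).
    """
    counts = np.asarray(counts, dtype=np.float64)
    p_exp = counts.sum(axis=0) / counts.sum()
    p_exp_given_cond = counts / counts.sum(axis=1, keepdims=True)
    return p_exp_given_cond / p_exp


# Invented counts for two splits with very different compositions.
split_a = np.array([[8, 2], [2, 8]])
split_b = np.array([[1, 9], [9, 81]])

# Current behaviour as described above: ratio per split, then average.
averaged = (ratio_from_counts(split_a) + ratio_from_counts(split_b)) / 2

# Alternative: pool the counts first, then take the ratio once.
pooled = ratio_from_counts(split_a + split_b)

print(averaged)  # approximately [[1.3, 0.7], [0.7, 1.3]]
print(pooled)    # approximately [[2.7, 0.66], [0.66, 1.068]]
```

The ratio is a nonlinear function of the counts, so the mean of the per-split ratios is generally not the ratio of the pooled counts, and the gap grows when splits differ in size or composition.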

The changes I made are:

  1. Make the _occur_count() and _co_occurrence_helper() functions return the count matrices instead of the co-occurrence matrices (by count matrix I mean the matrix that counts how many times each combination of cell types is found below the radius threshold).

  2. Sum the count matrices of all splits inside the co_occurrence() function (using a dtype that can hold large numbers). This summed matrix should represent the true count matrix for the whole sample.

  3. Compute the co-occurrence on this summed count matrix. This should give results that are not averaged across splits and are thus equivalent to the results with n_splits=1.
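The three steps above can be sketched in plain NumPy as follows. This is a minimal illustration under stated assumptions, not the code from the attached file or from squidpy: `pair_counts`, `pooled_counts`, and `ratio_from_counts` are hypothetical names, splits are taken as contiguous index chunks, every pair of chunks is compared, and the score is assumed to be p(exp | cond) / p(exp).

```python
import numpy as np


def pair_counts(xy_a, lab_a, xy_b, lab_b, radius, n_labels):
    """Count label pairs (i, j) where a cell of label i in chunk A lies
    within `radius` of a cell of label j in chunk B."""
    dist = np.linalg.norm(xy_a[:, None, :] - xy_b[None, :, :], axis=-1)
    ia, ib = np.nonzero(dist <= radius)
    counts = np.zeros((n_labels, n_labels), dtype=np.int64)
    # np.add.at accumulates correctly when index pairs repeat.
    np.add.at(counts, (lab_a[ia], lab_b[ib]), 1)
    return counts


def pooled_counts(xy, labels, radius, n_labels, n_splits):
    """Steps 1 and 2: get raw counts per pair of chunks and sum them."""
    chunks = np.array_split(np.arange(len(xy)), n_splits)
    total = np.zeros((n_labels, n_labels), dtype=np.int64)  # wide dtype
    for a in chunks:
        for b in chunks:
            total += pair_counts(xy[a], labels[a], xy[b], labels[b],
                                 radius, n_labels)
    return total


def ratio_from_counts(counts):
    """Step 3: compute the score once, from the pooled counts."""
    counts = counts.astype(np.float64)
    p_exp = counts.sum(axis=0) / counts.sum()
    p_exp_given_cond = counts / counts.sum(axis=1, keepdims=True)
    return p_exp_given_cond / p_exp


# Synthetic data: 300 random cells, 3 labels, one radius threshold.
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100, size=(300, 2))
labels = rng.permutation(np.arange(300) % 3)

full = pair_counts(xy, labels, xy, labels, 10.0, 3)  # like n_splits=1
pooled = pooled_counts(xy, labels, 10.0, 3, n_splits=4)
print(np.array_equal(full, pooled))
```

Because the counts are integers and every ordered pair of chunks is visited exactly once, the pooled matrix matches the single-pass matrix exactly, so the score derived from it does not depend on `n_splits`.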

This method should not change the computation speed, since it computes essentially the same quantities as the original, just in a different order. I would be glad to know whether this is a useful approach, or whether I have missed a reason why the original method gives a better estimate of the co-occurrence.

Below is the file with my code if you want to take a look.

Thanks in advance :)

File

_ppatterns.py

Version

1.6.3
