Description
Hello, while working with co-occurrence on large Xenium data I noticed that n_splits>1 can give very different outputs from n_splits=1, as already reported in another issue on this function, "sq.gr.co_occurrence gives different results for n_splits=1 and n_splits>1 #755".
This makes the results hard to interpret, especially for a large tissue like this one, where splitting is needed in practice.
By going through the code I found a simple way to make the results always match those of n_splits=1.
I would just like to know whether there is a reason to compute it the way it is currently done (calculating co-occurrence on small splits, then averaging over the splits), which gives different results for different numbers of splits.
Modifications
In the original function, the pairwise distance matrix is computed for each split, then the count matrix of how many cells lie below the radius distance, then the co-occurrence from that count matrix. Finally, all the co-occurrence matrices are averaged together.
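To illustrate why averaging per-split ratios differs from a single pass, here is a minimal NumPy sketch. It is not squidpy's actual code: the helper names pair_counts and ratio_from_counts, the synthetic data, the single radius (instead of a series of intervals) and the choice to pair every split with every split are all assumptions made for the example.

```python
import numpy as np


def pair_counts(coords_a, labels_a, coords_b, labels_b, n_labels, radius):
    """Count label pairs (row: label in a, column: label in b) within `radius`."""
    # Dense Euclidean distances between every point of a and every point of b.
    dist = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    # dist > 0 drops a cell paired with itself.
    ia, ib = np.nonzero((dist > 0) & (dist <= radius))
    counts = np.zeros((n_labels, n_labels), dtype=np.int64)
    np.add.at(counts, (labels_a[ia], labels_b[ib]), 1)
    return counts


def ratio_from_counts(counts):
    """Co-occurrence ratio p(col | row) / p(col) computed from a count matrix."""
    counts = counts.astype(np.float64)
    conditional = counts / counts.sum(axis=1, keepdims=True)
    marginal = counts.sum(axis=0) / counts.sum()
    return conditional / marginal[None, :]


# Synthetic stand-in for cell coordinates and cluster labels.
rng = np.random.default_rng(0)
coords = rng.random((600, 2))
labels = rng.integers(0, 3, size=600)
n_labels, radius, n_splits = 3, 0.2, 3

# n_splits=1: one distance matrix, one count matrix, one ratio.
single_pass = ratio_from_counts(
    pair_counts(coords, labels, coords, labels, n_labels, radius)
)

# n_splits>1: one ratio per pair of splits, then the mean of those ratios.
coord_parts = np.array_split(coords, n_splits)
label_parts = np.array_split(labels, n_splits)
per_split = [
    ratio_from_counts(
        pair_counts(coord_parts[i], label_parts[i],
                    coord_parts[j], label_parts[j], n_labels, radius)
    )
    for i in range(n_splits)
    for j in range(n_splits)
]
averaged = np.mean(per_split, axis=0)

# Largest entry-wise gap between the two ways of computing the ratio.
print(np.abs(averaged - single_pass).max())
```

A mean of ratios is in general not the ratio of the pooled counts, so the printed gap depends on the data and on how the cells are distributed across splits.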
The changes I made are:
- Make the _occur_count() and _co_occurrence_helper() functions return the count matrices instead of the co-occurrence matrices (by count matrix I mean the matrix that counts, for each combination of cell types, how many pairs are found below the radius threshold).
- Sum the count matrices of all splits inside the co_occurrence() function (using a dtype that allows large numbers). The summed matrix should be the true count matrix of the whole sample.
- Compute the co-occurrence on this summed count matrix. The results are then not averaged across splits and should therefore be equivalent to the results with n_splits=1.
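The three steps above can be sketched as follows. This is a simplified NumPy illustration rather than the patched squidpy code: the helper names, the synthetic data, the single radius and the pairing of every split with every split are assumptions for the example.

```python
import numpy as np


def pair_counts(coords_a, labels_a, coords_b, labels_b, n_labels, radius):
    """Count label pairs (row: label in a, column: label in b) within `radius`."""
    dist = np.linalg.norm(coords_a[:, None, :] - coords_b[None, :, :], axis=-1)
    # dist > 0 drops a cell paired with itself.
    ia, ib = np.nonzero((dist > 0) & (dist <= radius))
    counts = np.zeros((n_labels, n_labels), dtype=np.int64)
    np.add.at(counts, (labels_a[ia], labels_b[ib]), 1)
    return counts


def ratio_from_counts(counts):
    """Co-occurrence ratio p(col | row) / p(col) computed from a count matrix."""
    counts = counts.astype(np.float64)
    conditional = counts / counts.sum(axis=1, keepdims=True)
    marginal = counts.sum(axis=0) / counts.sum()
    return conditional / marginal[None, :]


# Synthetic stand-in for cell coordinates and cluster labels.
rng = np.random.default_rng(1)
coords = rng.random((500, 2))
labels = rng.integers(0, 4, size=500)
n_labels, radius, n_splits = 4, 0.15, 4

coord_parts = np.array_split(coords, n_splits)
label_parts = np.array_split(labels, n_splits)

# Steps 1 and 2: each pair of splits returns raw counts, accumulated in int64.
total_counts = np.zeros((n_labels, n_labels), dtype=np.int64)
for i in range(n_splits):
    for j in range(n_splits):
        total_counts += pair_counts(coord_parts[i], label_parts[i],
                                    coord_parts[j], label_parts[j],
                                    n_labels, radius)

# Step 3: a single ratio computed from the pooled counts.
pooled = ratio_from_counts(total_counts)

# Reference: the same counts computed in one pass over all cells.
full_counts = pair_counts(coords, labels, coords, labels, n_labels, radius)
print(np.array_equal(total_counts, full_counts))
```

Because every ordered pair of cells falls into exactly one pair of splits, the summed counts match the one-pass counts entry for entry, and the ratio no longer depends on n_splits.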
This method should not change the computation speed, as it computes essentially the same things as the original, just in a different order. I would be glad to know whether this is a useful approach, or whether I have missed a reason why the original method gives a better estimate of the co-occurrence.
Below is the file with my code if you want to take a look.
Thanks in advance :)
File
Version
1.6.3