sq.gr.co_occurrence: make n_splits not affect the outputs #1048

## Description



Hello, while working with co-occurrence on large Xenium data, I noticed that setting n_splits > 1 can give very different outputs from n_splits=1, as I saw reported in another issue about the function, _"sq.gr.co_occurrence gives different results for n_splits=1 and n_splits>1 #755"_.

This makes the results hard to interpret, especially for a large tissue such as this one.

By reading the code, I found a simple way to make the results always match those of n_splits=1.

I would just like to know whether there is a reason to compute it the current way (calculating co-occurrence on small splits, then averaging over the splits), which gives different results for different numbers of splits.


## Modifications

The original function computes the pairwise distance matrix for each split, builds the count matrix of how many cells fall within the radius distance, computes the co-occurrence from that count matrix, and finally averages the co-occurrence matrices of all splits.
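
In symbols (the notation here is only illustrative and does not come from the squidpy code): write $C^{(s)}$ for the count matrix of split $s$, $S$ for the number of splits, and $f$ for whatever map turns a count matrix into a co-occurrence matrix. The averaging described above then amounts to:

```math
\widehat{R}_{\text{avg}} = \frac{1}{S} \sum_{s=1}^{S} f\left(C^{(s)}\right)
```

If $f$ normalises counts into probabilities, as a co-occurrence ratio does, it is not linear, so in general this average differs from $f\left(\sum_{s} C^{(s)}\right)$. That would explain why the output depends on n_splits.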

The changes I made are:

1) Make the _occur_count() and _co_occurrence_helper() functions return the count matrices instead of the co-occurrence matrices. (By count matrix I mean the matrix that counts, for each combination of cell types, how many cell pairs are found within the radius threshold.)

2) Sum the count matrices of all splits inside the co_occurrence() function, using a dtype that can hold large numbers. The summed matrix should represent the true count matrix of the whole sample.

3) Compute the co-occurrence on this summed count matrix. The results are then not averaged across splits and should therefore be equivalent to the results with n_splits=1.
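
To illustrate the three steps on toy data, here is a minimal NumPy sketch. It is not the squidpy code: `pair_counts` and `ratio` are hypothetical helpers, the ratio formula is only a stand-in for the real co-occurrence computation, and iterating over all ordered pairs of chunks is an assumption about how splits could be enumerated.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, n_labels, n_splits = 300, 3, 4
lo, hi = 0.0, 0.2  # a single distance interval (lo, hi]

xy = rng.random((n_cells, 2))                # toy coordinates in the unit square
labels = rng.integers(0, n_labels, n_cells)  # toy cell-type codes


def pair_counts(xy_a, lab_a, xy_b, lab_b):
    """Count label pairs (i, j) whose cells lie at a distance in (lo, hi]."""
    dist = np.linalg.norm(xy_a[:, None, :] - xy_b[None, :, :], axis=-1)
    ia, ib = np.nonzero((dist > lo) & (dist <= hi))
    counts = np.zeros((n_labels, n_labels), dtype=np.int64)  # wide dtype for large sums
    np.add.at(counts, (lab_a[ia], lab_b[ib]), 1)
    return counts


def ratio(counts):
    """Hypothetical stand-in: P(j | i) divided by the marginal P(j)."""
    joint = counts / counts.sum()
    marginal = joint.sum(axis=0)
    conditional = counts / counts.sum(axis=1, keepdims=True)
    return conditional / marginal[None, :]


# Step 1: one count matrix per pair of chunks (assumed split scheme).
chunks = np.array_split(np.arange(n_cells), n_splits)
per_pair = [
    pair_counts(xy[a], labels[a], xy[b], labels[b])
    for a in chunks
    for b in chunks
]

# Step 2: sum the counts over all chunk pairs.
summed_counts = np.sum(per_pair, axis=0)
full_counts = pair_counts(xy, labels, xy, labels)  # reference, as with n_splits=1

# Step 3: compute the ratio once, on the summed counts.
ratio_summed = ratio(summed_counts)
ratio_full = ratio(full_counts)

# For comparison: ratio per chunk pair, then averaged.
ratio_averaged = np.mean([ratio(c) for c in per_pair], axis=0)

print("summed counts equal full counts:", np.array_equal(summed_counts, full_counts))
print("max |averaged - full|:", np.abs(ratio_averaged - ratio_full).max())
```

In this sketch the summed counts reproduce the whole-sample counts, so the ratio no longer depends on the number of chunks, whereas the averaged ratio does.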

This method should not change the computation speed, since it computes essentially the same quantities as the original, just in a different order. I would be glad to know whether this is a useful approach, or whether I have missed a reason why the original method gives a better estimate of the co-occurrence.

Below is the file with my code if you want to take a look.

Thanks in advance :)

## File

[_ppatterns.py](https://github.com/user-attachments/files/22880086/_ppatterns.py)

## Version 

1.6.3
