first commit with new neighborhood_connectivity function into _shap.py #66
for more information, see https://pre-commit.ci
… into nhood_connectivity
Thank you for the pull request @LukasHats! After that, the user should use the In summary, what needs to be done is:
I hope everything is clear! If you have any doubts, feel free to let me know :) |
Perfect, I will do it this week @marcovarrone . Thanks a lot for rewriting the whole shape metric part for it! Also now I understand how the shape metric used to work! Really like the new approach! |
@marcovarrone I am having trouble integrating the nhood_connectivity score in the same way as the other metrics work. Let's take the example of
So for each connected component you have a quantification of the metric. However, the neighborhood_connectivity idea is different: it is a measure of how many cells in an image are located inside a connected component, or, extended further, how many cells from a neighborhood are inside such a connected component, per image. So it is a meta score rather than something computed for each component, and I cannot really deliver a metric per component, as is done for purity etc. The
Only if 2476, 2331, 1366 are in the same image of course. But I don't think this will give the results/score that I want to achieve with the connectivity score. Otherwise we would maybe need to create a new |
That's a very good point @LukasHats, I didn't realize that! I see two possible solutions:
What do you think? |
Thanks for your suggestions @marcovarrone, I will play around with both ideas in my dataset to see what the output might look like. For the purity idea, one would have to consider a biologically meaningful combination of both metrics. |
To keep you posted @marcovarrone, I realized that the approach I described above suffers from a general problem. If we use a specific threshold of That's why I thought of a new approach together with a team member (https://github.com/gesavoigt/):
Means we have 105 unique connected components in this specific image and neighborhood. Now we can get the total number of cells of that neighborhood by:
So theoretically, on average, we have 4 cells per connected component. That is of course wrong, as there might be a lot of cells that are not connected at all. However, we can use the information in |
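To illustrate why the naive average is misleading, here is a minimal sketch on a toy table standing in for `adata.obs`. The column names (`image_id`, `neighborhood`, `component`) and the convention that unconnected cells carry a missing component label are assumptions for illustration, not taken from the package.

```python
import pandas as pd

# Toy stand-in for adata.obs; all column names here are hypothetical.
obs = pd.DataFrame(
    {
        "image_id": ["img1"] * 8,
        "neighborhood": ["A"] * 8,
        # A missing value marks a cell that is not part of any connected component.
        "component": [0, 0, 0, 1, 1, None, None, None],
    }
)

subset = obs[(obs["image_id"] == "img1") & (obs["neighborhood"] == "A")]

n_components = subset["component"].nunique()            # missing labels are ignored
n_cells_total = len(subset)                             # all cells of the neighborhood
n_cells_connected = int(subset["component"].notna().sum())

# Dividing all cells by the number of components overestimates component size,
# because unconnected cells are counted too.
naive_mean = n_cells_total / n_components
connected_mean = n_cells_connected / n_components
```

In this toy case the naive average is 4 cells per component, while only 2.5 cells per component are actually connected.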
Okay so @marcovarrone, I came up with an approach now. Let's go through an example: First, we run classic cellcharter (still old API):
My idea is now to add an absolute cell_count for each component (how many cells make up that component). As you said, we need 1 metric per component that is stored in the dictionary: so this could be for example the function
We now get exactly a number per component:
Now for plotting, we can come up with our own idea. We can first construct a dataframe, that holds all information we want for plotting/calculation. For example:
Now here we finally have a lot of information. We see for example that component 2244 has 9279 cells from the neighborhood called focal_pc_oxphos, and we have a total of 9317 cells of that neighborhood in that image. We can also see that there is only 1 unique connected component for this neighborhood in that image. We now only need to find a good way of plotting this in a relative manner so we also address smaller neighborhoods (most likely we need to take into account the total_neighborhood_cells_image) and maybe also integrate the information about unique connected components. |
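A dataframe of this kind could be assembled roughly as follows. This is only a sketch on toy data: the columns `image_id`, `neighborhood` and `component` and the output column `unique_components` are hypothetical names, while `cell_count` and `total_neighborhood_cells_image` follow the wording used above.

```python
import pandas as pd

# Toy stand-in for adata.obs; input column names are assumptions.
obs = pd.DataFrame(
    {
        "image_id": ["img1"] * 6 + ["img2"] * 4,
        "neighborhood": ["A", "A", "A", "A", "B", "B", "A", "A", "B", "B"],
        # Missing value = cell outside any connected component.
        "component": [10, 10, 10, None, 11, 11, 20, None, 21, 21],
    }
)
keys = ["image_id", "neighborhood"]

# Number of cells that make up each component.
cell_count = (
    obs.dropna(subset=["component"])
    .groupby(keys + ["component"])
    .size()
    .rename("cell_count")
    .reset_index()
)

# Total cells of each neighborhood per image, connected or not.
totals = (
    obs.groupby(keys).size().rename("total_neighborhood_cells_image").reset_index()
)

# Number of distinct components per neighborhood and image.
n_unique = (
    cell_count.groupby(keys)["component"]
    .nunique()
    .rename("unique_components")
    .reset_index()
)

summary = cell_count.merge(totals, on=keys).merge(n_unique, on=keys)
```

Each row of `summary` then describes one component together with the totals of its neighborhood in that image, which is the information needed for any relative score.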
Hi @LukasHats, looks great! We have to answer some questions about what the metric should capture before designing it:
|
@marcovarrone Proposed Metric: Normalized Component Contribution (NCC)
So it's a metric per component. I will now write the function to store it in the dictionary, just like the purity function etc. The only difference is that the user also needs to provide the cellcharter neighborhood label key! I am excited to implement it with you! Edit(25.03.2025) |
@marcovarrone I have now added the function. It should work; I tried it out in a Jupyter notebook on my AnnData object. I will implement the plotting function later today/tomorrow! |
…ould already work as its already implemented in the .tl module similar to the existing metrics
Hi @LukasHats, thank you very much for the contribution! I am quite busy at the moment, but I should be able to review it in 1-2 weeks! |
As we discussed here @marcovarrone: I did not yet fully understand how the functions are built in the package, but I thought it might make sense to put it into `_shape.py`. The function can only be run after using `gr.connected_components`; it takes the `adata.obs` `components` and calculates, per image, how many cells from a neighborhood are inside a connected component. The `library_key` (e.g. image_ID) is necessary because we have to calculate that per image. If users input a condition, it will plot the different conditions as `hue`, but that's not strictly necessary. Users can also set `show=False` to get the dataframe (here we can discuss whether we want to return the figure object rather than the df); however, the standard plotting function currently also gives the `ax` object and plots the graph. I am happy to adjust all of that. It is also currently set to a violin plot, which makes sense if you have many images, but it is probably a bit odd if users have only one or a few images.
I don't know what else needs to be added so that this will turn into a function like `cc.pl.neighborhood_connectivity`, or whether you want to implement a test for it.
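The behaviour described above can be sketched as follows. This is not the code from the pull request: the function name, its parameters other than `library_key` and `show`, the toy column names, and the convention that cells outside any component have a missing label are all assumptions made for illustration.

```python
import pandas as pd


def neighborhood_connectivity_sketch(
    obs, cluster_key, component_key, library_key, condition_key=None, show=True
):
    """Fraction of cells per image and neighborhood that lie inside a component."""
    keys = [library_key, cluster_key]
    if condition_key is not None:
        keys.append(condition_key)
    df = (
        obs.assign(_in_component=obs[component_key].notna())
        .groupby(keys, observed=True)["_in_component"]
        .mean()
        .rename("connectivity")
        .reset_index()
    )
    if not show:
        return df  # hand back the table instead of plotting
    # Plotting libraries are imported lazily so the table can be computed without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    _, ax = plt.subplots()
    sns.violinplot(data=df, x=cluster_key, y="connectivity", hue=condition_key, ax=ax)
    return ax


# Toy stand-in for adata.obs (hypothetical column names).
obs = pd.DataFrame(
    {
        "image_id": ["img1"] * 4 + ["img2"] * 4,
        "neighborhood": ["A"] * 8,
        "condition": ["ctrl"] * 4 + ["treated"] * 4,
        "component": [1, 1, 1, None, 2, None, None, None],
    }
)
df = neighborhood_connectivity_sketch(
    obs, "neighborhood", "component", "image_id", condition_key="condition", show=False
)
```

With only one or a few images, each violin would be drawn from very few points, so a strip or bar plot may be the more readable choice in that case.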