Effect of low counts on cellcharter niches #115
Dear cellcharter team (disclaimer: I'm still quite new to dataset training and Python), I'm just starting to analyze my first spatial transcriptomics dataset. When I ran the package with automated k prediction, I obtained 3 niches. One niche seemed to correlate with tissue areas that have lower UMI counts, although I know these areas are biologically unrelated/distinct. Manually increasing k separated those niches (even though stability was much lower), but I'm still unsure whether one of the remaining niches is just a "low-count" niche whose cells might biologically belong to several other niches. Can the identification of spatial niches be influenced by local data quality or low UMI counts, and is there a way to control for this in the analysis? I know that scRNA-seq analyses can regress out the number of UMIs per cell, but I'm not sure how to translate this to spatial transcriptomics data, and specifically to the cellcharter pipeline. I'd be very curious to hear your opinion on it, @marcovarrone!
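For reference, "regressing out" library size in scRNA-seq usually means fitting a per-feature linear model on the covariate and keeping the residuals; scanpy exposes this as `sc.pp.regress_out(adata, ['total_counts'])`. The numpy sketch below only illustrates that idea on a generic cell-by-feature matrix. Regressing on `log1p(total_counts)` is my own assumption, the function name is hypothetical, and nothing in this thread says this step is part of the cellcharter pipeline. Keep in mind that if library size genuinely differs between tissue compartments, regressing it out can also remove real biology.

```python
import numpy as np


def regress_out_library_size(X: np.ndarray, total_counts: np.ndarray) -> np.ndarray:
    """Return the residuals of every column of X after an ordinary
    least-squares fit on an intercept plus log1p(total_counts)."""
    # Design matrix: one row per cell, columns = [intercept, log library size]
    design = np.column_stack([np.ones(len(total_counts)), np.log1p(total_counts)])
    # Fit all features (columns of X) in a single least-squares solve
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)
    # Residuals: the part of each feature not explained linearly by library size
    return X - design @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    counts = rng.integers(50, 5000, size=300).astype(float)
    # Toy features that partly track library size, plus noise
    X = np.outer(np.log1p(counts), [1.0, -0.5, 2.0]) + rng.normal(size=(300, 3))
    R = regress_out_library_size(X, counts)
    # Correlation with log library size before vs after (after is ~0 by construction)
    for j in range(X.shape[1]):
        before = np.corrcoef(X[:, j], np.log1p(counts))[0, 1]
        after = np.corrcoef(R[:, j], np.log1p(counts))[0, 1]
        print(f"feature {j}: r_before={before:.2f}, r_after={after:.2e}")
```

Because the residuals of an OLS fit are orthogonal to every column of the design matrix, they have zero mean and zero linear correlation with `log1p(total_counts)`; any non-linear dependence on library size is left untouched.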
Replies: 3 comments
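Low-UMI regions (tissue edges, poor permeabilization) can end up creating artificial niches because the aggregated neighborhood features are uniformly low, and the GMM picks that up as its own cluster. If you're using scVI in the pipeline, the latent space already accounts for library size to some extent, but it's not always perfect for very low-count cells. Some things that might help: plotting `total_counts` spatially alongside the niche labels to check whether they co-localize (which would suggest the niche is technical rather than biological). Filtering low-quality cells more aggressively with `sc.pp.filter_cells()` could also help, or passing `continuous_covariate_keys=['total_counts']` to `scvi.model.SCVI.…` Worth noting that …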
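Alongside the spatial plot, a quick numeric version of the co-localization check is to compare library size per niche against the whole tissue. The sketch below assumes a pandas table shaped like `adata.obs` after `sc.pp.calculate_qc_metrics` (which adds a `total_counts` column); the niche column name `spatial_cluster` and the function name are hypothetical, so substitute whatever key your clustering step wrote.

```python
import numpy as np
import pandas as pd


def niche_count_summary(obs: pd.DataFrame, niche_key: str = "spatial_cluster",
                        counts_key: str = "total_counts") -> pd.DataFrame:
    """Per-niche library-size summary relative to the whole tissue."""
    overall_median = obs[counts_key].median()
    summary = obs.groupby(niche_key, observed=True)[counts_key].agg(
        n_cells="count", median_counts="median"
    )
    # Ratio < 1: the niche's cells are, on median, shallower than the tissue overall
    summary["median_ratio"] = summary["median_counts"] / overall_median
    return summary.sort_values("median_ratio")


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Toy data: niche "2" is drawn with roughly 5x fewer counts than the others
    obs = pd.DataFrame({
        "spatial_cluster": np.repeat(["0", "1", "2"], 100),
        "total_counts": np.concatenate([
            rng.poisson(1000, 100), rng.poisson(1200, 100), rng.poisson(200, 100)
        ]),
    })
    print(niche_count_summary(obs))
```

A niche whose `median_ratio` sits well below 1 and whose footprint matches the low-count areas in the spatial plot is a candidate technical niche; this is a heuristic flag, not proof that the niche lacks biological meaning.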
Couldn't have answered better, @LiudengZhang! Thank you!
Thanks for your input; that helped a lot in understanding which approaches are more promising than others. @marcovarrone yes, I was using scVI; I was following your tutorial on the CosMx dataset quite closely. @LiudengZhang, thanks a lot for your suggestion of passing …
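After re-running with a library-size covariate, one way to see whether a suspected "low-count" niche was technical is to cross-tabulate the old and new niche labels for the same cells: a niche that dissolves into several others was likely driven by counts, while one that stays intact probably reflects real structure. This is a minimal pandas sketch under that assumption; the function names and label values are hypothetical and nothing here is a cellcharter API.

```python
import pandas as pd


def compare_niche_partitions(before: pd.Series, after: pd.Series) -> pd.DataFrame:
    """Row-normalised contingency table: for each niche of the first run,
    the fraction of its cells assigned to each niche of the second run."""
    return pd.crosstab(before, after, normalize="index")


def niche_persistence(table: pd.DataFrame) -> pd.Series:
    """Largest fraction per original niche; values near 1 mean the niche
    survived largely intact, lower values mean it was split up."""
    return table.max(axis=1).sort_values()


if __name__ == "__main__":
    # Hypothetical labels for the same six cells from two runs
    before = pd.Series(["low", "low", "low", "low", "A", "A"], name="without_covariate")
    after = pd.Series(["A", "A", "B", "B", "A", "A"], name="with_covariate")
    table = compare_niche_partitions(before, after)
    print(table)
    print(niche_persistence(table))
```

Both Series must share the same cell index (e.g. two columns of `adata.obs`), since `pd.crosstab` aligns on it.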