Hi everyone,

I have some doubts about the best order of Seurat functions when integrating a large number of samples. I've been exploring the new features like BPCells and Sketch, but given the scale of my dataset (~800 samples), I'm looking for a more efficient integration strategy.
Since sample identity is my main source of variation, I was thinking of the following approach:

1. Split by sample and normalize the data to remove sample-specific variance.
2. Join layers after normalization.
3. Split by study (~30 studies) and proceed with the full integration. After this, I would continue with FindVariableFeatures and would not re-normalize the data.
The idea behind this is to reduce the number of layers going into integration, correct for sequencing-depth differences at the sample level, and better handle samples with low cell counts (e.g., ~120 cells) without having to drastically adjust parameters like k.weight. However, will this affect the integrated object? I'm not sure whether NormalizeData behaves differently when layers are split by study or by sample. If it doesn't, why does Seurat calculate NormalizeData on all counts together?
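To make the order of operations concrete, here is a rough sketch of what I have in mind (assuming a Seurat v5 object called `obj` with `sample` and `study` metadata columns; the integration method and reduction names are just placeholders, not settled choices):

```r
library(Seurat)

# 1. Split by sample so NormalizeData runs on each sample's counts layer separately
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$sample)
obj <- NormalizeData(obj)

# 2. Join the ~800 per-sample layers back together after normalization
obj <- JoinLayers(obj)

# 3. Re-split by study (~30 layers) and run the usual downstream steps,
#    without re-running NormalizeData
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$study)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)
obj <- IntegrateLayers(
  object         = obj,
  method         = RPCAIntegration,   # placeholder; could be CCA, Harmony, etc.
  orig.reduction = "pca",
  new.reduction  = "integrated.rpca"
  # k.weight could still be lowered here if any study layer remains very small
)
```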
One potential issue I see is that underrepresented cell types may not integrate properly. Could this be mitigated by sketching each sample (SketchData) before integration?
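For example, something along these lines (again only a sketch; the ncells value and the leverage-score method are just starting points, and I haven't verified how SketchData handles samples with fewer cells than ncells):

```r
# Hypothetical: sketch each sample before integration so that small samples
# still contribute a representative set of cells
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$sample)
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)     # leverage scores need variable features

obj <- SketchData(
  object         = obj,
  ncells         = 500,              # per-layer target; I assume smaller samples keep all their cells
  method         = "LeverageScore",
  sketched.assay = "sketch"
)

DefaultAssay(obj) <- "sketch"
# ...then integrate on the sketched assay and project the result back to the
# full dataset (e.g., ProjectIntegration() / ProjectData()) afterwards.
```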
Does this approach make sense from a statistical and technical perspective? Are there any potential issues I should be aware of?
Looking forward to hearing your thoughts!
Thanks in advance,
Pep