
Improved doublet detection in call_lineages #225

Open
colganwi wants to merge 6 commits into master from doublet-detection

Conversation

Collaborator

@colganwi colganwi commented Sep 19, 2023

This PR makes several improvements to the call_lineages step of the preprocessing pipeline. These changes are based on my experience processing a dataset with high ambient RNA and a significant proportion of doublets.

  1. Adds a min_umi_per_intbc parameter to filter the allele table, which is useful for removing ambient intBC molecules.

  2. Removes the assumption in assign_lineage_groups that the size of lineage groups is strictly decreasing, since this may not hold with a high kinship_thresh.

  3. Changes the doublet detection algorithm to use the kinship scores calculated by score_lineage_kinships. I have found these kinship scores to be a more reliable way to detect doublets than the current filter_inter_doublets function, since they take UMI counts into account instead of just the binarized intBCs.

  4. Adds a keep_doublets parameter that lets the user keep the doublets in the allele table, which makes it much easier to tune the doublet_kinship_thresh parameter.
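
To make items 1 and 3 concrete, here is a rough sketch in plain Python, not the code in this PR. The toy allele table, the column names, the `lineage_groups` mapping, and the helper functions are all hypothetical. The kinship score is assumed to be the fraction of a cell's UMIs that fall on a group's intBCs, and the rule "flag a cell when its best kinship is below doublet_kinship_thresh" is likewise an assumption made for illustration.

```python
from collections import defaultdict

# Toy allele table: one row per (cell, intBC) with a UMI count.
# Column names are illustrative only.
allele_table = [
    {"cellBC": "cell1", "intBC": "A", "UMI": 10},
    {"cellBC": "cell1", "intBC": "B", "UMI": 8},
    {"cellBC": "cell1", "intBC": "X", "UMI": 1},  # low-UMI, likely ambient
    {"cellBC": "cell2", "intBC": "C", "UMI": 9},
    {"cellBC": "cell2", "intBC": "D", "UMI": 7},
    {"cellBC": "cell3", "intBC": "A", "UMI": 6},  # cell3 mixes both groups
    {"cellBC": "cell3", "intBC": "B", "UMI": 5},
    {"cellBC": "cell3", "intBC": "C", "UMI": 5},
    {"cellBC": "cell3", "intBC": "D", "UMI": 4},
]

# Hypothetical lineage groups, each defined by a set of intBCs.
lineage_groups = {1: {"A", "B"}, 2: {"C", "D"}}


def filter_min_umi(rows, min_umi_per_intbc):
    """Drop (cell, intBC) rows supported by too few UMIs."""
    return [r for r in rows if r["UMI"] >= min_umi_per_intbc]


def kinship_scores(rows, groups):
    """Assumed score: fraction of a cell's UMIs on each group's intBCs."""
    umis = defaultdict(lambda: defaultdict(int))
    for r in rows:
        umis[r["cellBC"]][r["intBC"]] += r["UMI"]
    scores = {}
    for cell, counts in umis.items():
        total = sum(counts.values())
        scores[cell] = {
            group: sum(n for intbc, n in counts.items() if intbc in intbcs) / total
            for group, intbcs in groups.items()
        }
    return scores


def flag_doublets(scores, doublet_kinship_thresh):
    """Assumed rule: a cell is a doublet if no group explains enough of its UMIs."""
    return {
        cell: max(per_group.values()) < doublet_kinship_thresh
        for cell, per_group in scores.items()
    }


filtered = filter_min_umi(allele_table, min_umi_per_intbc=2)
scores = kinship_scores(filtered, lineage_groups)
doublets = flag_doublets(scores, doublet_kinship_thresh=0.8)
```

In this toy data, cell3 splits its UMIs roughly evenly between the two groups, so it is the only cell flagged. Keeping flagged cells in the table rather than dropping them (the idea behind keep_doublets) makes it possible to inspect their scores while choosing a threshold.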

The API remains the same and the old doublet detection algorithm can still be run for now, but I've added a warning message that it will be deprecated in 2.1.0. What this PR does not address is that doublets can silently slip through call_lineages: the doublet alleles are filtered out by min_intbc_thresh, which makes the doublets look like singlets. It would be better to avoid this failure mode, but I'm not sure how to do so while still filtering.
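
A toy illustration of that failure mode, as a sketch only: here min_intbc_thresh is assumed to be the minimum proportion of cells in a lineage group that must carry an intBC for it to be kept, and the `cells` mapping and `filter_rare_intbcs` helper are hypothetical.

```python
from collections import Counter

# Hypothetical lineage group: four singlets plus one doublet that also
# carries intBCs C and D from a second, unrelated cell.
cells = {
    "cell1": {"A", "B"},
    "cell2": {"A", "B"},
    "cell3": {"A", "B"},
    "cell4": {"A", "B"},
    "doublet": {"A", "B", "C", "D"},
}


def filter_rare_intbcs(cells, min_intbc_thresh):
    """Keep only intBCs present in at least min_intbc_thresh of the cells."""
    counts = Counter(intbc for intbcs in cells.values() for intbc in intbcs)
    keep = {i for i, n in counts.items() if n / len(cells) >= min_intbc_thresh}
    return {cell: intbcs & keep for cell, intbcs in cells.items()}


filtered = filter_rare_intbcs(cells, min_intbc_thresh=0.5)
# C and D each appear in 1 of 5 cells, so they are dropped and the doublet
# becomes indistinguishable from the singlets in the group.
```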

@mattjones315 if you send me test data, I can compare this algorithm to the old one. I think it's an improvement for most cases, but it would be good to test it. I'm also open to implementing a more complex doublet detection algorithm using a mixture model if needed. I'll add tests once we solidify the doublet detection algorithm.

@colganwi colganwi self-assigned this Sep 20, 2023
@colganwi colganwi added the enhancement New feature or request label Sep 20, 2023
