Hello Fan,
Continuing our effort to incorporate VBID2 into our pipeline for contamination estimation (see #43), we ran some experiments to evaluate its performance, since the tool was developed with short reads in mind.
Let me state the summary first and post the experiments in subsequent posts.
Summary
It seems that VBID2 underestimates contamination levels when
- the contamination is intra-family, and
- the data type is CCS long reads.
This may be caused by coverage estimation (or read picking) issues.
I am not sure how much time you have for addressing this issue.
If you do have time, I can run more experiments as needed.
Thank you!
Steve