Hi, thank you for developing such a useful tool!
I have a question regarding the somatic step in the Monopogen pipeline when working with multiple samples.
In the preprocess and germline steps, it's clear that multiple samples can be processed together. However, I'm unsure about the best practice for the somatic step. Specifically:
The -s option requires a two-column file (cell barcode and read count). If the same cell barcode appears in more than one sample, how should this be handled? Does Monopogen split the BAM by cell based on this table? And if I just want to include all cells and set -s 1, can I assign the same constant read count to every cell in the file to bypass the filtering?
If Monopogen does not handle sample separation internally, would you recommend splitting the BAM and VCF files from the previous steps by sample and running the somatic step separately for each sample, or keeping a merged input and adding a sample-specific prefix to each cell barcode in the table to avoid collisions?
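To make the second option concrete, here is a minimal sketch of how I would build the table. Everything in it is my own assumption rather than anything from the Monopogen documentation: the function names, the `sample_barcode` prefix format, the constant count of 1000, and the header-less CSV layout are all hypothetical. I also assume the barcodes in the merged BAM would need the same prefix for the table to match, which is part of what I am asking.

```python
import csv
import io


def build_barcode_table(barcodes_by_sample, constant_count=1000):
    """Return (prefixed_barcode, read_count) rows, one per cell.

    The "<sample>_<barcode>" prefix and the constant read count are
    assumptions for illustration, not a documented Monopogen format.
    """
    rows = []
    for sample, barcodes in barcodes_by_sample.items():
        for barcode in barcodes:
            rows.append((f"{sample}_{barcode}", constant_count))
    return rows


def to_csv_text(rows):
    """Serialize the rows as two-column CSV text (no header assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# The same raw barcode occurs in both samples; the prefix keeps them distinct.
example = {
    "sampleA": ["AAACCTGAGAAACCAT-1"],
    "sampleB": ["AAACCTGAGAAACCAT-1"],
}
rows = build_barcode_table(example)
csv_text = to_csv_text(rows)
```

With this, two cells that share a raw barcode end up as separate rows with the same read count, so neither would be dropped by a read-count filter.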
Thank you very much for your time and support!