Question about somatic step: handling multi-sample input, -s option, and duplicated barcodes

Hi, thank you for developing such a useful tool!

I have a question regarding the somatic step in the Monopogen pipeline when working with multiple samples.

In the preprocess and germline steps, it's clear that multiple samples can be processed together. However, I'm unsure about the best practice for the somatic step. Specifically:

The -s option requires a two-column file (cell barcode and read count). If some cell barcodes are duplicated across different samples, how should this be handled? Will Monopogen process the BAM by splitting based on this table? If I just want to include all cells and set -s 1, can I assign a constant read count to all cells in the file to bypass the filtering?

If Monopogen does not handle sample separation internally, which would you recommend: splitting the BAM and VCF files from the previous steps by sample and running the somatic step separately for each sample, or keeping a merged input and adding a sample-specific prefix to each cell barcode in the table to avoid collisions?
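To make the second option concrete, here is a minimal sketch of the prefixing I have in mind. It is not based on Monopogen's code: the sample names, barcodes, read counts, the `_` separator, and the helper names `prefix_barcodes` and `merge_tables` are all hypothetical. I assume the same prefix would also have to be applied to the barcode tags in the merged BAM for the table to match, which is part of what I would like to confirm.

```python
# Hypothetical sketch: merge per-sample (barcode, read count) tables into one
# table, prefixing each barcode with its sample name to avoid collisions.
# Nothing here calls Monopogen; names and values are illustrative only.

def prefix_barcodes(rows, sample, sep="_"):
    """Yield (prefixed_barcode, read_count) pairs for one sample."""
    for barcode, count in rows:
        yield f"{sample}{sep}{barcode}", count


def merge_tables(tables, sep="_"):
    """Combine per-sample tables; raise if a prefixed barcode still collides."""
    merged = {}
    for sample, rows in tables.items():
        for barcode, count in prefix_barcodes(rows, sample, sep):
            if barcode in merged:
                raise ValueError(f"duplicate barcode after prefixing: {barcode}")
            merged[barcode] = count
    return merged


# Two samples sharing one raw barcode (made-up values).
tables = {
    "sampleA": [("AAACCTGAGAAACCAT-1", 1200), ("AAACCTGAGAAACCGC-1", 800)],
    "sampleB": [("AAACCTGAGAAACCAT-1", 950)],
}

merged = merge_tables(tables)

# Write the merged two-column table as comma-separated lines.
for barcode, count in merged.items():
    print(f"{barcode},{count}")
```

The shared raw barcode ends up as two distinct entries, one per sample, so the merged table has no duplicates.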

Thank you very much for your time and support!


Question about somatic step: handling multi-sample input, -s option, and duplicated barcodes #120
