
Question about somatic step: handling multi-sample input, -s option, and duplicated barcodes #120

@ttt-404


Hi, thank you for developing such a useful tool!

I have a question regarding the somatic step in the Monopogen pipeline when working with multiple samples.

In the preprocess and germline steps, it's clear that multiple samples can be processed together. However, I'm unsure about the best practice for the somatic step. Specifically:

The -s option requires a two-column file (cell barcode and read count). If some cell barcodes are shared across samples, how should this be handled? Does Monopogen split the BAM according to this table? If I want to include all cells and set -s 1, can I assign the same read count to every cell in the file to bypass the filtering?
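Before deciding how to handle shared barcodes, it may help to check whether any collisions actually exist. The following is a minimal sketch that finds such collisions and writes the two-column table with a constant read count for every cell. It assumes each sample's barcodes are already available as a Python list. It also assumes a comma-separated table with no header line. Neither assumption is a confirmed Monopogen requirement, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_barcodes(barcodes_by_sample: Dict[str, List[str]]) -> List[str]:
    """Return (sorted) barcodes that occur in more than one sample."""
    counts = Counter()
    for barcodes in barcodes_by_sample.values():
        # Count each barcode once per sample, so repeats within one sample are ignored.
        counts.update(set(barcodes))
    return sorted(bc for bc, n in counts.items() if n > 1)


def constant_count_table(barcodes: List[str], count: int = 1, sep: str = ",") -> str:
    """Build a two-column table (barcode, read count) with the same count for every cell.

    Assumption: comma separator and no header line.
    """
    return "".join(f"{bc}{sep}{count}\n" for bc in barcodes)


if __name__ == "__main__":
    samples = {
        "sampleA": ["AAACCTGAGAAGGCCT-1", "AAACCTGAGCTTATCG-1"],
        "sampleB": ["AAACCTGAGAAGGCCT-1", "AAACGGGCAGCTGTTA-1"],
    }
    print(find_duplicate_barcodes(samples))  # ['AAACCTGAGAAGGCCT-1']
    print(constant_count_table(samples["sampleA"], count=1000), end="")
```

Whether a constant count actually bypasses the tool's read-count filter depends on how Monopogen applies that threshold, which this sketch does not assume.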

If Monopogen does not handle sample separation internally, which approach would you recommend: (a) splitting the BAM and VCF files from the previous steps by sample and running the somatic step on each sample separately, or (b) keeping a merged input but adding a sample-specific prefix to each cell barcode in the table to avoid collisions?
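Option (b) could look like the minimal sketch below, which merges per-sample (barcode, read count) rows into one table and prefixes each barcode with its sample ID. The `sample_barcode` naming scheme and the function name are hypothetical. One caveat applies if the tool matches table barcodes against the cell barcode (CB) tags in the BAM: the merged BAM's CB tags would then presumably need the same prefix. Prefixing only the table would not be enough in that case.

```python
from typing import Dict, List, Tuple


def prefix_barcodes(
    rows_by_sample: Dict[str, List[Tuple[str, int]]], sep: str = "_"
) -> List[Tuple[str, int]]:
    """Merge per-sample (barcode, read count) rows, prefixing each barcode with its sample ID.

    Hypothetical helper: the "<sample><sep><barcode>" scheme is an assumption.
    """
    merged: List[Tuple[str, int]] = []
    for sample, rows in rows_by_sample.items():
        for barcode, n_reads in rows:
            merged.append((f"{sample}{sep}{barcode}", n_reads))
    # After prefixing, every barcode should be unique across the merged table.
    names = [bc for bc, _ in merged]
    if len(names) != len(set(names)):
        raise ValueError("duplicate barcodes remain after prefixing")
    return merged


if __name__ == "__main__":
    table = prefix_barcodes({
        "sampleA": [("AAACCTGAGAAGGCCT-1", 5231)],
        "sampleB": [("AAACCTGAGAAGGCCT-1", 4870)],
    })
    for barcode, n_reads in table:
        print(f"{barcode},{n_reads}")
```

The uniqueness check fails loudly instead of silently writing a table that still contains collisions, for example when a barcode is repeated within a single sample.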

Thank you very much for your time and support!
