This tool processes BAM files to identify reads that overlap with primer regions and soft clips those regions based on the direction of the read and the primer. It also generates a detailed report of all reads that were screened against primers.
uv is used for managing the tool dependencies.
See https://docs.astral.sh/uv/getting-started/installation/#installing-uv
usage: uv run primer_soft_clip.py [-h] -i INPUT_BAM -o OUTPUT_BAM -p PRIMERS -s
SCREENED
Soft clip primer regions in BAM file
options:
-h, --help show this help message and exit
-i INPUT_BAM, --input_bam INPUT_BAM
Input BAM file
-o OUTPUT_BAM, --output_bam OUTPUT_BAM
Output BAM file
-p PRIMERS, --primers PRIMERS
Primers BED file
-s SCREENED, --screened SCREENED
Output file for screened reads- A modified BAM file where reads overlapping with primer regions have those regions soft clipped. The output BAM file will be automatically sorted and indexed.
Examples:
- A tab-separated file containing information about all reads that were screened against primers, with columns:
- read_name: Name of the read
- chromosome: Chromosome/contig name
- start: Start position of the read
- end: End position of the read
- primer_name: Name of the primer that overlaps with the read
- primer_direction: Direction of the primer (+ or -)
- read_direction: Direction of the read (+ or -)
- action: Whether the read was "soft_clipped" or "skipped"
The primers BED file should have the following columns:
- Chromosome
- Start position (0-based)
- End position
- Primer name
- Score (optional)
- Strand (required, '+' for forward, '-' for reverse)
Example BED file content:
Chromosome 2725853 2725873 ahpC_F 20 +
Chromosome 2726437 2726457 ahpC_R 20 -
Chromosome 1460914 1460934 atpE_F 20 +
Chromosome 1461381 1461401 atpE_R 20 -
Chromosome 1046049 1046071 eis_F 21 +
Chromosome 2715455 2715477 eis_F 22 -
Chromosome 2714908 2714928 MDL_eis-R1 20 +
The tool comes with a test suite. To run the tests:
uv run test_primer_soft_clip.py -vvvvThanh Le Viet - Theiagen Genomics - 2025


