This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
covtobed is a C++ bioinformatics tool that generates BED coverage tracks from sorted BAM alignment files. It processes sequence alignment data to compute coverage depth and output regions in BED format, with support for physical coverage, strand-specific analysis, and various filtering options.
The project uses direct C++ compilation without makefiles:
# Basic compilation
c++ -std=c++11 *.cpp -I/path/to/bamtools/ -L${HOME}/path/to/lib/ -lbamtools -o covtobed
# Ubuntu with system packages
g++ -std=c++11 *.cpp -I/usr/include/bamtools /usr/lib/x86_64-linux-gnu/libbamtools.a -o covtobed -lz
# macOS with conda
c++ -std=c++11 *.cpp -I"${HOME}"/miniconda3/include/bamtools/ -L"${HOME}"/miniconda3/lib/ "${HOME}"/miniconda3/lib/libbamtools.a -o covtobed -lz- libbamtools: Required for BAM file handling
- zlib: For compression support
- C++11: Minimum standard required
binaries/build_ubuntu.sh: Automated Ubuntu build with static linkingbinaries/build_osx.sh: Automated macOS build using conda dependencies
Run the comprehensive test suite:
bash test/test.shThe test script:
- Uses the
covtobedbinary in the project root - Falls back to pre-compiled binaries if main binary doesn't exist
- Tests various functionality: sorted/unsorted BAM handling, coverage filtering, physical coverage, strand analysis, output formats
- Compares output against expected results in
test/output.bedandtest/mock.bed - Sets
COVTOBED_QUIET=1environment variable to suppress startup messages
test/demo.bam: Main test file for standard coverage analysistest/mp.bam: Mate-pair BAM for physical coverage testingtest/mock.bam: Synthetic BAM with known coverage valuestest/stranded.bam: Test file for strand-specific analysis- Additional test files for edge cases (duplicates, filtering, etc.)
- base.cpp: Main program logic, coverage calculation engine
- Uses BamTools library for BAM I/O
- Implements coverage tracking with priority queues
- Handles strand-specific and physical coverage modes
- OptionParser.cpp/h: Command-line argument parsing
- interval.h: Data structures for genomic intervals
- covtobed: Main executable (compiled binary)
Input: Handles BAM file reading and alignment filteringCovEnd: Manages alignment end positions in priority queueDepthType: Type definition for coverage depth (uint32_t)- Coverage calculation uses STL priority queues for efficient processing
Debug output can be enabled by changing #define debug if(false) to #define debug if(true) in base.cpp before compilation.
The project uses GitHub Actions (.github/workflows/c-cpp.yml):
- Builds on Ubuntu 18.04
- Installs libbamtools-dev system package
- Compiles with both g++ and clang++
- Runs basic functionality tests and full test suite
- Building: Use platform-appropriate build script or manual compilation command
- Testing: Always run
bash test/test.shbefore submitting changes - Debugging: Enable debug mode in base.cpp for detailed output
- Pull Requests: Follow GitHub Flow - fork, branch, PR against master
COVTOBED_QUIET=1: Suppresses startup message when reading from STDIN
- Sorted BAM Requirement: Tool requires sorted BAM input, will error on unsorted files
- Streaming Support: Can read from STDIN for pipeline integration
- Physical Coverage: Special mode for paired-end/mate-pair libraries
- Strand Analysis: Can output strand-specific coverage information
- Multiple Output Formats: BED (default) and counts formats supported
- Filtering Options: By mapping quality, coverage thresholds, alignment validity