177 changes: 177 additions & 0 deletions .github/workflows/cmake-test.yml
@@ -0,0 +1,177 @@
name: CMake Test Suite

on:
  push:
    branches: [ master, main ]
  pull_request:
    branches: [ master, main ]

jobs:
  test:
    runs-on: ubuntu-latest

    strategy:
      matrix:
        compiler: [gcc, clang]

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            build-essential \
            cmake \
            pkg-config \
            catch2 \
            libbenchmark-dev \
            libbamtools-dev \
            libjsoncpp-dev \
            zlib1g-dev \
            samtools \
            valgrind \
            bc

      - name: Set up compiler
        run: |
          if [ "${{ matrix.compiler }}" = "clang" ]; then
            sudo apt-get install -y clang
            echo "CC=clang" >> $GITHUB_ENV
            echo "CXX=clang++" >> $GITHUB_ENV
          else
            echo "CC=gcc" >> $GITHUB_ENV
            echo "CXX=g++" >> $GITHUB_ENV
          fi

      - name: Build covtobed
        run: |
          $CXX -std=c++17 *.cpp -I/usr/include/bamtools -lbamtools -o covtobed -lz

      - name: Verify covtobed build
        run: |
          ./covtobed --version

      - name: Configure CMake test suite
        working-directory: test
        run: |
          rm -rf build
          mkdir -p build
          cd build
          cmake ..

      - name: Build test suite
        working-directory: test/build
        run: |
          make -j$(nproc)

      - name: Generate synthetic test data
        working-directory: test/build
        run: |
          make generate_test_data

      - name: Run unit tests
        working-directory: test/build
        run: |
          ./unit/unit_tests --reporter=xml --out=unit_test_results.xml

      - name: Run integration tests
        run: |
          cd test
          bash integration/test_enhanced.sh

      - name: Run benchmarks (quick)
        working-directory: test/build
        run: |
          timeout 60s ./benchmark/benchmark_coverage --benchmark_min_time=0.1s || true

      - name: Run original test suite (compatibility)
        run: |
          bash test/test.sh

      - name: Upload test results
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results-${{ matrix.compiler }}
          path: |
            test/build/unit_test_results.xml
            test/data/synthetic/

      - name: Memory check (gcc only)
        if: matrix.compiler == 'gcc'
        run: |
          valgrind --tool=memcheck --leak-check=full --error-exitcode=1 \
            ./covtobed test/demo.bam > /dev/null

  test-minimal:
    name: Test minimal dependencies
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install minimal dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            build-essential \
            libbamtools-dev \
            libjsoncpp-dev \
            zlib1g-dev

      - name: Build covtobed (minimal)
        run: |
          g++ -std=c++17 *.cpp -I/usr/include/bamtools -lbamtools -o covtobed -lz

      - name: Test basic functionality
        run: |
          ./covtobed --version
          ./covtobed test/demo.bam > /dev/null

  cross-platform:
    name: Cross-platform compatibility
    strategy:
      matrix:
        os: [ubuntu-22.04]

    runs-on: ${{ matrix.os }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Install dependencies
        run: |
          sudo apt-get update
          sudo apt-get install -y \
            build-essential \
            cmake \
            pkg-config \
            libbamtools-dev \
            libjsoncpp-dev \
            zlib1g-dev

          # Install testing frameworks if available
          sudo apt-get install -y catch2 libbenchmark-dev || true

      - name: Build covtobed
        run: |
          g++ -std=c++17 *.cpp -I/usr/include/bamtools -lbamtools -o covtobed -lz

      - name: Run basic tests
        run: |
          ./covtobed --version
          bash test/test.sh

      - name: Build test suite (if dependencies available)
        run: |
          if pkg-config --exists catch2 2>/dev/null; then
            cd test
            rm -rf build
            mkdir -p build
            cd build
            cmake .. && make -j$(nproc) || echo "Test suite build failed, continuing..."
          fi
24 changes: 2 additions & 22 deletions .gitignore
@@ -1,23 +1,3 @@
flag.*
build.sh
covtobed.wiki
a.out
*.bam.bai
mos*
.DS_Store
README.md~
exome
test_local
*.0.?
._*
*.sam
time.*
test2.bam
*.cpp~
*bak
pannelli
paper/paper.md~
TE
split_sam_by_flag.pl
mini_tests
.vscode
prova.sh
covtobed
102 changes: 102 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,102 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

`covtobed` is a C++ bioinformatics tool that generates BED coverage tracks from sorted BAM alignment files. It processes sequence alignment data to compute coverage depth and output regions in BED format, with support for physical coverage, strand-specific analysis, and various filtering options.

## Build System and Dependencies

### Manual Compilation
The project uses direct C++ compilation without makefiles:

```bash
# Basic compilation
c++ -std=c++11 *.cpp -I/path/to/bamtools/ -L${HOME}/path/to/lib/ -lbamtools -o covtobed

# Ubuntu with system packages
g++ -std=c++11 *.cpp -I/usr/include/bamtools /usr/lib/x86_64-linux-gnu/libbamtools.a -o covtobed -lz

# macOS with conda
c++ -std=c++11 *.cpp -I"${HOME}"/miniconda3/include/bamtools/ -L"${HOME}"/miniconda3/lib/ "${HOME}"/miniconda3/lib/libbamtools.a -o covtobed -lz
```

### Dependencies
- **libbamtools**: Required for BAM file handling
- **zlib**: For compression support
- **C++11**: Minimum standard required

### Platform-Specific Build Scripts
- `binaries/build_ubuntu.sh`: Automated Ubuntu build with static linking
- `binaries/build_osx.sh`: Automated macOS build using conda dependencies

## Testing

### Test Suite
Run the comprehensive test suite:
```bash
bash test/test.sh
```

The test script:
- Uses the `covtobed` binary in the project root
- Falls back to pre-compiled binaries if main binary doesn't exist
- Tests various functionality: sorted/unsorted BAM handling, coverage filtering, physical coverage, strand analysis, output formats
- Compares output against expected results in `test/output.bed` and `test/mock.bed`
- Sets `COVTOBED_QUIET=1` environment variable to suppress startup messages

### Test Data
- `test/demo.bam`: Main test file for standard coverage analysis
- `test/mp.bam`: Mate-pair BAM for physical coverage testing
- `test/mock.bam`: Synthetic BAM with known coverage values
- `test/stranded.bam`: Test file for strand-specific analysis
- Additional test files for edge cases (duplicates, filtering, etc.)

## Code Architecture

### Core Components
- **base.cpp**: Main program logic, coverage calculation engine
  - Uses BamTools library for BAM I/O
  - Implements coverage tracking with priority queues
  - Handles strand-specific and physical coverage modes
- **OptionParser.cpp/h**: Command-line argument parsing
- **interval.h**: Data structures for genomic intervals
- **covtobed**: Main executable (compiled binary)

### Key Classes and Structures
- `Input`: Handles BAM file reading and alignment filtering
- `CovEnd`: Manages alignment end positions in priority queue
- `DepthType`: Type definition for coverage depth (uint32_t)
- Coverage calculation uses STL priority queues for efficient processing
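The priority-queue approach mentioned above can be illustrated with a minimal sketch. This is not covtobed's actual code: the `Alignment` and `CoverageBlock` structs and the `coverage_blocks` function are hypothetical names chosen for illustration, and only `DepthType` as `uint32_t` comes from the list above. The sketch assumes alignments arrive sorted by start position and keeps their end positions in a min-heap, so the current depth is simply the heap size.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

// Hypothetical types for illustration; not covtobed's real definitions.
using DepthType = std::uint32_t;           // matches the uint32_t noted above
struct Alignment { int start; int end; };  // half-open interval [start, end)
struct CoverageBlock { int start; int end; DepthType depth; };

inline bool operator==(const CoverageBlock& a, const CoverageBlock& b) {
    return std::tie(a.start, a.end, a.depth) == std::tie(b.start, b.end, b.depth);
}

// Sweep over alignments sorted by start, keeping pending end positions in a
// min-heap. Only blocks with depth > 0 are reported in this sketch.
std::vector<CoverageBlock> coverage_blocks(const std::vector<Alignment>& alignments) {
    std::priority_queue<int, std::vector<int>, std::greater<int>> ends;
    std::vector<CoverageBlock> blocks;
    int pos = 0;  // left edge of the block currently being built

    // Close [pos, upto) at the current depth, merging with an adjacent
    // block of equal depth, then advance pos.
    auto emit = [&](int upto) {
        const DepthType depth = static_cast<DepthType>(ends.size());
        if (upto > pos && depth > 0) {
            if (!blocks.empty() && blocks.back().end == pos && blocks.back().depth == depth)
                blocks.back().end = upto;
            else
                blocks.push_back({pos, upto, depth});
        }
        if (upto > pos) pos = upto;
    };

    int previous_start = 0;
    for (const Alignment& aln : alignments) {
        assert(aln.start >= previous_start && "input must be sorted by start");
        assert(aln.end > aln.start);
        previous_start = aln.start;
        // Retire every alignment that ends at or before this start.
        while (!ends.empty() && ends.top() <= aln.start) {
            emit(ends.top());
            ends.pop();
        }
        emit(aln.start);     // depth changes where the new alignment begins
        ends.push(aln.end);
    }
    while (!ends.empty()) {  // drain the alignments still open at the end
        emit(ends.top());
        ends.pop();
    }
    return blocks;
}
```

Because each end position is pushed and popped once, the sweep costs O(n log n) for n alignments, and memory stays proportional to the number of alignments overlapping the current position rather than to the reference length.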

### Debug Mode
Debug output can be enabled by changing `#define debug if(false)` to `#define debug if(true)` in base.cpp before compilation.
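
As a rough illustration of how a macro of this shape behaves, the sketch below prefixes a statement with `debug`. The function name and the use of an `std::ostringstream` in place of `std::cerr` are assumptions made so the effect is observable; how base.cpp actually uses the macro may differ.

```cpp
#include <sstream>
#include <string>

// Same macro shape as described above; change false to true to enable output.
#define debug if (false)

// Hypothetical function for illustration only.
std::string process_with_debug_macro() {
    std::ostringstream log;  // stands in for std::cerr in this sketch
    // The statement after `debug` is compiled but skipped while the macro is if(false).
    debug log << "[debug] reading alignment\n";
    log << "done";
    return log.str();
}
```

One caveat of this pattern: because the macro expands to a bare `if`, placing `debug` directly before an `else` branch can attach the `else` to the wrong `if`.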

## CI/CD

The project uses GitHub Actions (`.github/workflows/c-cpp.yml`):
- Builds on Ubuntu 18.04
- Installs libbamtools-dev system package
- Compiles with both g++ and clang++
- Runs basic functionality tests and full test suite

## Development Workflow

1. **Building**: Use platform-appropriate build script or manual compilation command
2. **Testing**: Always run `bash test/test.sh` before submitting changes
3. **Debugging**: Enable debug mode in base.cpp for detailed output
4. **Pull Requests**: Follow GitHub Flow - fork, branch, PR against master

## Environment Variables

- `COVTOBED_QUIET=1`: Suppresses startup message when reading from STDIN

## Key Features to Understand

- **Sorted BAM Requirement**: Tool requires sorted BAM input, will error on unsorted files
- **Streaming Support**: Can read from STDIN for pipeline integration
- **Physical Coverage**: Special mode for paired-end/mate-pair libraries
- **Strand Analysis**: Can output strand-specific coverage information
- **Multiple Output Formats**: BED (default) and counts formats supported
- **Filtering Options**: By mapping quality, coverage thresholds, alignment validity
8 changes: 8 additions & 0 deletions COMPILING.md
@@ -0,0 +1,8 @@
# Compiling

This project requires libbamtools in addition to the more commonly available zlib.

```bash
c++ -std=c++11 *.cpp -I/path/to/libbamtools/ -lbamtools \
-o covtobed -lz
```
22 changes: 19 additions & 3 deletions README.md
@@ -56,16 +56,24 @@ Options:
skip reference sequences having size less or equal to MINCTG
-d, --discard-invalid-alignments
skip duplicates, failed QC, and non primary alignment,
- minq>0 (or user-defined if higher) (default: 0)
+ minq>0 (or user-defined if higher) (default: enabled)
--keep-invalid-alignments
Keep duplicates, failed QC, and non primary alignment,
min=0 (or user-defined if higher) - reverts to legacy behavior
--output-strands output coverage and stats separately for each strand
--format=CHOICE output format
```
## Example

- Command:
+ Command (with new default filtering):
```
covtobed -m 0 -x 5 test/demo.bam
```

To use legacy behavior (no filtering):
```
covtobed --keep-invalid-alignments -m 0 -x 5 test/demo.bam
```
Output:
```text
[...]
@@ -102,6 +110,14 @@ sudo docker run --rm -ti andreatelatin/covtobed coverage -h
singularity exec covtobed.simg coverage -h
```

## Important Changes in v1.4.0

**Default Behavior Change**: Starting with version 1.4.0, `covtobed` now **filters invalid alignments by default** (duplicates, failed QC, non-primary alignments). This provides higher quality results out of the box.

- **New default**: Invalid alignments are discarded (equivalent to using `--discard-invalid-alignments`)
- **Legacy behavior**: Use `--keep-invalid-alignments` to revert to the old behavior
- **Conflicting flags**: Using both `--discard-invalid-alignments` and `--keep-invalid-alignments` will result in an error
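
The rules above can be summarized in a short sketch. This is a hypothetical helper written for illustration, not code from covtobed's option handling; the function name and the exception type are assumptions.

```cpp
#include <stdexcept>

// Hypothetical helper, not taken from covtobed's source.
// Returns true when invalid alignments should be discarded.
bool resolve_discard_invalid(bool discard_flag, bool keep_flag) {
    if (discard_flag && keep_flag) {
        // Both flags together are rejected, as described above.
        throw std::invalid_argument(
            "--discard-invalid-alignments and --keep-invalid-alignments are mutually exclusive");
    }
    // Discarding is the default from v1.4.0 on; only the keep flag turns it off.
    return !keep_flag;
}
```

With neither flag set the helper returns `true`, which mirrors the new default; passing only the keep flag returns `false`, which corresponds to the legacy behavior.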

## Startup message

When invoked without arguments, covtobed will print a message to inform the user that it
@@ -118,7 +134,7 @@ This tool requires **libbamtools** and **zlib**.

To manually compile:
```
- c++ -std=c++11 *.cpp -I/path/to/bamtools/ -L${HOME}/path/to/lib/ -lbamtools -o covtobed
+ c++ -std=c++17 *.cpp -I/path/to/bamtools/ -L${HOME}/path/to/lib/ -lbamtools -o covtobed
```

## Issues, Limitations and how to contribute