bamslice

Extract specific byte ranges from BAM/CRAM files and convert to interleaved FASTQ format. Designed for parallel processing across compute nodes without requiring pre-indexing.

Features

No pre-indexing required - accepts approximate byte offsets
Auto-aligns to block boundaries - finds the next valid BGZF block at or after the start offset
Byte-range based - process arbitrary byte ranges for easy parallelization
No overlap - using contiguous byte ranges guarantees no duplicate reads
Interleaved FASTQ output - same format as samtools fastq
Parallel-ready - designed for distributed processing

Installation

cargo build --release

The binary is written to target/release/bamslice.

Usage

bamslice \
  --input input.bam \
  --start-offset 0 \
  --end-offset 10000000 \
  --output output.fastq

Arguments

--input, -i: Input BAM file
--start-offset, -s: Starting byte offset (will find next BGZF block at or after this offset)
--end-offset, -e: Ending byte offset (will stop when reaching a block at or after this offset)
--output, -o: Output FASTQ file (default: stdout)

Examples

Extract first half of file:

FILE_SIZE=$(stat -f%z input.bam)  # macOS
# FILE_SIZE=$(stat -c%s input.bam)  # Linux
HALF=$((FILE_SIZE / 2))

bamslice -i input.bam -s 0 -e $HALF -o first_half.fastq

Extract second half (no overlap!):

bamslice -i input.bam -s $HALF -e $FILE_SIZE -o second_half.fastq

Output to stdout:

bamslice -i input.bam -s 0 -e 1000000 | head -n 4

Parallel Processing

The tool works on byte ranges, so jobs can be parallelized without any coordination between them.
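As a minimal sketch of how contiguous ranges might be generated, the shell functions below split a file size into fixed-size chunks and print one bamslice command per chunk as a dry run. The function names chunk_ranges and slice_commands and the chunk_N.fastq output names are hypothetical; only the -i, -s, -e and -o flags come from the Arguments section.

```shell
# Print "START END" pairs that tile [0, file_size) in chunk_size steps.
# The last range is clipped to file_size so the ranges stay contiguous.
chunk_ranges() (
  file_size=$1
  chunk_size=$2
  if [ "$chunk_size" -le 0 ]; then
    echo "chunk size must be positive" >&2
    exit 1
  fi
  start=0
  while [ "$start" -lt "$file_size" ]; do
    end=$((start + chunk_size))
    if [ "$end" -gt "$file_size" ]; then
      end=$file_size
    fi
    echo "$start $end"
    start=$end
  done
)

# Dry run: print one bamslice command per range instead of executing it.
# The chunk_N.fastq output names are hypothetical.
slice_commands() (
  bam=$1
  file_size=$2
  chunk_size=$3
  n=0
  chunk_ranges "$file_size" "$chunk_size" | while read -r start end; do
    echo "bamslice -i $bam -s $start -e $end -o chunk_$n.fastq"
    n=$((n + 1))
  done
)

# Tiny illustrative numbers: a 250-byte "file" in 100-byte chunks.
slice_commands input.bam 250 100
```

Because each command is independent, the printed lines could be handed to any job runner or scheduler. In practice the file size would come from stat, as in the Examples section.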

Nextflow Example

See example.nf for a pipeline that pipes bamslice output through fastp for QC/filtering.

nextflow run example.nf --bam input.bam --chunk_size 104857600

How It Works

BGZF Structure: BAM files use BGZF (Blocked GZIP) - a series of independent compressed blocks
Block Discovery: Given a start offset, scans forward to find the next valid BGZF block (magic: 0x1f 0x8b 0x08)
Range Processing: Processes all reads from blocks starting before end_offset
No Overlap: Each block is processed by exactly one job when using contiguous byte ranges
FASTQ Output: Converts BAM records to interleaved FASTQ format
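The range rule described above can be modelled in a few lines of shell. This is only a sketch of the stated rule, not bamslice's actual code: the function name blocks_in_range and the block offsets are made up for illustration. A job with the range [start, end) owns every block whose start offset is at or after start and before end.

```shell
# Read BGZF block start offsets (one per line) on stdin and print those that
# a job with the byte range [start, end) would process under the stated rule.
blocks_in_range() {
  awk -v start="$1" -v end="$2" '$1 + 0 >= start + 0 && $1 + 0 < end + 0 { print $1 }'
}

# Hypothetical block start offsets for a tiny file, one per line.
offsets='0
180
410
655'

# Three contiguous ranges; each block should be listed under exactly one.
while read -r start end; do
  owned=$(printf '%s\n' "$offsets" | blocks_in_range "$start" "$end" | paste -s -d ' ' -)
  echo "range [$start, $end): blocks at $owned"
done <<EOF
0 300
300 600
600 700
EOF
```

In this model the block at offset 180 extends past byte 300 but is still owned by the first job, while the second job skips ahead to the block at 410, so no block is read twice.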

Why Byte Ranges?

No indexing overhead: Don't need to scan the entire file first
Trivial parallelization: Just choose your start/end offsets (see the Nextflow example above)
No coordination: Each process works independently
Guaranteed coverage: Contiguous ranges ensure no reads are skipped
No duplication: Block alignment ensures no reads are processed twice
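The coverage and no-duplication guarantees depend on the ranges being contiguous, so it may be worth checking a set of ranges before launching jobs. The helper below is a hypothetical sketch (check_ranges is not part of bamslice): it reads "START END" lines and succeeds only if they tile the file from byte 0 to the file size with no gaps and no overlaps.

```shell
# Read "START END" lines on stdin; exit 0 only if they tile [0, size)
# with no gap, no overlap, and no empty range.
check_ranges() {
  awk -v size="$1" '
    BEGIN { expected = 0 }
    $1 + 0 != expected { printf "gap or overlap at line %d\n", NR; bad = 1; exit }
    $2 + 0 <= $1 + 0   { printf "empty range at line %d\n", NR; bad = 1; exit }
    { expected = $2 + 0 }
    END {
      if (!bad && expected != size + 0) {
        printf "ranges end at %d, not %d\n", expected, size
        bad = 1
      }
      exit bad
    }'
}

# Contiguous ranges pass the check.
printf '0 100\n100 200\n200 250\n' | check_ranges 250 && echo "ok: ranges tile the file"

# A gap between 100 and 120 is rejected.
printf '0 100\n120 250\n' | check_ranges 250 || echo "rejected"
```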

Testing

Run the test suite to verify correctness:

cargo test

Lint the codebase:

make lint

Development Commands

Run a coverage analysis:

make coverage && open target/coverage/html/index.html

Build a flamegraph for performance profiling:

make flamegraph && open flamegraph.svg

Run the performance benchmark:

make bench && open target/criterion/report/index.html

Release a new version:

# First update the version in Cargo.toml, then:
git add Cargo.toml Cargo.lock
git commit -m 'update package version to vX.Y.Z'
git tag -m 'tag for release' vX.Y.Z
git push --follow-tags
cargo publish

License

AGPLv3 - See LICENSE file for details

