This project provides a Rust implementation of a compact, reversible DNA compression format that stores each nucleotide in 4 bits. It supports all 15 IUPAC nucleotide codes, generates base-pair complements with bitwise operations, and serializes sequences to a compact binary file. The tool is intended for developers who work with large-scale biological data or who want to explore performance-oriented data representations.
- 4-bit encoding of nucleotides (half the size of plain text, which uses one byte per base)
- Full IUPAC code support
- Bitwise base-pair complement generation
- Fast decompression with padding-aware decoding
- Command-line interface supporting compression, decompression, complement generation, and benchmarking
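The 4-bit format works because the 15 IUPAC codes fit into the 16 values a nibble can hold. The sketch below assumes one common layout that this README does not confirm: each unambiguous base gets its own bit, and each ambiguity code is the OR of the bases it stands for (R = A or G, N = any base). The bit order A, C, T, G, the function names `encode_base` and `decode_nibble`, and the use of 0 as a padding value are all hypothetical.

```rust
// Hypothetical layout: one bit per unambiguous base. The order (A, C, T, G)
// is an assumption, not taken from the project's source.
const A: u8 = 0b0001;
const C: u8 = 0b0010;
const T: u8 = 0b0100;
const G: u8 = 0b1000;

/// Index = 4-bit code, value = IUPAC letter. Code 0 matches no base and is
/// reserved here as a padding marker ('-' is only a display placeholder).
const DECODE: [u8; 16] = *b"-ACMTWYHGRSVKDBN";

/// Map an IUPAC letter (case-insensitive; U is treated as T) to its 4-bit code.
fn encode_base(letter: u8) -> Option<u8> {
    let code = match letter.to_ascii_uppercase() {
        b'A' => A,
        b'C' => C,
        b'G' => G,
        b'T' | b'U' => T,
        b'R' => A | G,
        b'Y' => C | T,
        b'S' => C | G,
        b'W' => A | T,
        b'K' => G | T,
        b'M' => A | C,
        b'B' => C | G | T,
        b'D' => A | G | T,
        b'H' => A | C | T,
        b'V' => A | C | G,
        b'N' => A | C | G | T,
        _ => return None, // not an IUPAC nucleotide code
    };
    Some(code)
}

/// Map a 4-bit code back to its IUPAC letter; 0 (padding) yields None.
fn decode_nibble(code: u8) -> Option<u8> {
    match code {
        1..=15 => Some(DECODE[code as usize]),
        _ => None,
    }
}

fn main() {
    // Every one of the 15 IUPAC letters survives an encode/decode round trip.
    for &letter in b"ACGTRYSWKMBDHVN" {
        let code = encode_base(letter).unwrap();
        assert_eq!(decode_nibble(code), Some(letter));
    }
    println!("{:04b}", encode_base(b'R').unwrap()); // prints 1001 (A | G)
}
```

Because ambiguity codes are unions of bits, a test such as "does this code allow G?" reduces to a single AND.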
Clone the repository and build the tool using Cargo.
Compress a DNA sequence
cargo run . -- input.txt
This reads input.txt (a plain-text DNA sequence) and writes the compressed binary data to output.txt.
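Packing two 4-bit codes into each byte is what halves the file size: n bases occupy (n + 1) / 2 bytes using integer division. The sketch below is a minimal illustration that assumes the earlier base goes in the high nibble and that an odd-length sequence is padded with a 0 nibble. `pack_nibbles` is a hypothetical name, and reading input.txt, stripping newlines, and mapping letters to codes are left out.

```rust
/// Pack 4-bit codes (1..=15) two per byte, earlier code in the high nibble.
/// For an odd-length input, the low half of the last byte is a 0 padding
/// nibble (assumes 0 is never used as a base code).
fn pack_nibbles(codes: &[u8]) -> Vec<u8> {
    codes
        .chunks(2)
        .map(|pair| {
            let hi = pair[0] & 0x0F;
            let lo = pair.get(1).copied().unwrap_or(0) & 0x0F; // 0 = padding
            (hi << 4) | lo
        })
        .collect()
}

fn main() {
    // Hypothetical codes for "ACGTA" under a one-bit-per-base layout
    // (A = 1, C = 2, T = 4, G = 8).
    let codes = [1u8, 2, 8, 4, 1];
    let packed = pack_nibbles(&codes);
    // Five bases fit in three bytes; the final low nibble is padding.
    assert_eq!(packed, vec![0x12, 0x84, 0x10]);
    println!("{} bases -> {} bytes", codes.len(), packed.len());
}
```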
Compress the complement of a sequence
cargo run . -- input.txt --compliment
This complements each base with a bitwise operation before the sequence is compressed.
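With a suitable bit layout, complementing a base needs no lookup table. Assuming the hypothetical layout A = 0001, C = 0010, T = 0100, G = 1000, rotating a nibble by two bits swaps A with T and C with G. Ambiguity codes are ORs of those bits, so they map to their complements as well (R becomes Y, N stays N). The sketch below, using the hypothetical function `complement_packed`, applies that rotation to both nibbles of every packed byte at once. It produces the complement, not the reverse complement.

```rust
/// Complement every base in a packed buffer in place.
///
/// Assumes the hypothetical layout A = 0b0001, C = 0b0010, T = 0b0100,
/// G = 0b1000. Swapping the two 2-bit halves of a nibble equals rotating it
/// by two bits, which maps A<->T and C<->G. The masks 0x33 and 0xCC handle
/// both nibbles of a byte in one step; a 0 padding nibble stays 0.
fn complement_packed(bytes: &mut [u8]) {
    for b in bytes.iter_mut() {
        *b = ((*b & 0x33) << 2) | ((*b & 0xCC) >> 2);
    }
}

fn main() {
    // "ATCG" packed high nibble first is 0x14, 0x28; 0x90 is R plus padding.
    let mut packed = vec![0x14u8, 0x28, 0x90];
    complement_packed(&mut packed);
    // Result: "TAGC" followed by Y plus padding.
    assert_eq!(packed, vec![0x41, 0x82, 0x60]);
}
```

Applying the function twice restores the original bytes, since complementing is its own inverse.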
Benchmark complement strategies
cargo run . -- input.txt --benchmark
This compares the speed of the bitwise-rotation complement against a match-based complement and writes the results to speed_test.csv.
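The benchmark contrasts two ways of computing the same mapping. The sketch below assumes the bit layout A = 1, C = 2, T = 4, G = 8, which this README does not confirm. It first checks that a rotation-based and a match-based complement agree on all 16 four-bit values, then times each over a synthetic input with `std::time::Instant`. The function names and the `method,nanoseconds` CSV columns are illustrative only; the actual format of speed_test.csv is not documented here, and the string could be saved with `std::fs::write`.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Rotation-based complement of one 4-bit code (assumed layout A=1, C=2, T=4, G=8).
fn complement_rotate(code: u8) -> u8 {
    ((code << 2) | (code >> 2)) & 0x0F
}

/// Match-based complement of the same codes, spelled out per IUPAC letter.
fn complement_match(code: u8) -> u8 {
    match code {
        1 => 4,         // A -> T
        4 => 1,         // T -> A
        2 => 8,         // C -> G
        8 => 2,         // G -> C
        3 => 12,        // M -> K
        12 => 3,        // K -> M
        9 => 6,         // R -> Y
        6 => 9,         // Y -> R
        7 => 13,        // H -> D
        13 => 7,        // D -> H
        11 => 14,       // V -> B
        14 => 11,       // B -> V
        other => other, // W, S, N and padding (0) are their own complement
    }
}

/// Time `f` over `data` and return the elapsed nanoseconds.
fn time_ns(data: &[u8], f: fn(u8) -> u8) -> u128 {
    let start = Instant::now();
    let mut acc = 0u8;
    for &c in data {
        acc ^= f(black_box(c)); // black_box keeps the work from being optimized away
    }
    black_box(acc);
    start.elapsed().as_nanos()
}

fn main() {
    // Both strategies must agree on every possible 4-bit value.
    for code in 0u8..16 {
        assert_eq!(complement_rotate(code), complement_match(code));
    }
    // Synthetic input: one million codes cycling through 1..=15.
    let data: Vec<u8> = (0..1_000_000u32).map(|i| (i % 15) as u8 + 1).collect();
    let mut csv = String::from("method,nanoseconds\n");
    csv += &format!("rotation,{}\n", time_ns(&data, complement_rotate));
    csv += &format!("match,{}\n", time_ns(&data, complement_match));
    print!("{csv}");
}
```

Timings depend on the machine and on whether the build is optimized, so compare runs made with the same `cargo` profile.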
Decompress a file
cargo run . -- output.txt --decode
This reads the binary file output.txt and writes the reconstructed DNA sequence to decoded.txt.
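Decoding reverses the packing: each byte yields two codes, high nibble first, and padding must not become a spurious base. The sketch below is minimal and assumes that 0 is reserved for padding and that codes follow the hypothetical table shown in the code. `decode_packed` is an illustrative name. Storing the sequence length in a header is another way to locate padding, and the project may use either approach.

```rust
/// Hypothetical code table for a one-bit-per-base layout (A=1, C=2, T=4, G=8);
/// index 0 is the padding nibble and never decodes to a base.
const DECODE: [u8; 16] = *b"-ACMTWYHGRSVKDBN";

/// Unpack bytes back into IUPAC text, skipping padding nibbles.
fn decode_packed(bytes: &[u8]) -> String {
    bytes
        .iter()
        .flat_map(|&b| [b >> 4, b & 0x0F]) // high nibble first
        .filter(|&code| code != 0)          // 0 is reserved for padding
        .map(|code| DECODE[code as usize] as char)
        .collect()
}

fn main() {
    // "ACGTA" packed as 0x12, 0x84, 0x10; the last low nibble is padding.
    let text = decode_packed(&[0x12, 0x84, 0x10]);
    assert_eq!(text, "ACGTA");
    println!("{text}");
}
```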