Commit c8e70ff

Refactor batch alignment to support wfmash and improve resource management (#24)
1 parent 741c951 commit c8e70ff

6 files changed: +550 -166 lines changed
Cargo.lock

Lines changed: 6 additions & 6 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 4 additions & 4 deletions
@@ -30,10 +30,10 @@ rayon = "1.10"
 byteorder = "1.5"
 tempfile = "3.8"
 ordered-float = "4.0"
-fastga-rs = { git = "https://github.com/pangenome/fastga-rs", rev = "c0bf620" }
-wfmash-rs = { git = "https://github.com/pangenome/wfmash-rs", rev = "e3fb64ef" }
-ragc-core = { git = "https://github.com/ekg/ragc", rev = "40e5cad" }
-onecode = { git = "https://github.com/pangenome/onecode-rs", rev = "f531f5b" }
+fastga-rs = { git = "https://github.com/pangenome/fastga-rs", rev = "c0bf6202506f51c77e47cf8707e58b7b61e2b621" }
+wfmash-rs = { git = "https://github.com/pangenome/wfmash-rs", rev = "e1207fc50487fb5065f4211d175b3c238fcca167" }
+ragc-core = { git = "https://github.com/ekg/ragc", rev = "40e5cad11cab7d4df07a72d6b16d68c2d60b0742" }
+onecode = { git = "https://github.com/pangenome/onecode-rs", rev = "f531f5b0ff54001a898ec4e0c0c761b2bd0a1f34" }
 chrono = "0.4"
 flate2 = "1.0"
 libc = "0.2"

README.md

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
 # SweepGA
 
-Fast genome alignment with plane sweep filtering and scaffolding. Wraps FastGA aligner and applies plane sweep and other filtering methods to keep the best non-overlapping alignments.
+Fast genome alignment with plane sweep filtering and scaffolding. Wraps wfmash and FastGA aligners and applies plane sweep and other filtering methods to keep the best non-overlapping alignments.
 
 ## What it does
 
 SweepGA can:
-1. **Align FASTA files directly** using integrated FastGA (supports .fa.gz)
+1. **Align FASTA files directly** using integrated wfmash or FastGA (supports .fa.gz)
 2. **Filter existing alignments** from any aligner (wfmash, minimap2, etc.)
 3. **Apply scaffolding/chaining** to merge nearby alignments into syntenic regions
 4. **Output multiple formats**: PAF (text) or .1aln (binary ONE format)
@@ -251,7 +251,7 @@ sweepga alignments.paf -j 50k -m many:many > filtered.paf
 
 **`--check-fastga`** - Check FastGA binary locations and exit (diagnostic)
 
-**`--aligner <ALIGNER>`** - Aligner for FASTA input (default: `fastga`)
+**`--aligner <ALIGNER>`** - Aligner for FASTA input: `wfmash` or `fastga` (default: `wfmash`)
 
 **`-f/--frequency <N>`** - FastGA k-mer frequency threshold
 
@@ -261,7 +261,7 @@ sweepga alignments.paf -j 50k -m many:many > filtered.paf
 
 ### Batch Processing
 
-**`--batch-bytes <SIZE>`** - Maximum index size per batch (e.g., `10G`, `500M`). Partitions genomes to fit disk limits when scratch space is limited.
+**`--batch-bytes <SIZE>`** - Maximum resource usage per batch (e.g., `10G`, `500M`). Partitions genomes into batches. FastGA: limits disk (index ~0.1GB + 12 bytes/bp). Wfmash: limits memory (~0.5GB + 20 bytes/bp).
 
 **`--zstd`** - Compress k-mer index with zstd for ~2x disk savings and faster I/O
 
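
The per-batch estimates quoted in the new `--batch-bytes` description (FastGA: ~0.1GB + 12 bytes/bp of disk; wfmash: ~0.5GB + 20 bytes/bp of memory) can be illustrated with a small Rust sketch. This is not the code from this commit: the `Aligner` enum, `estimate_batch_bytes`, and `partition_into_batches` are hypothetical names, the greedy in-order partitioning is an assumed strategy, and 1GB is assumed to mean 10^9 bytes.

```rust
/// Hypothetical aligner selector (illustrative only, not SweepGA's actual type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Aligner {
    Wfmash,
    Fastga,
}

/// Assumption: "GB" in the README means 10^9 bytes.
const GB: u64 = 1_000_000_000;

/// Estimated resource use (bytes) for a batch holding `total_bp` base pairs,
/// using the approximate figures quoted in the README text.
fn estimate_batch_bytes(aligner: Aligner, total_bp: u64) -> u64 {
    match aligner {
        // FastGA: disk, ~0.1GB fixed overhead + 12 bytes per bp.
        Aligner::Fastga => GB / 10 + 12 * total_bp,
        // wfmash: memory, ~0.5GB fixed overhead + 20 bytes per bp.
        Aligner::Wfmash => GB / 2 + 20 * total_bp,
    }
}

/// Greedily groups genomes (given by their sizes in bp) into batches whose
/// estimated usage stays within `limit_bytes`. Returns genome indices per batch.
/// A genome that exceeds the limit on its own still gets a batch of its own.
fn partition_into_batches(
    aligner: Aligner,
    genome_bp: &[u64],
    limit_bytes: u64,
) -> Vec<Vec<usize>> {
    let mut batches: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_bp: u64 = 0;

    for (i, &bp) in genome_bp.iter().enumerate() {
        let would_use = estimate_batch_bytes(aligner, current_bp + bp);
        // Close the current batch if adding this genome would exceed the limit.
        if !current.is_empty() && would_use > limit_bytes {
            batches.push(std::mem::take(&mut current));
            current_bp = 0;
        }
        current.push(i);
        current_bp += bp;
    }
    if !current.is_empty() {
        batches.push(current);
    }
    batches
}
```

Under these assumptions, three 100 Mbp genomes with a 5GB wfmash limit fall into two batches: the first two genomes together are estimated at 4.5GB, while adding the third would bring the estimate to 6.5GB.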
