fix: --batch-bytes limits sequence data size, not inflated cost by ekg · Pull Request #26 · pangenome/sweepga

ekg · 2026-03-13T17:52:39Z

Summary

--batch-bytes 50m now means "50 MB of sequence data per batch" instead of being compared against a cost model with a 100 MB fixed overhead that made every genome exceed the limit
Removed dead cost model code: FASTGA_OVERHEAD_BYTES, WFMASH_BYTES_PER_BP, BatchLimits, estimate_index_size, estimate_memory_usage, resolve_batch_bytes, and estimate_cost/resource_label/auto_limit trait methods (-224 lines)
Works uniformly for both --aligner fastga and --aligner wfmash

Test plan

CI passes
--batch-bytes 50m with multi-genome FASTA groups genomes into batches (not one per batch)

… estimate The batch partitioning had a 100MB fixed overhead in its cost model, so --batch-bytes 50m would put every genome in its own batch (since the overhead alone exceeded the limit). Now --batch-bytes simply controls the maximum basepairs of sequence data per batch, which works uniformly for both FastGA and wfmash. Removed unused cost model constants (FASTGA_OVERHEAD_BYTES, WFMASH_BYTES_PER_BP, etc.), BatchLimits struct, estimate_* functions, and the estimate_cost/resource_label/auto_limit trait methods.

git describe output changes without any file changing, so Cargo doesn't re-run build.rs and the cache key goes stale. The new key is {version}_f{fastga_rev}_w{wfmash_rev}, parsed from Cargo.lock. This only changes when Cargo.lock changes, which is already a rerun-if-changed trigger.

wfmash-rs build.rs checks for PORTABLE, not WFMASH_PORTABLE. The wrong name meant wfmash was always compiled with -march=native, causing SIGILL crashes when rust-cache restores binaries built on a runner with different CPU features. Also bumps cache prefix to invalidate stale non-portable binaries.

- Add samtools faidx calls to create .fai indexes for synthetic FASTA files (required since avg_seq_len_from_fai is called unconditionally) - Add --aligner fastga to all tests in fastga_integration.rs (they were using the default wfmash aligner despite being FastGA tests) - Use larger synthetic sequences in test_thread_parameter (64bp was too small even for wfmash's minimum 100bp segment length)

ekg added 4 commits March 13, 2026 12:52

ekg merged commit e45e504 into main Mar 14, 2026
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: --batch-bytes limits sequence data size, not inflated cost#26

fix: --batch-bytes limits sequence data size, not inflated cost#26
ekg merged 4 commits intomainfrom
fix-batch-bytes

ekg commented Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ekg commented Mar 13, 2026

Summary

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant