Skip to content

fix: --batch-bytes limits sequence data size, not inflated cost#26

Merged
ekg merged 4 commits intomainfrom
fix-batch-bytes
Mar 14, 2026
Merged

fix: --batch-bytes limits sequence data size, not inflated cost#26
ekg merged 4 commits intomainfrom
fix-batch-bytes

Conversation

@ekg
Copy link
Copy Markdown
Contributor

@ekg ekg commented Mar 13, 2026

Summary

  • --batch-bytes 50m now means "50 MB of sequence data per batch" instead of being compared against a cost model with a 100 MB fixed overhead that made every genome exceed the limit
  • Removed dead cost model code: FASTGA_OVERHEAD_BYTES, WFMASH_BYTES_PER_BP, BatchLimits, estimate_index_size, estimate_memory_usage, resolve_batch_bytes, and estimate_cost/resource_label/auto_limit trait methods (-224 lines)
  • Works uniformly for both --aligner fastga and --aligner wfmash

Test plan

  • CI passes
  • --batch-bytes 50m with multi-genome FASTA groups genomes into batches (not one per batch)

ekg added 4 commits March 13, 2026 12:52
… estimate

The batch partitioning had a 100MB fixed overhead in its cost model,
so --batch-bytes 50m would put every genome in its own batch (since
the overhead alone exceeded the limit). Now --batch-bytes simply
controls the maximum basepairs of sequence data per batch, which
works uniformly for both FastGA and wfmash.

Removed unused cost model constants (FASTGA_OVERHEAD_BYTES,
WFMASH_BYTES_PER_BP, etc.), BatchLimits struct, estimate_* functions,
and the estimate_cost/resource_label/auto_limit trait methods.
git describe output changes without any file changing, so Cargo
doesn't re-run build.rs and the cache key goes stale. The new key
is {version}_f{fastga_rev}_w{wfmash_rev}, parsed from Cargo.lock.
This only changes when Cargo.lock changes, which is already a
rerun-if-changed trigger.
wfmash-rs build.rs checks for PORTABLE, not WFMASH_PORTABLE.
The wrong name meant wfmash was always compiled with -march=native,
causing SIGILL crashes when rust-cache restores binaries built on
a runner with different CPU features. Also bumps cache prefix to
invalidate stale non-portable binaries.
- Add samtools faidx calls to create .fai indexes for synthetic FASTA
  files (required since avg_seq_len_from_fai is called unconditionally)
- Add --aligner fastga to all tests in fastga_integration.rs (they were
  using the default wfmash aligner despite being FastGA tests)
- Use larger synthetic sequences in test_thread_parameter (64bp was too
  small even for wfmash's minimum 100bp segment length)
@ekg ekg merged commit e45e504 into main Mar 14, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant