Fix checkpoint determinism: bound coalescing to checkpoint interval (#187)
* Fix checkpoint determinism: bound coalescing to checkpoint interval
read_batches_until_min_rows greedily coalesces batch IDs from the work
plan to fill target_batch_rows(). Different SPICEBENCH_TARGET_BATCH_ROWS
values cause different batch IDs to be consumed per checkpoint interval,
making pre-generated checkpoint results invalid for validation.
Fix by constraining coalescing to the checkpoint interval boundary:
- reserve_next_batch_id_for_table: add max_batch_id parameter that
  stops the candidate search at the checkpoint boundary.
- read_batches_until_min_rows: accept and forward max_batch_id to
  the reservation function. Still targets target_batch_rows() for
  batching performance within the allowed range.
- run_pipeline: compute checkpoint_max_batch_id from the first
  step_limit keys in the work plan BTreeMap (None when unlimited).
- Outer loop: when the next batch ID exceeds the boundary, pause
  the pipeline instead of processing it, so checkpoint validation
  runs against exact data.
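
A minimal sketch of the boundary computation and the bounded reservation described in the list above. It is not the code from this commit: the `WorkPlan` alias, the `reserved` set, and both function signatures are hypothetical stand-ins, and the work plan is assumed to map batch IDs to row counts.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the work plan: batch ID -> row count.
type WorkPlan = BTreeMap<u64, usize>;

/// Largest batch ID inside the current checkpoint interval, taken as the
/// last of the first `step_limit` keys in ascending key order.
/// Returns `None` when no limit is set. In this sketch it also returns
/// `None` for an empty plan or a limit of zero.
fn checkpoint_max_batch_id(plan: &WorkPlan, step_limit: Option<usize>) -> Option<u64> {
    let limit = step_limit?;
    plan.keys().take(limit).last().copied()
}

/// Reserve the smallest batch ID that is not yet reserved, stopping the
/// candidate search once IDs pass `max_batch_id` (if a boundary is given).
fn reserve_next_batch_id(
    plan: &WorkPlan,
    reserved: &mut BTreeSet<u64>,
    max_batch_id: Option<u64>,
) -> Option<u64> {
    let id = plan
        .keys()
        .copied()
        // Keys are sorted, so the search can stop at the first ID past the boundary.
        .take_while(|id| max_batch_id.map_or(true, |max| *id <= max))
        .find(|id| !reserved.contains(id))?;
    reserved.insert(id);
    Some(id)
}
```

Because the search stops at the boundary, a caller gets `None` once the interval is exhausted, which is the point at which the outer loop would pause for checkpoint validation.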
Within a checkpoint interval, coalescing works normally for performance.
At the boundary, no batch IDs from the next interval are consumed.
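
To illustrate why bounding the coalescing restores determinism, here is a small sketch under simplifying assumptions: batches are coalesced in ascending ID order from a `BTreeMap` of batch ID to row count, and both function names and signatures are hypothetical, not the ones in the codebase.

```rust
use std::collections::BTreeMap;

/// Coalesce batch IDs starting at `start` until `target_rows` is reached,
/// never taking an ID greater than `max_batch_id` when a boundary is set.
fn coalesce_until_min_rows(
    plan: &BTreeMap<u64, usize>,
    start: u64,
    target_rows: usize,
    max_batch_id: Option<u64>,
) -> Vec<u64> {
    let mut ids = Vec::new();
    let mut rows = 0usize;
    for (&id, &row_count) in plan.range(start..) {
        // Stop at the checkpoint boundary even if the row target is not met.
        if max_batch_id.map_or(false, |max| id > max) {
            break;
        }
        ids.push(id);
        rows += row_count;
        if rows >= target_rows {
            break;
        }
    }
    ids
}

/// All batch IDs consumed by repeated coalescing until the boundary
/// (or the end of the plan) is reached.
fn ids_consumed_in_interval(
    plan: &BTreeMap<u64, usize>,
    target_rows: usize,
    max_batch_id: Option<u64>,
) -> Vec<u64> {
    let mut consumed = Vec::new();
    let mut next = 0u64;
    loop {
        let group = coalesce_until_min_rows(plan, next, target_rows, max_batch_id);
        match group.last() {
            Some(&last) => next = last + 1,
            None => break, // nothing left inside the interval
        }
        consumed.extend(group);
    }
    consumed
}
```

With six batches of 100 rows each (IDs 1 to 6) and a boundary at ID 3, a row target of 150 and a row target of 1000 both consume exactly IDs 1, 2, and 3 in the interval. Without a boundary, a single coalescing call with a target of 350 would already pull in ID 4 from the next interval.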
* chore: auto-fix cargo fmt + clippy
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>