feat: Add custom_image input to debug Spice Cloud workflow (#238)
* feat: Add custom_image input to debug Spice Cloud workflow
Add a custom_image workflow input to run_spicebench_debug_spice_cloud
that allows specifying a custom runtime container image (e.g.
ghcr.io/spiceai/spiceai-dev:spicebench-sf10) instead of the default
nightly image.
When set, the image reference is parsed into registry, image name, and
tag components and passed through to spidapter as SPIDAPTER_IMAGE_REGISTRY,
SPIDAPTER_IMAGE_NAME, and SPIDAPTER_IMAGE_TAG env vars. The channel is
automatically switched to internal.
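The image-reference split described above can be sketched as follows. This is a minimal illustration, not the workflow's actual logic. It assumes three things: the registry is everything before the first `/`, the image name keeps any namespace path (so `spiceai/spiceai-dev`), and a missing tag falls back to Docker's `latest` convention. `split_image_ref` and `spidapter_env` are hypothetical helpers. Digest references (`@sha256:...`) are not handled.

```python
def split_image_ref(ref: str) -> tuple[str, str, str]:
    """Split 'registry/name:tag' into (registry, name, tag).

    Assumption: the first path segment is always the registry host,
    and the name keeps any namespace segments that follow it.
    """
    # Everything before the first '/' is treated as the registry host
    # (this also keeps a port such as 'localhost:5000' intact).
    registry, sep, rest = ref.partition("/")
    if not sep or not rest:
        raise ValueError(f"expected 'registry/name[:tag]', got {ref!r}")
    # The tag follows the last ':' in the remainder.
    name, sep, tag = rest.rpartition(":")
    if not sep:
        # No tag given: fall back to Docker's default tag (assumption).
        name, tag = rest, "latest"
    return registry, name, tag


def spidapter_env(custom_image: str) -> dict[str, str]:
    """Build the SPIDAPTER_IMAGE_* env vars from a custom image reference."""
    registry, name, tag = split_image_ref(custom_image)
    return {
        "SPIDAPTER_IMAGE_REGISTRY": registry,
        "SPIDAPTER_IMAGE_NAME": name,
        "SPIDAPTER_IMAGE_TAG": tag,
    }
```

For `ghcr.io/spiceai/spiceai-dev:spicebench-sf10` this yields registry `ghcr.io`, name `spiceai/spiceai-dev`, and tag `spicebench-sf10`. Splitting on the last `:` rather than the first keeps registry ports from being mistaken for tags.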
Also adds an executor_memory_limit input and fixes NUM_QUERY_CLIENTS to
match the main workflow (2 instead of 8).
* fix: Run row count validation first in checkpoint validation
Move table row count validation to run as the first phase (Phase 0)
before the probe query. Row count checks are cheap SELECT COUNT(*)
queries that immediately surface data loss or duplication without
waiting for expensive analytical queries to converge.
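The phase ordering described above can be sketched as a fail-fast check. This is a hypothetical illustration: `validate_checkpoint`, `count_rows`, and `run_probe_query` are invented names standing in for the real checkpoint validator, and the "surplus rows means possible duplication" label is an assumption.

```python
from typing import Callable, Mapping


def validate_checkpoint(
    expected_counts: Mapping[str, int],
    count_rows: Callable[[str], int],
    run_probe_query: Callable[[], None],
) -> list[str]:
    """Hypothetical checkpoint validator: row counts first, probe second."""
    errors: list[str] = []
    # Phase 0: one cheap SELECT COUNT(*) per table.
    for table, expected in expected_counts.items():
        actual = count_rows(table)
        if actual != expected:
            kind = "data loss" if actual < expected else "possible duplication"
            errors.append(f"{table}: expected {expected} rows, found {actual} ({kind})")
    if errors:
        # Fail fast: skip the probe and the expensive analytical queries.
        return errors
    # Phase 1+: only reached once every table's row count agrees.
    run_probe_query()
    return []
```

Callers can pass in any query runner. Because `count_rows` is injected, the ordering can be exercised without a live database.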
* chore: Add per-batch operation row count logging for data reconciliation
Log insert/update/delete row counts for each batch written through
write_segments_for_batch. This covers both the initialization phase
and the main ETL run pipeline.
Example log output:
    INFO etl: Writing segments for batch table=customer batch_id=5 segments=3 insert_rows=8192 update_rows=512 delete_rows=128
This allows post-hoc reconciliation: per table, the sum of
insert_rows - delete_rows across batches should match the expected row
count at each checkpoint.
If there is a mismatch, the per-batch logs pinpoint which batch_id
had unexpected operation counts.
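The reconciliation described above can be sketched as a small log-scanning script. This is a minimal sketch under stated assumptions. Each event must be on a single plain-text line (no ANSI colors) in the `key=value` form of the example. Tables must start empty, so the running sum covers both the initialization phase and the ETL run. `reconcile` and `find_mismatches` are hypothetical helpers, not part of the repository.

```python
import re
from collections import defaultdict

# Matches unquoted key=value fields such as table=customer or insert_rows=8192.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def reconcile(log_lines):
    """Return (net rows per table, per-batch deltas per table)."""
    net = defaultdict(int)
    batches = defaultdict(list)
    for line in log_lines:
        # Only the per-batch write events carry operation counts.
        if "Writing segments for batch" not in line:
            continue
        fields = dict(FIELD_RE.findall(line))
        table = fields["table"]
        # Updates modify existing rows, so they do not change the row count.
        delta = int(fields["insert_rows"]) - int(fields["delete_rows"])
        net[table] += delta
        batches[table].append((fields["batch_id"], delta))
    return dict(net), dict(batches)


def find_mismatches(net, expected):
    """Map each table whose net row count differs to (actual, expected)."""
    return {
        table: (net.get(table, 0), count)
        for table, count in expected.items()
        if net.get(table, 0) != count
    }
```

When `find_mismatches` reports a table, its entry in the per-batch list shows which `batch_id` contributed an unexpected delta.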