feat: Add custom_image input to debug Spice Cloud workflow #238
Merged
ed48acd to 29161f2
Add a custom_image workflow input to run_spicebench_debug_spice_cloud that allows specifying a custom runtime container image (e.g. ghcr.io/spiceai/spiceai-dev:spicebench-sf10) instead of the default nightly image. When set, the image reference is parsed into registry, image name, and tag components and passed through to spidapter as SPIDAPTER_IMAGE_REGISTRY, SPIDAPTER_IMAGE_NAME, and SPIDAPTER_IMAGE_TAG env vars. The channel is automatically switched to internal. Also adds executor_memory_limit input and fixes NUM_QUERY_CLIENTS to match the main workflow (2 instead of 8).
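The parsing step described above can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the function names `parse_image_ref` and `select_channel` are hypothetical, and the sketch assumes the reference always carries an explicit registry host and tag (digests and tag-less references are rejected rather than defaulted).

```python
def parse_image_ref(ref: str) -> dict[str, str]:
    """Split 'registry/name:tag' into the three spidapter env vars."""
    # The tag is whatever follows the last ':'; a '/' inside it means the
    # colon belonged to a registry port, so no tag was actually given.
    repo, sep, tag = ref.rpartition(":")
    if not sep or not tag or "/" in tag:
        raise ValueError(f"image reference needs an explicit tag: {ref!r}")
    # The registry is the first path component; the rest is the image name.
    registry, sep, name = repo.partition("/")
    if not sep or not registry or not name:
        raise ValueError(f"image reference needs a registry host: {ref!r}")
    return {
        "SPIDAPTER_IMAGE_REGISTRY": registry,
        "SPIDAPTER_IMAGE_NAME": name,
        "SPIDAPTER_IMAGE_TAG": tag,
    }


def select_channel(custom_image: str) -> str:
    """Hypothetical helper: a custom image switches the channel to internal."""
    return "internal" if custom_image else "nightly"


env = parse_image_ref("ghcr.io/spiceai/spiceai-dev:spicebench-sf10")
for key, value in env.items():
    print(f"{key}={value}")
```

Rejecting malformed references up front keeps a typo in the workflow input from silently falling back to a different image.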
316e8d8 to 8fe6fb7
Move table row count validation to run as the first phase (Phase 0) before the probe query. Row count queries are cheap SELECT COUNT(*) and immediately surface data loss or duplication without waiting for expensive analytical queries to converge.
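A rough sketch of the reordering, assuming a hypothetical `run_scalar` callable that executes a SQL string and returns a single integer. The function name `validate_checkpoint`, the `later_phases` list, and the table names are illustrative only and are not taken from the SpiceBench code.

```python
from typing import Callable, Dict, List, Tuple


def validate_checkpoint(
    run_scalar: Callable[[str], int],
    expected_counts: Dict[str, int],
    later_phases: List[Callable[[], None]],
) -> Dict[str, Tuple[int, int]]:
    """Run cheap row count checks (Phase 0) before any expensive phase."""
    mismatches: Dict[str, Tuple[int, int]] = {}
    # Phase 0: one SELECT COUNT(*) per table.
    for table, expected in expected_counts.items():
        actual = run_scalar(f"SELECT COUNT(*) FROM {table}")
        if actual != expected:
            mismatches[table] = (expected, actual)
    if mismatches:
        # Data loss or duplication is already visible; skip later phases.
        return mismatches
    # Later phases (probe query, analytical queries) only run on a clean count.
    for phase in later_phases:
        phase()
    return mismatches


# Usage with an in-memory stand-in for the query engine.
fake_tables = {"customer": 150, "orders": 1500}
ran_later: List[str] = []
result = validate_checkpoint(
    run_scalar=lambda sql: fake_tables[sql.rsplit(" ", 1)[-1]],
    expected_counts={"customer": 150, "orders": 1499},
    later_phases=[lambda: ran_later.append("probe")],
)
print(result)  # maps each mismatched table to (expected, actual)
```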
8fe6fb7 to 690b185
Log insert/update/delete row counts for each batch written through
write_segments_for_batch. This covers both the initialization phase
and the main ETL run pipeline.
Example log output:
INFO etl: Writing segments for batch table=customer batch_id=5
segments=3 insert_rows=8192 update_rows=512 delete_rows=128
This allows post-hoc reconciliation: for each table, the sum of
(insert_rows - delete_rows) across all batches should match the
expected row count at each checkpoint. If there is a mismatch, the
per-batch logs pinpoint which batch_id had unexpected operation counts.
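The reconciliation described above could be scripted along these lines. This is a sketch under assumptions: it treats each batch as a single log line of `key=value` fields in the format of the example output, and the function name `net_rows_per_table` and the second sample line (batch_id=6) are made up for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Matches key=value fields such as table=customer or insert_rows=8192.
FIELD = re.compile(r"(\w+)=(\S+)")


def net_rows_per_table(lines: Iterable[str]) -> Dict[str, int]:
    """Sum insert_rows - delete_rows per table over all batch log lines."""
    net: Dict[str, int] = defaultdict(int)
    for line in lines:
        fields = dict(FIELD.findall(line))
        # Skip lines that are not per-batch operation count logs.
        if not {"table", "insert_rows", "delete_rows"} <= fields.keys():
            continue
        net[fields["table"]] += int(fields["insert_rows"]) - int(fields["delete_rows"])
    return dict(net)


sample = [
    "INFO etl: Writing segments for batch table=customer batch_id=5 "
    "segments=3 insert_rows=8192 update_rows=512 delete_rows=128",
    # Hypothetical second batch, not from the original log.
    "INFO etl: Writing segments for batch table=customer batch_id=6 "
    "segments=1 insert_rows=100 update_rows=0 delete_rows=28",
    "INFO etl: unrelated message",
]
print(net_rows_per_table(sample))  # {'customer': 8136}
```

Updates are ignored because they do not change the row count; comparing the per-table totals against SELECT COUNT(*) at a checkpoint then narrows a mismatch down to individual batch_id entries.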
Jeadie pushed a commit that referenced this pull request Apr 2, 2026
* feat: Add custom_image input to debug Spice Cloud workflow
* fix: Run row count validation first in checkpoint validation
* chore: Add per-batch operation row count logging for data reconciliation
Add a custom_image workflow input to run_spicebench_debug_spice_cloud that allows specifying a custom runtime container image (e.g. ghcr.io/spiceai/spiceai-dev:spicebench-sf10) instead of the default nightly image.
How it works
When custom_image is set, the image reference is parsed into components:
ghcr.io/spiceai/spiceai-dev:spicebench-sf10 → registry=ghcr.io, image=spiceai/spiceai-dev, tag=spicebench-sf10
These are passed to spidapter as SPIDAPTER_IMAGE_REGISTRY, SPIDAPTER_IMAGE_NAME, and SPIDAPTER_IMAGE_TAG env vars. The channel is automatically switched from nightly to internal.
Other changes
* executor_memory_limit input (was missing from debug workflow but present in main workflow)
Depends on
* --image-registry, --image-name, --image-tag support to spidapter
* spiceai/spiceai-dev to INTERNAL_RUNTIME_IMAGES