Skip to content

Store container image ref and on-disk size in ClickHouse#11920

Open
dan-stowell wants to merge 10 commits intomasterfrom
find-big-images
Open

Store container image ref and on-disk size in ClickHouse#11920
dan-stowell wants to merge 10 commits intomasterfrom
find-big-images

Conversation

@dan-stowell
Copy link
Copy Markdown
Contributor

Add container_image and container_image_size_bytes fields to track the container image used for each execution and its estimated on-disk size after pulling/extracting.

Data flow:

  • Proto: Add fields to ExecutionAuxiliaryMetadata (executor->server) and StoredExecution (server->ClickHouse buffer)
  • Runner interface: Add ContainerImageInfo() method
  • Container interface: Add optional ImageSizer interface with ImageSizeBytes() method
  • Implementations:
    • OCI: sum of EstimatedDiskUsageBytes across all layers
    • Firecracker: stat the cached ext4 disk image file
    • Docker: ImageInspect API call for image Size
    • Podman: podman image inspect --format {{.Size}}
  • Executor: capture image ref + size after PrepareForTask, set on auxiliary metadata
  • Execution server: read from aux metadata, write to StoredExecution
  • ClickHouse schema: add ContainerImage and ContainerImageSizeBytes columns to Executions table

This enables querying for executions using large container images (e.g. >6GB yellow zone, >10GB red zone) to help customers optimize their image sizes.

@dan-stowell dan-stowell force-pushed the find-big-images branch 2 times, most recently from e09bb34 to b5fd10c Compare April 21, 2026 13:53
@dan-stowell dan-stowell marked this pull request as ready for review April 21, 2026 14:03
Copy link
Copy Markdown
Contributor

@vanja-p vanja-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lgtm, but I think Brandon should take a look as well before merging.

Comment thread proto/execution_stats.proto Outdated
Comment thread enterprise/server/remote_execution/executor/executor_test.go Outdated
Comment thread server/util/clickhouse/schema/schema.go Outdated
dan-stowell and others added 8 commits April 22, 2026 14:55
Add container_image and container_image_size_bytes fields to track the
container image used for each execution and its estimated on-disk size
after pulling/extracting.

Data flow:
- Proto: Add fields to ExecutionAuxiliaryMetadata (executor->server) and
  StoredExecution (server->ClickHouse buffer)
- Runner interface: Add ContainerImageInfo() method
- Container interface: Add optional ImageSizer interface with
  ImageSizeBytes() method
- Implementations:
  - OCI: sum of EstimatedDiskUsageBytes across all layers
  - Firecracker: stat the cached ext4 disk image file
  - Docker: ImageInspect API call for image Size
  - Podman: runPodman image inspect --format {{.Size}}
- Executor: capture image ref + size after PrepareForTask, set on
  auxiliary metadata
- Execution server: read from aux metadata, write to StoredExecution
- ClickHouse schema: add ContainerImage and ContainerImageSizeBytes
  columns to Executions table

This enables querying for executions using large container images
(e.g. >6GB yellow zone, >10GB red zone) to help customers optimize
their image sizes.

Co-authored-by: Shelley <shelley@exe.dev>
Add assertions across three test files to verify the container image
metadata plumbing:

- container_test: Test TracedCommandContainer.ImageSizeBytes() delegation
  with and without an ImageSizer delegate.
- runner_test: Test ContainerImageInfo() on taskRunner, verifying image
  size is reported when the container implements ImageSizer, and zero
  when it doesn't.
- executor_test: End-to-end test that container image size flows through
  ExecuteTaskAndStreamResults into ExecutionAuxiliaryMetadata.

Also extend rbetest.TestRunnerOverrides to accept runner.PoolOptions,
enabling tests to inject custom ContainerProviders.

Co-authored-by: Shelley <shelley@exe.dev>
- runner.go: Call r.Container.ImageSizeBytes() instead of reaching
  through to r.Container.Delegate for the ImageSizer type assertion.
  TracedCommandContainer already handles this delegation.
- docker.go, firecracker.go: Add comments explaining the intentional
  use of context.TODO() since ImageSizer interface doesn't take a ctx.

Co-authored-by: Shelley <shelley@exe.dev>
Thread context through the full ImageSizeBytes call chain:
  executor.go -> Runner.ContainerImageInfo(ctx) ->
  TracedCommandContainer.ImageSizeBytes(ctx) ->
  ImageSizer.ImageSizeBytes(ctx)

This eliminates three context.TODO() calls in production code (docker,
podman, firecracker) by plumbing the execution context from the
executor down to each container implementation.

Co-authored-by: Shelley <shelley@exe.dev>
…mmandContainer

- Rename proto fields container_image -> container_image_ref in both
  ExecutionAuxiliaryMetadata and StoredExecution, and update all Go
  references and ClickHouse schema accordingly.
- Remove the separate ImageSizer interface and add ImageSizeBytes(ctx)
  directly to the CommandContainer interface. Add trivial return-0
  implementations to bare and sandbox containers.

Co-authored-by: Shelley <shelley@exe.dev>
@dan-stowell
Copy link
Copy Markdown
Contributor Author

@luluz66 has asked that I hold off on landing this change until the Executions table is migrated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants