fix(kafka): seed named volumes from the image instead of nocopy#67

Merged
hectorvent merged 1 commit into main from fix/kafka-volume-nocopy on Jun 19, 2026
@hectorvent hectorvent commented Jun 18, 2026

Contributor

Summary

ContainerBuilder.withNamedVolume mounted named volumes with VolumeOptions.withNoCopy(true), so a fresh data volume was never seeded from the image and stayed root-owned. Non-root images such as Redpanda cannot write to it and crash on startup, so Managed Kafka clusters never reached ACTIVE.

Apply nocopy only to read-only volumes, which are pre-populated and must not be overlaid by the image (e.g. Cloud Run GCS snapshots). Read-write data volumes (Kafka, Cloud SQL) are now seeded from the image, preserving its initialized directory and ownership.
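
The rule described above can be sketched in isolation. This is a minimal illustration, not the project's ContainerBuilder code: the class NamedVolumeSketch, the VolumeMount type, the namedVolume helper, and the volume names are all hypothetical, and the sketch only models the decision "nocopy if and only if the mount is read-only". It relies on Docker's documented behavior that an empty named volume is populated from the image's content at the mount path (including ownership) unless nocopy is set.

```java
// Hypothetical, self-contained model of the mount decision; it does not call Docker.
class NamedVolumeSketch {

    /** Hypothetical stand-in for the settings a builder would hand to the Docker API. */
    static final class VolumeMount {
        final String volumeName;
        final String containerPath;
        final boolean readOnly;
        final boolean noCopy;

        VolumeMount(String volumeName, String containerPath, boolean readOnly, boolean noCopy) {
            this.volumeName = volumeName;
            this.containerPath = containerPath;
            this.readOnly = readOnly;
            this.noCopy = noCopy;
        }
    }

    /**
     * Read-only volumes are assumed to be pre-populated, so the image must not
     * overlay them: nocopy stays on. Read-write data volumes are left to be
     * seeded from the image so they inherit its directory layout and ownership.
     */
    static VolumeMount namedVolume(String volumeName, String containerPath, boolean readOnly) {
        boolean noCopy = readOnly;
        return new VolumeMount(volumeName, containerPath, readOnly, noCopy);
    }

    public static void main(String[] args) {
        // "kafka-data" and "gcs-snapshot" are made-up volume names for illustration.
        VolumeMount data = namedVolume("kafka-data", "/var/lib/redpanda/data", false);
        VolumeMount snapshot = namedVolume("gcs-snapshot", "/mnt/gcs", true);
        System.out.println(data.volumeName + " noCopy=" + data.noCopy);
        System.out.println(snapshot.volumeName + " noCopy=" + snapshot.noCopy);
    }
}
```

Deriving nocopy from the read-only flag keeps the call sites unchanged: callers still state only whether the mount is read-only, and the seeding behavior follows from that.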

Type of change

  • Bug fix (fix:)
  • New feature (feat:)
  • Breaking change (feat!: or fix!:)
  • Docs / chore

GCP Compatibility

No wire-protocol change. Fixes the failure chain mkdir /var/lib/redpanda/data/crash_reports: Permission denied → Redpanda container exit → cluster stuck in CREATING (the emulator's 90s readiness wait then timed out with "No route to host" against the dead container). Verified: clusters reach ACTIVE in ~5s, the Java/Node/Python/Go Managed Kafka compat suites pass, and the Cloud Run read-only GCS volume mount (and its unit test) are unchanged. The fix also benefits any other non-root read-write sidecar (e.g. Cloud SQL Postgres).

Checklist

  • ./mvnw test passes locally (incl. CloudRunRuntimeServiceTest)
  • New or updated integration test added — N/A (shared container infra; exercised by CloudRunRuntimeServiceTest + the compatibility suites)
  • Commit messages follow Conventional Commits

ContainerBuilder.withNamedVolume mounted volumes with
VolumeOptions.withNoCopy(true), so a fresh data volume was never seeded from the
image and stayed root-owned. Non-root images such as Redpanda cannot write to it
and crash on startup (mkdir /var/lib/redpanda/data/crash_reports: Permission
denied), so Managed Kafka clusters never reached ACTIVE.

Apply nocopy only to read-only volumes, which are pre-populated and must not be
overlaid by the image (e.g. Cloud Run GCS snapshots). Read-write data volumes
(Kafka, Cloud SQL) are now seeded from the image, preserving the initialized
directory and ownership. Verified: clusters reach ACTIVE in ~5s, and the Cloud
Run read-only GCS volume mount is unchanged.
@hectorvent hectorvent force-pushed the fix/kafka-volume-nocopy branch from d0aedff to f0418f9 Compare June 18, 2026 23:40
@hectorvent hectorvent merged commit 4e37f18 into main Jun 19, 2026
11 checks passed