Unify Dockerfiles into a single multi-stage Dockerfile #465
shahidsi7 wants to merge 4 commits into Netflix:master
shahidsi7 commented Mar 6, 2026
- Replace Dockerfile.metadata_service, Dockerfile.migration_service, Dockerfile.ui_service, and Dockerfile.service.test with a single multi-stage Dockerfile (base, runtime, test stages)
- Update all docker-compose files to use target: runtime or target: test
- Add dos2unix to fix Windows CRLF line endings in shell scripts
- Development hot-reload and all env vars preserved
# -----------------------------------------------------------------------------
FROM golang:1.20.2 AS goose

FROM ${TARGETARCH}-golang as goose
changes lose the support for multiarch builds. This is needed for arm64 dev environments that cannot emulate x86 for their containerd.
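A minimal sketch of a goose stage that follows the target platform, offered as an illustration rather than the PR's actual fix. It assumes the build runs under BuildKit (which populates `TARGETARCH` automatically) and relies on the official `golang` image being published as a multi-platform image. The goose module path and version are placeholders that do not come from this PR.

```dockerfile
# Sketch only: a goose build stage that follows the build's target platform.
# Under BuildKit, the multi-platform golang image is resolved per --platform
# value, and TARGETARCH (e.g. amd64, arm64) is populated automatically.
FROM golang:1.20.2 AS goose
ARG TARGETARCH
# Assumption: goose comes from pressly/goose; the version is illustrative.
RUN echo "building goose for ${TARGETARCH}" \
    && go install github.com/pressly/goose/v3/cmd/goose@v3.7.0
```

With a stage like this, a build such as `docker buildx build --platform linux/amd64,linux/arm64 .` would compile goose natively for each platform, and later stages could copy the binary from `/go/bin/goose`.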
ENV BUILD_COMMIT_HASH=$BUILD_COMMIT_HASH
# libpq-dev + gcc: required by psycopg2
# unzip + curl: required by download_ui.sh
# dos2unix: fixes Windows CRLF line endings in shell scripts
which setup is this required for? will the docker compose not work through WSL2?
# Install Netflix/metaflow-ui release artifact
RUN /root/services/ui_backend_service/download_ui.sh
RUN pip install --editable .
The install does need to be editable for the dev image yes, but does it make sense for the production image?
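One possible way to separate the two install modes, sketched under assumptions: the stage names `base` and `dev` and the base image tag follow names used elsewhere in this thread, while the `/root` layout and the presence of a `setup.py` or `pyproject.toml` at the repository root are assumed for illustration.

```dockerfile
# Sketch only: regular install in the shared base, editable install in dev.
FROM python:3.11.6-slim-bookworm AS base
# libpq-dev + gcc are needed to build psycopg2; unzip + curl for download_ui.sh.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libpq-dev gcc unzip curl \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /root
COPY . /root
# Regular install: the package is copied into site-packages.
RUN pip install --no-cache-dir .

FROM base AS dev
# Editable install: imports resolve to /root, so a bind-mounted source tree
# is picked up without rebuilding the image.
RUN pip install --no-cache-dir --editable .
```

The production image would then build from `base` (or a stage derived from it), and only the development compose file would target `dev`.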
# Inherits from base so download_ui.sh is never called
# dos2unix fixes wait-for-postgres.sh line endings too
# -----------------------------------------------------------------------------
FROM base AS test
does this require changing the deployment CI as well so only layers up to runtime get pushed, as nothing else is necessary for prod?
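To illustrate why the build target matters here, a hedged sketch of sibling stages on a shared base. With BuildKit, building with `--target runtime` only builds the stages that `runtime` depends on, so the `test` stage is skipped. The test tooling shown (`pytest`) is an assumption, not taken from this PR.

```dockerfile
# Sketch only: runtime and test are siblings that both inherit from base.
FROM python:3.11.6-slim-bookworm AS base
WORKDIR /root
COPY . /root

FROM base AS runtime
# Production image: the only stage CI would build and push
# when invoked with --target runtime.

FROM base AS test
# Assumption: test-only tooling is installed here, so it never
# ends up in the runtime image.
RUN pip install --no-cache-dir pytest
CMD ["pytest"]
```

Without an explicit `--target`, the last stage in the file (`test` in this sketch) becomes the default build result, which is why the CI command needs the flag.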
Thank you for reviewing my proposal and pointing out these issues. I checked the code and understood the problems. Here is my response:
- Restore multiarch goose build (amd64 + arm64)
- Fix base image to python:3.11.6-slim-bookworm
- Remove dos2unix, add .gitattributes for LF enforcement
- Add dev stage with editable install, keep base with regular install
- Update CI to build --target runtime only
af91459 to d09b2f2
After reviewing my previous commits, I wasn't fully satisfied with how I had addressed the feedback, particularly around the multiarch support and the editable install separation. So I reworked the entire approach more carefully before pushing further. Here is the updated summary:
Summary of Changes
This PR replaces the multiple service-specific Dockerfiles with a single multi-stage Dockerfile. The goal is to improve maintainability while keeping existing behaviour intact. All steps were tested locally.
Files deleted
These are replaced entirely by the unified
Files added
The
Files not changed
CI note: The existing build workflow will need
Verified locally
Heads up, download_ui.sh lost its executable bit here (100755 to 100644). The Dockerfile runs it directly with /root/services/ui_backend_service/download_ui.sh rather than through bash, so on a fresh clone where git honors file modes this would fail with "Permission denied" when UI_ENABLED=1. Either restoring the +x bit or switching to bash /root/services/ui_backend_service/download_ui.sh in the RUN command would fix it.
Thanks for the thorough review! Good catch. I've switched to bash /root/services/ui_backend_service/download_ui.sh in the RUN command so it works correctly on fresh clones regardless of the file mode.
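For reference, a small sketch of the two fixes discussed above. The script path comes from this thread; the base image and `COPY` layout are assumptions added only to make the fragment self-contained.

```dockerfile
# Sketch only: running the script regardless of the checked-out file mode.
FROM python:3.11.6-slim-bookworm AS base
COPY . /root

# Option 1: invoke through bash, so the executable bit is not needed.
RUN bash /root/services/ui_backend_service/download_ui.sh

# Option 2 (alternative): restore the bit inside the image, then run directly.
# RUN chmod +x /root/services/ui_backend_service/download_ui.sh \
#     && /root/services/ui_backend_service/download_ui.sh
```

Option 1 keeps the Dockerfile independent of how git records the file mode. Restoring the `+x` bit in the repository would still help anyone running the script outside Docker.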
Any reason Dockerfile.service.test and docker-compose.test.yml are left out of scope here? The test image still builds its own goose and Python separately, so dev and test environments can still drift on versions. Adding a test target to the unified Dockerfile would close that gap.
You're right, leaving them out was an oversight. I've added a test stage to the unified Dockerfile that shares the same base as runtime and dev, so goose and Python versions are now consistent across all three environments. docker-compose.test.yml has been updated to use target: test and Dockerfile.service.test has been deleted.
Also found and fixed one more thing while testing: migration was using depends_on: db without waiting for the healthcheck, causing it to fail on a clean start. Changed it to condition: service_healthy so migration waits for postgres to actually be ready before connecting.
All services verified working locally after the fixes:
curl http://localhost:8080/ping # metadata
curl http://localhost:8083/ping # ui_backend
Nice, the healthcheck fix is a good catch. Clean start reliability is easy to miss in dev but matters for CI.
Thanks! Yeah, it only showed up when I did a completely clean start with down -v. It would have silently caused flaky failures in CI otherwise. Glad it's in now.
Minor thing but there's both a .gitattributes and a gitattributes file in this PR. Git only reads the dotfile version so the gitattributes one doesn't do anything. Might be accidental.
- Use bash to invoke download_ui.sh to avoid permission denied on fresh clone
- Add test stage to unified Dockerfile to prevent dev/test version drift
- Remove standalone Dockerfile.service.test, replaced by test stage
- Remove accidental gitattributes (non-dotfile), .gitattributes already present
- Fix migration depends_on to wait for db healthcheck before starting