Skip to content

Unify Dockerfiles into a single multi-stage Dockerfile#465

Closed
shahidsi7 wants to merge 4 commits intoNetflix:masterfrom
shahidsi7:unify-dockerfiles
Closed

Unify Dockerfiles into a single multi-stage Dockerfile#465
shahidsi7 wants to merge 4 commits intoNetflix:masterfrom
shahidsi7:unify-dockerfiles

Conversation

@shahidsi7
Copy link
Copy Markdown

  • Replace Dockerfile.metadata_service, Dockerfile.migration_service, Dockerfile.ui_service, and Dockerfile.service.test with a single multi-stage Dockerfile (base, runtime, test stages)
  • Update all docker-compose files to use target: runtime or target: test
  • Add dos2unix to fix Windows CRLF line endings in shell scripts
  • Development hot-reload and all env vars preserved

- Replace Dockerfile.metadata_service, Dockerfile.migration_service,
  Dockerfile.ui_service, and Dockerfile.service.test with a single
  multi-stage Dockerfile (base, runtime, test stages)
- Update all docker-compose files to use target: runtime or target: test
- Add dos2unix to fix Windows CRLF line endings in shell scripts
- Development hot-reload and all env vars preserved
Comment thread Dockerfile
# -----------------------------------------------------------------------------
FROM golang:1.20.2 AS goose

FROM ${TARGETARCH}-golang as goose
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changes lose the support for multiarch builds. This is needed for arm64 dev environments that cannot emulate x86 for their containerd.

Comment thread Dockerfile Outdated
ENV BUILD_COMMIT_HASH=$BUILD_COMMIT_HASH
# libpq-dev + gcc: required by psycopg2
# unzip + curl: required by download_ui.sh
# dos2unix: fixes Windows CRLF line endings in shell scripts
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which setup is this required for? will the docker compose not work through WSL2?

Comment thread Dockerfile Outdated

# Install Netflix/metaflow-ui release artifact
RUN /root/services/ui_backend_service/download_ui.sh
RUN pip install --editable .
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The install does need to be editable for the dev image yes, but does it make sense for the production image?

Comment thread Dockerfile
# Inherits from base so download_ui.sh is never called
# dos2unix fixes wait-for-postgres.sh line endings too
# -----------------------------------------------------------------------------
FROM base AS test
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this require changing the deployment CI as well so only layers up to runtimeget pushed, as nothing else is necessary for prod?

@shahidsi7
Copy link
Copy Markdown
Author

Thank you for reviewing my proposal and pointing out these issues. I checked the code and understood the problems. Here is my response:

  1. Multiarch builds
    Yes, you are correct. I will bring back the original multiarch setup which is already used in the production Dockerfile. I will restore the three-line pattern so that both amd64 and arm64 builds work properly:
FROM golang:1.20.2-buster as amd64-golang
FROM arm64v8/golang:1.20.2-buster as arm64-golang
FROM ${TARGETARCH}-golang as goose
  1. dos2unix
    I added this because in my local Windows system there was a CRLF line ending issue. But I understood that this should not be handled inside the project Dockerfile.
    The better solution is to add a .gitattributes file and enforce LF line endings for shell scripts. So I will remove dos2unix from the Dockerfile and add .gitattributes. Also using WSL2 will avoid this issue.

  2. Editable install
    After checking the production Dockerfile, I saw that it uses normal pip install . inside a virtual environment for production. Editable install is mainly useful for development.
    So I will move pip install --editable . to a separate dev stage that sits between base and runtime, so production uses regular install and development uses editable.

  3. CI/CD
    I checked .github/workflows/dockerimage.yml. Currently in the build step there is no --target flag, so it is building the full Dockerfile. Since we are using a multi-stage approach, we need to add --target runtime so that only the runtime stage image gets pushed to the registry. I can include this change in the same PR if it is in scope, otherwise I can do it in a follow-up PR.

- Restore multiarch goose build (amd64 + arm64)
- Fix base image to python:3.11.6-slim-bookworm
- Remove dos2unix, add .gitattributes for LF enforcement
- Add dev stage with editable install, keep base with regular install
- Update CI to build --target runtime only
@shahidsi7
Copy link
Copy Markdown
Author

shahidsi7 commented Mar 23, 2026

After reviewing my previous commits, I wasn't fully satisfied with how I had addressed the feedback — particularly around the multiarch support and the editable install separation. So I reworked the entire approach more carefully before pushing further. Here is the updated summary:


Summary of Changes

This PR replaces the multiple service-specific Dockerfiles with a single multi-stage Dockerfile. The goal is to improve maintainability while keeping existing behaviour intact. All steps were tested locally.

Files deleted

  • Dockerfile.metadata_service
  • Dockerfile.migration_service
  • Dockerfile.ui_service

These are replaced entirely by the unified Dockerfile below.

Files added

  • .gitattributes — enforces LF line endings for all .sh files, replacing the need for dos2unix inside the container. Developers on Windows should use WSL2 or ensure their Git client respects .gitattributes.

Dockerfile — rewritten as a 4-stage multi-stage build

Stage Purpose
amd64-golang / arm64-golang Multiarch goose builder — preserves original arm64/amd64 support
base System deps, virtualenvs, source copy — shared by both below
runtime Production image; uses pip install . (regular install)
dev Development image; uses pip install --editable . so volume-mounted ./services is live source without rebuilds

The sed -i 's/\r//' on download_ui.sh is kept as a safety net for developers who cloned before .gitattributes was added.

docker-compose.development.yml — updated

  • Removed platform: linux/amd64 from all services — multiarch is now handled in the Dockerfile itself
  • All three services (ui_backend, metadata, migration) now point to dockerfile: Dockerfile with target: dev
  • Each service keeps its explicit command: so the single image runs the right entrypoint per service

Files not changed

  • Dockerfile.service.test — kept separate as it serves a distinct testing purpose
  • docker-compose.test.yml — unchanged
  • docker-compose.yml (production) — unchanged; uses the pre-built metadata_service:latest image

CI note: The existing build workflow will need --target runtime added to the docker build step so only the production stage is pushed to the registry. This can be done in this PR or a follow-up — happy to include it if preferred.

Verified locally

docker build --target runtime -t metaflow-test-runtime .
docker build --target dev -t metaflow-test-dev --build-arg UI_ENABLED=0 .
docker compose -f docker-compose.development.yml build --no-cache
docker compose -f docker-compose.development.yml up
curl http://localhost:8080/ping  # metadata
curl http://localhost:8083/ping  # ui_backend

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up, download_ui.sh lost its executable bit here (100755 to 100644). The Dockerfile runs it directly with /root/services/ui_backend_service/download_ui.sh rather than through bash, so on a fresh clone where git honors file modes this would fail with "Permission denied" when UI_ENABLED=1. Either restoring the +x bit or switching to bash /root/services/ui_backend_service/download_ui.sh in the RUN command would fix it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the thorough review! Good catch. I've switched to bash /root/services/ui_backend_service/download_ui.sh in the RUN command so it works correctly on fresh clones regardless of the file mode.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason Dockerfile.service.test and docker-compose.test.yml are left out of scope here? The test image still builds its own goose and Python separately, so dev and test environments can still drift on versions. Adding a test target to the unified Dockerfile would close that gap.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, leaving them out was an oversight. I've added a test stage to the unified Dockerfile that shares the same base as runtime and dev, so goose and Python versions are now consistent across all three environments. docker-compose.test.yml has been updated to use target: test and Dockerfile.service.test has been deleted.
Also found and fixed one more thing while testing in which migration was using depends_on: db without waiting for the healthcheck, causing it to fail on a clean start. Changed it to condition: service_healthy so migration waits for postgres to actually be ready before connecting.
All services verified working locally after the fixes:

curl http://localhost:8080/ping  # metadata
curl http://localhost:8083/ping  # ui_backend

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, the healthcheck fix is a good catch. Clean start reliability is easy to miss in dev but matters for CI.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yeah, it only showed up when I did a completely clean start with down -v, would have silently caused flaky failures in CI otherwise. Glad it's in now.

@Aryan95614
Copy link
Copy Markdown

Minor thing but there's both a .gitattributes and a gitattributes file in this PR. Git only reads the dotfile version so the gitattributes one doesn't do anything. Might be accidental.

- Use bash to invoke download_ui.sh to avoid permission denied on fresh clone
- Add test stage to unified Dockerfile to prevent dev/test version drift
- Remove standalone Dockerfile.service.test, replaced by test stage
- Remove accidental gitattributes (non-dotfile), .gitattributes already present
- Fix migration depends_on to wait for db healthcheck before starting
@shahidsi7 shahidsi7 closed this by deleting the head repository Mar 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants