chore(docker): reduce size between docker builds #7571
base: main
Conversation
# Auxiliary dependencies, pinned only if necessary.
"triton; sys_platform=='linux'",
As far as I can tell, CUDA builds of torch 2.4.1 have a dependency on triton, which has two consequences:
- triton is installed without us declaring the dependency here.
- some versions of triton conflict with that torch dependency, e.g. torch 2.4.1+cu124 requires triton 3.0.0 and will conflict with triton 3.1.0.
If we had an exact dependency on torch==2.4.1+cu124, uv's resolver would figure out which version of triton works, and it would be fine.
However, since the torch version is left ambiguous, it's possible for the resolver to decide that non-CUDA torch==2.4.1 is a better solution because that doesn't conflict with the latest version of xformers. 😖
We'll hopefully be bumping torch soon to support the new Nvidia GPUs, so we'll address it then. There's also an additional quirk of requiring a different torch pin for AMD, so the rabbit hole will go even deeper.
yeah, and uv has a documentation page about torch specifically: https://docs.astral.sh/uv/guides/integration/pytorch/#configuring-accelerators-with-optional-dependencies
I did make an attempt at implementing that here, but its scope quickly grew far past what I wanted to deal with in this docker-focused PR.
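For reference, the approach that page describes looks roughly like the sketch below. This is a condensed paraphrase of the linked uv docs, not something from this PR; the extra names, the torch pin, and the index names are all illustrative.

# Illustrative sketch of uv's "accelerator via optional dependencies" pattern (not from this PR).
[project.optional-dependencies]
cpu = ["torch==2.4.1"]
cu124 = ["torch==2.4.1"]

[tool.uv]
# Only one accelerator extra may be selected at a time.
conflicts = [[{ extra = "cpu" }, { extra = "cu124" }]]

[tool.uv.sources]
torch = [
  { index = "pytorch-cpu", extra = "cpu" },
  { index = "pytorch-cu124", extra = "cu124" },
]

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true

[[tool.uv.index]]
name = "pytorch-cu124"
url = "https://download.pytorch.org/whl/cu124"
explicit = true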
This Dockerfile is also quirky in that it separates the builder and runner stages.
There is one thing I haven't confirmed for the space savings: my test builds have been with podman (buildah), not buildkit. buildah doesn't support
hey @keturn, thanks for this! we tried some time ago with
Indeed, this is needed to build patchmatch. Hope at some point we no longer have to do this, but for now it's ok. There's still a small benefit to using the runner image because we just don't need to worry about any other cruft that may be left in the builder image, so I'd like to keep it this way for the time being.
docker/Dockerfile (outdated)
ENV PYTHON_VERSION=3.11
ENV INVOKEAI_ROOT=/invokeai
ENV INVOKEAI_HOST=0.0.0.0
ENV INVOKEAI_PORT=9090
-ENV PATH="$VIRTUAL_ENV/bin:$INVOKEAI_SRC:$PATH"
+ENV PATH="$INVOKEAI_SRC/.venv/bin:$INVOKEAI_SRC:$PATH"
Keeping the .venv out of the source dir enables using this image for development on non-Linux systems. E.g. someone on a Mac can mount their INVOKEAI_SRC into the image while avoiding compatibility issues with their local macOS-specific .venv. We should keep it as-is.
alright, I believe I've resolved this so it's back to using /opt/venv
Including just invokeai/version seems sufficient to appease uv sync here. Including everything else would invalidate the cache we're trying to establish.
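For concreteness, the layering being described might look roughly like this; the file names, flags, and the presence of a checked-in uv.lock are assumptions for illustration, not the exact contents of this PR's Dockerfile.

# Sketch only: copy just what uv needs to resolve and install dependencies.
COPY pyproject.toml uv.lock ./
COPY invokeai/version invokeai/version
# Heavy, rarely-changing dependencies (torch and its bundled nvidia libs) land in
# this layer, so it stays cached across most commits.
RUN uv sync --frozen --no-install-project
# The frequently-changing project files are copied afterwards; only this cheap
# final sync re-runs when they change.
COPY . .
RUN uv sync --frozen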
This got me to re-visit the docs on docker cache invalidation. From what I gather, the upshot is
well, that's a bit disappointing, but I guess this PR is still a step in the right direction if we are going all-in with uv.
by adding a layer with all the pytorch dependencies that don't change most of the time.
Summary
Every time the
main
docker images rebuild and I pullmain-cuda
, it gets another 3+ GB, which seems like about a zillion times too much since most things don't change from one commit onmain
to the next.This is an attempt to follow the guidance in Using uv in Docker: Intermediate Layers so there's one layer that installs all the dependencies—including PyTorch with its bundled nvidia libraries—before the project's own frequently-changing files are copied in to the image.
Related Issues / Discussions
uv pip install torch, but not uv sync it
QA Instructions
Hopefully the CI system building the docker images is sufficient.
But there is one change to pyproject.toml related to xformers, so it'd be worth checking that python -m xformers.info still says it has triton on the platforms that expect it.
Merge Plan
I don't expect this to be a disruptive merge.
(An earlier revision of this PR moved the venv, but I've reverted that change at ebr's recommendation.)
Checklist
What's New copy (if doing a release after this PR)