Bump based CUDA image to ubuntu24.04 #1166
Merged
Changes from 20 commits
23 commits
d602ff3 Test docker hub ubuntu24.04 (DwarKapex)
7a93390 Adobt build for ubuntu-24.04 (DwarKapex)
3f4efa5 Fix build for pax, t5x, gemma (DwarKapex)
b2eab65 Use master branch of TF-text (DwarKapex)
71ad68b Fix gemma TF-text urls (DwarKapex)
0b452c4 Fix T5x build (DwarKapex)
62e7ed7 Address comments (DwarKapex)
beb4f82 Fix gemma build (DwarKapex)
3c2ec97 Clone airio (DwarKapex)
d279373 Merge remote-tracking branch 'origin/main' into vkozlov/move-to-ubunt… (DwarKapex)
173ddc5 Update maxtext docker (DwarKapex)
92996e3 Uninstall several packages and add PIP_BREAK_SYSTEM_PACKAGES=1 env var (DwarKapex)
8993deb Uninstall several packages and add PIP_BREAK_SYSTEM_PACKAGES=1 env var (DwarKapex)
8c10287 Edit remove packages list (DwarKapex)
c75c825 Edit remove packages list (DwarKapex)
8468c9f Edit remove packages list (DwarKapex)
008b3fc [skip ci] Resurect amd64/arm64 dockerfiles (DwarKapex)
d633578 [skip ci] Resurect amd64/arm64 dockerfiles: fix whitespace error (DwarKapex)
81b50cc [skip ci] Resurect amd64/arm64 dockerfiles: fix whitespace error (DwarKapex)
14c52be Merge branch 'main' into vkozlov/move-to-ubuntu24.04 (DwarKapex)
96c16a9 Add comment for pip install pip-23.3.1 (DwarKapex)
8461c7a Merge branch 'vkozlov/move-to-ubuntu24.04' of github.com:NVIDIA/JAX-T… (DwarKapex)
2c1ee0d remove arch-specific Dockerfiles and add pointer to utopian versions (yhtang)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| # syntax=docker/dockerfile:1-labs | ||
| ARG BASE_IMAGE=nvidia/cuda:12.6.2-devel-ubuntu22.04 | ||
| ARG BASE_IMAGE=nvidia/cuda:12.6.2-devel-ubuntu24.04 | ||
| ARG GIT_USER_NAME="JAX Toolbox" | ||
| ARG [email protected] | ||
| ARG CLANG_VERSION=18 | ||
|
|
@@ -60,7 +60,8 @@ apt_packages=( | |
| wget | ||
| jq | ||
| # llvm.sh | ||
| lsb-release software-properties-common | ||
| lsb-release | ||
| software-properties-common | ||
| # GCP autoconfig | ||
| pciutils hwloc bind9-host | ||
| ) | ||
|
|
@@ -74,8 +75,6 @@ apt-get install -y ${apt_packages[@]} | |
|
|
||
| # Install LLVM/Clang | ||
| bash -c "$(wget -O - https://apt.llvm.org/llvm.sh)" -- ${CLANG_VERSION} | ||
| apt-get remove -y software-properties-common lsb-release | ||
| apt-get autoremove -y # removes python3-blinker which conflicts with pip-compile in JAX | ||
|
|
||
| # Make sure that clang and clang++ point to the new version. This list is based | ||
| # on the symlinks installed by the `clang` (as opposed to `clang-14`) and `lld` | ||
|
|
@@ -106,6 +105,21 @@ EOL | |
|
|
||
| apt-get clean | ||
| rm -rf /var/lib/apt/lists/* | ||
|
|
||
| # Several Python packages (listed below) are installed by the OS package | ||
| # manager (the `apt-get install` run above) and cannot be uninstalled with | ||
| # pip (in the pip-finalize.sh script) during JAX installation. Remove them in | ||
| # advance to avoid JAX installation issues. | ||
| remove_packages=( | ||
| python3-gi | ||
| software-properties-common | ||
| lsb-release | ||
| python3-yaml | ||
| python3-pygments | ||
| ) | ||
|
|
||
| apt-get remove -y ${remove_packages[@]} | ||
| apt-get autoremove -y # removes python3-blinker which conflicts with pip-compile in JAX | ||
| EOF | ||
|
|
||
| RUN <<"EOF" bash -ex | ||
|
|
@@ -129,7 +143,11 @@ git apply </opt/pip/pip-vcs-equivalency.patch | |
| git add -u | ||
| git commit -m 'Adds JAX_TOOLBOX_VCS_EQUIVALENCY as a trigger to treat all github VCS installs for a package as equivalent. The spec of the last encountered version will be used' | ||
| EOF | ||
| RUN pip install --upgrade --no-cache-dir -e /opt/pip pip-tools && rm -rf ~/.cache/* | ||
|
|
||
| # install all python packages system-wide. | ||
| ENV PIP_BREAK_SYSTEM_PACKAGES=1 | ||
| RUN pip install --upgrade --ignore-installed --no-cache-dir -e /opt/pip pip-tools && rm -rf ~/.cache/* | ||
|
|
||
|
|
||
| ############################################################################### | ||
| ## Install TCPx | ||
|
|
||
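For context on the `ENV PIP_BREAK_SYSTEM_PACKAGES=1` line in the hunk above: Ubuntu 24.04 marks its system Python as externally managed (PEP 668) by shipping an `EXTERNALLY-MANAGED` file in the interpreter's stdlib directory, and pip then refuses system-wide installs unless `--break-system-packages` or the equivalent environment variable is set. The following is only a minimal sketch of how one might check for that marker. The helper name `is_externally_managed` and the scratch directory are illustrative assumptions and are not part of this PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): succeeds if the given Python stdlib
# directory carries the PEP 668 marker file.
is_externally_managed() {
  local stdlib_dir="$1"
  [ -f "${stdlib_dir}/EXTERNALLY-MANAGED" ]
}

# Demonstrate on a scratch directory so the result does not depend on the host.
scratch_dir="$(mktemp -d)"
if is_externally_managed "${scratch_dir}"; then
  marker_before="present"
else
  marker_before="absent"
fi

# Simulate a distribution that marks its Python as externally managed.
touch "${scratch_dir}/EXTERNALLY-MANAGED"
if is_externally_managed "${scratch_dir}"; then
  marker_after="present"
else
  marker_after="absent"
fi

echo "before: ${marker_before}, after: ${marker_after}"
rm -rf "${scratch_dir}"
```

On a real image the directory to inspect would typically come from `python3 -c 'import sysconfig; print(sysconfig.get_path("stdlib"))'`.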
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,87 @@ | ||
| # syntax=docker/dockerfile:1-labs | ||
|
|
||
| ARG BASE_IMAGE=ghcr.io/nvidia/jax-mealkit:jax | ||
| ARG URLREF_MAXTEXT=https://github.com/google/maxtext.git#main | ||
| ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master | ||
|
||
| ARG SRC_PATH_MAXTEXT=/opt/maxtext | ||
| ARG SRC_PATH_TFTEXT=/opt/tensorflow-text | ||
|
|
||
| ############################################################################### | ||
| ## build tensorflow-text and lingvo, which do not have working arm64 pip wheels | ||
| ############################################################################### | ||
|
|
||
| ARG BASE_IMAGE | ||
| FROM ${BASE_IMAGE} as wheel-builder | ||
|
|
||
| #------------------------------------------------------------------------------ | ||
| # build tensorflow-text from source | ||
| #------------------------------------------------------------------------------ | ||
|
|
||
| # Remove TFTEXT build from source when it has py-3.12 wheels for x86/arm64 | ||
| FROM wheel-builder as tftext-builder | ||
| ARG URLREF_TFTEXT | ||
| ARG SRC_PATH_TFTEXT | ||
|
|
||
| RUN pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.18.0 | ||
| RUN git-clone.sh ${URLREF_TFTEXT} ${SRC_PATH_TFTEXT} | ||
| RUN <<"EOF" bash -exu -o pipefail | ||
| cd ${SRC_PATH_TFTEXT} | ||
|
|
||
| # The tftext build script queries GitHub, but these requests are sometimes | ||
| # throttled by GH, resulting in a corrupted uri for tensorflow in WORKSPACE. | ||
| # A workaround (needs to be updated when the tensorflow version changes): | ||
| sed -i "s/# Update TF dependency to installed tensorflow./commit_slug=6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/" oss_scripts/prepare_tf_dep.sh | ||
|
|
||
| # Newer versions of LLVM make lld enable --no-undefined-version by default | ||
| # (https://reviews.llvm.org/D135402), but the tftext build seems to rely on | ||
| # the old --undefined-version behavior. | ||
| echo "write_to_bazelrc \"build --linkopt='-Wl,--undefined-version'\"" >> oss_scripts/configure.sh | ||
|
|
||
| ./oss_scripts/run_build.sh | ||
| EOF | ||
|
|
||
| ############################################################################### | ||
| ## Download source and add auxiliary scripts | ||
| ############################################################################### | ||
|
|
||
| FROM ${BASE_IMAGE} as mealkit | ||
| ARG URLREF_MAXTEXT | ||
| ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master | ||
| ARG SRC_PATH_MAXTEXT | ||
| ARG SRC_PATH_TFTEXT=/opt/tensorflow-text | ||
|
|
||
| # Preserve version information of tensorflow-text | ||
| COPY --from=tftext-builder ${SRC_PATH_TFTEXT}/tensorflow_text*.whl /opt/ | ||
| RUN echo "tensorflow-text @ file://$(ls /opt/tensorflow_text*.whl)" >> /opt/pip-tools.d/requirements-maxtext.in | ||
|
|
||
| RUN <<"EOF" bash -ex | ||
| git-clone.sh ${URLREF_MAXTEXT} ${SRC_PATH_MAXTEXT} | ||
| echo "-r ${SRC_PATH_MAXTEXT}/requirements.txt" >> /opt/pip-tools.d/requirements-maxtext.in | ||
|
|
||
| # specify some version restrictions to speed up the build and | ||
| # keep pip from downloading and checking all available versions of packages | ||
| for pattern in \ | ||
| "s|absl-py|absl-py>=2.1.0|g" \ | ||
| "s|protobuf==3.20.3|protobuf>=3.19.0|g" \ | ||
| "s|tensorflow-datasets|tensorflow-datasets>=4.8.0|g" \ | ||
| ; do | ||
| sed -i "${pattern}" ${SRC_PATH_MAXTEXT}/requirements.txt; | ||
| done | ||
| echo "tensorflow-metadata>=1.15.0" >> ${SRC_PATH_MAXTEXT}/requirements.txt | ||
| EOF | ||
|
|
||
| ############################################################################### | ||
| ## Add test script to the path | ||
| ############################################################################### | ||
|
|
||
| ADD test-maxtext.sh /usr/local/bin | ||
|
|
||
| ############################################################################### | ||
| ## Install accumulated packages from the base image and the previous stage | ||
| ############################################################################### | ||
|
|
||
| FROM mealkit as final | ||
|
|
||
| RUN pip-finalize.sh | ||
|
|
||
| WORKDIR ${SRC_PATH_MAXTEXT} | ||
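The `for pattern in ...; do sed -i ...; done` loop in this Dockerfile rewrites version specifiers in `requirements.txt` so that pip has fewer candidates to consider. The sketch below illustrates the same idea on a scratch file. The helper name `apply_patterns`, the file contents, and the `^...$` anchoring are assumptions made for the example and are not taken from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the PR): apply each sed expression to a file in turn.
apply_patterns() {
  local target_file="$1"
  shift
  local pattern
  for pattern in "$@"; do
    # Write to a temp file and move it back, which avoids relying on `sed -i`.
    sed "${pattern}" "${target_file}" > "${target_file}.tmp" \
      && mv "${target_file}.tmp" "${target_file}"
  done
}

# Scratch requirements file with made-up contents for the demonstration.
req_file="$(mktemp)"
printf '%s\n' 'absl-py' 'protobuf==3.20.3' 'tensorflow-datasets' > "${req_file}"

# Anchored variants of the substitutions: each one only matches a whole line.
apply_patterns "${req_file}" \
  's|^absl-py$|absl-py>=2.1.0|' \
  's|^protobuf==3.20.3$|protobuf>=3.19.0|' \
  's|^tensorflow-datasets$|tensorflow-datasets>=4.8.0|'

patched="$(cat "${req_file}")"
echo "${patched}"
rm -f "${req_file}"
```

Anchoring is a deliberate deviation from the unanchored patterns in the diff: an unanchored `s|absl-py|absl-py>=2.1.0|g` would also rewrite a line that already carries a specifier, which may or may not matter for the upstream file.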
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -31,4 +31,4 @@ FROM mealkit as final | |
|
|
||
| RUN pip-finalize.sh | ||
|
|
||
| WORKDIR ${SRC_PATH_MAXTEXT} | ||
| WORKDIR ${SRC_PATH_MAXTEXT} | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,188 @@ | ||
| # syntax=docker/dockerfile:1-labs | ||
|
|
||
| ARG BASE_IMAGE=ghcr.io/nvidia/jax-mealkit:jax | ||
| ARG URLREF_PAXML=https://github.com/google/paxml.git#main | ||
| ARG URLREF_PRAXIS=https://github.com/google/praxis.git#main | ||
| ARG URLREF_TFTEXT=https://github.com/tensorflow/text.git#master | ||
|
||
| ARG URLREF_LINGVO=https://github.com/tensorflow/lingvo.git#master | ||
| ARG SRC_PATH_PAXML=/opt/paxml | ||
| ARG SRC_PATH_PRAXIS=/opt/praxis | ||
| ARG SRC_PATH_TFTEXT=/opt/tensorflow-text | ||
| ARG SRC_PATH_LINGVO=/opt/lingvo | ||
|
|
||
| ############################################################################### | ||
| ## build tensorflow-text and lingvo, which do not have working arm64 pip wheels | ||
| ############################################################################### | ||
|
|
||
| ARG BASE_IMAGE | ||
| FROM ${BASE_IMAGE} as wheel-builder | ||
|
|
||
| #------------------------------------------------------------------------------ | ||
| # build tensorflow-text from source | ||
| #------------------------------------------------------------------------------ | ||
|
|
||
| # Remove TFTEXT build from source when it has py-3.12 wheels for x86/arm64 | ||
| FROM wheel-builder as tftext-builder | ||
| ARG URLREF_TFTEXT | ||
| ARG SRC_PATH_TFTEXT | ||
| RUN <<"EOF" bash -exu -o pipefail | ||
| pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.18.0 | ||
| git-clone.sh ${URLREF_TFTEXT} ${SRC_PATH_TFTEXT} | ||
| cd ${SRC_PATH_TFTEXT} | ||
|
|
||
| # The tftext build script queries GitHub, but these requests are sometimes | ||
| # throttled by GH, resulting in a corrupted uri for tensorflow in WORKSPACE. | ||
| # A workaround (needs to be updated when the tensorflow version changes): | ||
| sed -i "s/# Update TF dependency to installed tensorflow./commit_slug=6550e4bd80223cdb8be6c3afd1f81e86a4d433c3/" oss_scripts/prepare_tf_dep.sh | ||
|
|
||
| # Newer versions of LLVM make lld enable --no-undefined-version by default | ||
| # (https://reviews.llvm.org/D135402), but the tftext build seems to rely on | ||
| # the old --undefined-version behavior. | ||
| echo "write_to_bazelrc \"build --linkopt='-Wl,--undefined-version'\"" >> oss_scripts/configure.sh | ||
|
|
||
| ./oss_scripts/run_build.sh | ||
| EOF | ||
|
|
||
| #------------------------------------------------------------------------------ | ||
| # build lingvo | ||
| #------------------------------------------------------------------------------ | ||
|
|
||
| # Remove Lingvo build from source when it has py-3.12 wheels for x86/arm64 | ||
| FROM wheel-builder as lingvo-builder | ||
| ARG URLREF_LINGVO | ||
| ARG SRC_PATH_TFTEXT | ||
| ARG SRC_PATH_LINGVO | ||
|
|
||
| # Preserve the version of tensorflow-text | ||
| COPY --from=tftext-builder /opt/manifest.d/git-clone.yaml /opt/manifest.d/git-clone.yaml | ||
| COPY --from=tftext-builder ${SRC_PATH_TFTEXT}/tensorflow_text*.whl /opt/ | ||
|
|
||
| ENV USE_BAZEL_VERSION=7.1.2 | ||
|
|
||
| # build lingvo | ||
| RUN <<"EOF" bash -exu -o pipefail | ||
| git-clone.sh ${URLREF_LINGVO} ${SRC_PATH_LINGVO} | ||
| pushd ${SRC_PATH_LINGVO} | ||
|
|
||
| CPU_ARCH="$(dpkg --print-architecture)" | ||
| if [[ "${CPU_ARCH}" == "arm64" ]]; then | ||
|
|
||
| # Use the aarch64 distribution of protobuf (protoc) | ||
| patch -p1 <<"EOFINNER" | ||
| diff --git a/lingvo/repo.bzl b/lingvo/repo.bzl | ||
| index ce65822d2..d9c0277aa 100644 | ||
| --- a/lingvo/repo.bzl | ||
| +++ b/lingvo/repo.bzl | ||
| @@ -232,9 +232,9 @@ filegroup( | ||
| ) | ||
| """, | ||
| urls = [ | ||
| - "https://github.com/protocolbuffers/protobuf/releases/download/v21.9/protoc-21.9-linux-x86_64.zip", | ||
| + "https://github.com/protocolbuffers/protobuf/releases/download/v21.9/protoc-21.9-linux-aarch_64.zip", | ||
| ], | ||
| - sha256 = "3cd951aff8ce713b94cde55e12378f505f2b89d47bf080508cf77e3934f680b6", | ||
| + sha256 = "a584286dfa8ebb17032ece206ed74d5e9931e2edb9016e427be2a0dab3b21071", | ||
| ) | ||
|
|
||
| def icu(): | ||
| EOFINNER | ||
|
|
||
| fi | ||
|
|
||
| pip install tensorflow_datasets==4.9.2 auditwheel tensorflow==2.18.0 /opt/tensorflow_text*.whl | ||
| for pattern in \ | ||
| "s|tensorflow=|#tensorflow=|g" \ | ||
| "s|tensorflow-text=|#tensorflow-text=|g" \ | ||
| "s|dataclasses=|#dataclasses=|g" \ | ||
| "s|==.*||g" \ | ||
| ; do | ||
| sed -i "${pattern}" ${SRC_PATH_LINGVO}/docker/dev.requirements.txt | ||
| done | ||
| # Lingvo declares support only for Python < 3.11, so we patch its pinned | ||
| # dependencies and python_requires to be able to build for py-3.12 | ||
| for pattern in \ | ||
| "s|tensorflow-text~=2.13.0|tensorflow-text~=2.18.0|g" \ | ||
| "s|tensorflow~=2.13.0|tensorflow~=2.18.0|g" \ | ||
| "s|python_requires='>=3.8,<3.11'|python_requires='>=3.8,<3.13'|" \ | ||
| ; do | ||
| sed -i "${pattern}" ${SRC_PATH_LINGVO}/pip_package/setup.py; | ||
| done | ||
| pip install -r docker/dev.requirements.txt | ||
|
|
||
| # Some tests are flaky right now, so we skip running the tests. | ||
| BUILD_ARCH="x86_64" | ||
| if [[ "$CPU_ARCH" == "arm64" ]]; then | ||
| BUILD_ARCH="aarch64"; | ||
| fi | ||
| sed -i 's/manylinux2014_x86_64/manylinux_2_38_'"${BUILD_ARCH}"'/' pip_package/build.sh | ||
| SKIP_TESTS=1 PYTHON_MINOR_VERSION=$(python --version | cut -d ' ' -f 2 | cut -d '.' -f 2) pip_package/build.sh | ||
| EOF | ||
|
|
||
| ############################################################################### | ||
| ## Pax for AArch64 | ||
| ############################################################################### | ||
|
|
||
| ARG BASE_IMAGE | ||
| FROM ${BASE_IMAGE} as mealkit | ||
| ARG URLREF_PAXML | ||
| ARG URLREF_PRAXIS | ||
| ARG SRC_PATH_PAXML | ||
| ARG SRC_PATH_PRAXIS | ||
| ARG SRC_PATH_TFTEXT | ||
|
|
||
| # Preserve version information of tensorflow-text and lingvo | ||
| COPY --from=lingvo-builder /opt/manifest.d/git-clone.yaml /opt/manifest.d/git-clone.yaml | ||
| COPY --from=lingvo-builder /tmp/lingvo/dist/lingvo*-linux*.whl /opt/ | ||
| RUN echo "lingvo @ file://$(ls /opt/lingvo*.whl)" >> /opt/pip-tools.d/requirements-paxml.in | ||
|
|
||
| COPY --from=tftext-builder ${SRC_PATH_TFTEXT}/tensorflow_text*.whl /opt/ | ||
| RUN echo "tensorflow-text @ file://$(ls /opt/tensorflow_text*.whl)" >> /opt/pip-tools.d/requirements-paxml.in | ||
|
|
||
| # paxml + praxis | ||
| RUN <<"EOF" bash -ex | ||
| echo "tensorflow_datasets==4.9.2" >> /opt/pip-tools.d/requirements-paxml.in | ||
| echo "auditwheel" >> /opt/pip-tools.d/requirements-paxml.in | ||
|
|
||
| git-clone.sh ${URLREF_PAXML} ${SRC_PATH_PAXML} | ||
| git-clone.sh ${URLREF_PRAXIS} ${SRC_PATH_PRAXIS} | ||
| echo "-e file://${SRC_PATH_PAXML}[gpu]" >> /opt/pip-tools.d/requirements-paxml.in | ||
| echo "-e file://${SRC_PATH_PRAXIS}" >> /opt/pip-tools.d/requirements-paxml.in | ||
|
|
||
| for src in ${SRC_PATH_PAXML} ${SRC_PATH_PRAXIS}; do | ||
| pushd ${src} | ||
|
|
||
| for pattern in \ | ||
| "s| @ git+https://github.com/google/flax||g" \ | ||
| "s| @ git+https://github.com/google/jax||g" \ | ||
| "s| @ git+https://github.com/google/fiddle||g" \ | ||
| "s|^tensorflow|#tensorflow|" \ | ||
| "s|^lingvo|#lingvo|" \ | ||
| "s|^scikit-learn|#scikit-learn|" \ | ||
| "s|^protobuf|#protobuf|" \ | ||
| "s|^numpy|#numpy|" \ | ||
| "s|^orbax-checkpoint|#orbax-checkpoint|" \ | ||
| "s| @ git+https://github.com/google/CommonLoopUtils||g" \ | ||
| ; do | ||
| sed -i "${pattern}" */pip_package/requirements.txt requirements.in | ||
| done | ||
|
|
||
| if git diff --quiet; then | ||
| echo "broken dependencies no longer present in ${src}" | ||
| exit 1 | ||
| else | ||
| git commit -a -m "remove broken dependencies from ${src}" | ||
| fi | ||
| popd | ||
| done | ||
| sed -i 's/pysimdjson==[0-9.]*/pysimdjson/' ${SRC_PATH_PAXML}/setup.py | ||
| EOF | ||
|
|
||
| ADD test-pax.sh /usr/local/bin | ||
|
|
||
| ############################################################################### | ||
| ## Install accumulated packages from the base image and the previous stage | ||
| ############################################################################### | ||
|
|
||
| FROM mealkit as final | ||
|
|
||
| RUN pip-finalize.sh | ||
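The Lingvo build step in this Dockerfile derives two values before calling `pip_package/build.sh`: the wheel architecture suffix (from `dpkg --print-architecture`) and the Python minor version (from `python --version`). The sketch below isolates that logic into two small functions so it can be tried without Docker. The function names `wheel_arch_for` and `python_minor_of` and the sample inputs are assumptions for illustration and do not appear in the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the PR).

# Map a Debian architecture name (as printed by `dpkg --print-architecture`)
# to the architecture suffix used in the manylinux platform tag.
# Anything other than arm64 falls back to x86_64, mirroring the Dockerfile.
wheel_arch_for() {
  case "$1" in
    arm64) echo "aarch64" ;;
    *)     echo "x86_64" ;;
  esac
}

# Extract the minor version from a string such as "Python 3.12.3",
# using the same cut pipeline as the Dockerfile.
python_minor_of() {
  echo "$1" | cut -d ' ' -f 2 | cut -d '.' -f 2
}

# Sample inputs stand in for the real dpkg and python output.
platform_tag="manylinux_2_38_$(wheel_arch_for arm64)"
minor="$(python_minor_of 'Python 3.12.3')"
echo "${platform_tag} py3.${minor}"
```

In the Dockerfile itself the inputs would be `"$(dpkg --print-architecture)"` and `"$(python --version)"` rather than the literal strings used here.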