98 changes: 34 additions & 64 deletions official-templates/pytorch/Dockerfile
@@ -22,13 +22,13 @@ ENV DEBIAN_FRONTEND=noninteractive \
SHELL=/bin/bash \
PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \

Copilot AI Apr 13, 2026

This change removes the Miniconda installation, but PATH still prepends /opt/conda/bin. With conda no longer present, the entry is misleading and can mask problems (for example, users who expect conda to exist). Consider removing /opt/conda/bin from PATH, or reintroduce conda if it is still required.

Suggested change
PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \
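
A stale PATH entry like this could be caught at build time. The following is a minimal sketch, not part of the PR: `missing_path_dirs` is a hypothetical helper name, and it assumes a POSIX-compatible shell is available in the image.

```shell
# Hypothetical helper (not in the PR): print every entry of a colon-separated
# PATH-style list that is not an existing directory.
# The body runs in a subshell, so changing IFS and disabling globbing
# does not leak into the caller.
missing_path_dirs() (
    IFS=:
    set -f
    # $1 is the list to check; it defaults to the current PATH.
    for dir in ${1:-$PATH}; do
        [ -n "$dir" ] || continue            # skip empty entries
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
)
```

Calling `missing_path_dirs "$PATH"` in a late RUN step would list /opt/conda/bin once Miniconda is gone. Whether a missing directory should fail the build or only be reported is a design choice left to the template authors.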

LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH \
JUPYTER_PASSWORD=ubuntu
JUPYTER_PASSWORD=yotta

# ===============================
# Workspace
# ===============================
WORKDIR /
RUN mkdir -p /workspace && chmod 777 /workspace
RUN mkdir -p /workspace && chmod 777 /workspace /root

# ===============================
# Base system packages
@@ -44,46 +44,16 @@ RUN apt-get update -y && \
build-essential pkg-config \
&& echo "en_US.UTF-8 UTF-8" > /etc/locale.gen \
&& locale-gen \
&& mkdir -p /var/run/sshd \
&& mkdir -p /var/run/sshd /var/log/supervisor \
&& chmod 700 /var/run/sshd /var/log/supervisor \
&& chmod 755 /var/log \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# ===============================
# uv (Astral) - Python package manager
# - Install to /usr/local/bin
# - Avoids modifying shell profile (suitable for container/CI)
# Remove ubuntu user (for security: prevent unauthorized SSH access)
# ===============================
ARG UV_VERSION="latest"
RUN set -eux; \
if [ "${UV_VERSION}" = "latest" ]; then \
curl -LsSf https://astral.sh/uv/install.sh | env UV_UNMANAGED_INSTALL="/usr/local/bin" sh; \
else \
curl -LsSf "https://astral.sh/uv/${UV_VERSION}/install.sh" | env UV_UNMANAGED_INSTALL="/usr/local/bin" sh; \
fi; \
uv --version

# ===============================
# Miniconda
# ===============================
ARG MINICONDA_VERSION="py311_24.1.2-0"
ARG CONDA_DIR="/opt/conda"

RUN set -eux; \
ARCH="$(uname -m)"; \
case "${ARCH}" in \
x86_64) MINICONDA_ARCH="x86_64" ;; \
aarch64) MINICONDA_ARCH="aarch64" ;; \
*) echo "Unsupported arch: ${ARCH}" && exit 1 ;; \
esac; \
curl -fsSL \
"https://repo.anaconda.com/miniconda/Miniconda3-${MINICONDA_VERSION}-Linux-${MINICONDA_ARCH}.sh" \
-o /tmp/miniconda.sh; \
bash /tmp/miniconda.sh -b -p "${CONDA_DIR}"; \
rm -f /tmp/miniconda.sh; \
"${CONDA_DIR}/bin/conda" config --system --set auto_activate_base false; \
"${CONDA_DIR}/bin/conda" clean -afy

RUN ln -sf /opt/conda/bin/conda /usr/local/bin/conda
RUN userdel -r ubuntu || true

# ===============================
# Python 3.11 (build from source, with ensurepip)
@@ -101,7 +71,7 @@ RUN set -eux; \
&& tar -xzf /tmp/Python.tgz -C /tmp/python-src --strip-components=1 \
&& rm -f /tmp/Python.tgz \
&& cd /tmp/python-src \
&& ./configure --enable-optimizations --with-ensurepip=install \
&& ./configure --with-ensurepip=install \
&& make -j"$(nproc)" \
&& make altinstall \
&& cd / \
@@ -125,15 +95,23 @@ RUN python -m pip install --no-cache-dir \
huggingface-hub datasets

# ===============================
# Patch: ensure python3.11 has Jupyter (required by /start.sh)
# Only adds jupyter to the python3.11 environment, does not modify the existing pip install logic
# Build-time assertion: verify Jupyter installation
# ===============================
RUN /usr/local/bin/python3.11 -m ensurepip --upgrade && \
/usr/local/bin/python3.11 -m pip install --no-cache-dir \
jupyterlab ipywidgets jupyter-archive notebook==7.3.3
RUN python -c "import jupyter; import notebook; import jupyterlab; print('jupyter ok')"

# Build-time assertion: prevents pushing a broken image
RUN /usr/local/bin/python3.11 -c "import jupyter; import notebook; import jupyterlab; print('python3.11 jupyter ok')"
# ===============================
# Configure JupyterLab: auto-login with token (no password prompt)
# ===============================
RUN mkdir -p /root/.jupyter && printf '%s\n' \
'c.ServerApp.token = "yotta"' \
'c.ServerApp.password = ""' \
'c.ServerApp.allow_remote_access = True' \
'c.ServerApp.allow_origin = "*"' \
'c.NotebookApp.token = "yotta"' \
Comment on lines +103 to +110

Copilot AI Apr 13, 2026

The Jupyter config hard-codes a token ("yotta") and allows any origin. This prevents controlling auth via the documented JUPYTER_PASSWORD env var and increases exposure if the notebook server is reachable. Prefer sourcing the token from an env var at startup (or omit this file and rely on the startup script/CLI flags).
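
One way to source the token at startup is sketched below. This is an illustration under stated assumptions, not the PR's startup script: `write_jupyter_config` is a hypothetical helper, and reusing `JUPYTER_PASSWORD` as the token is an assumption based on the ENV line in this Dockerfile.

```shell
# Hypothetical startup helper (not in the PR): write a Jupyter config whose
# token is read from JUPYTER_PASSWORD when the server starts, instead of a
# value baked into the image at build time.
write_jupyter_config() {
    config_path=$1
    # Refuse to continue with an empty token rather than silently disabling auth.
    if [ -z "${JUPYTER_PASSWORD:-}" ]; then
        echo "JUPYTER_PASSWORD is not set" >&2
        return 1
    fi
    mkdir -p "$(dirname "$config_path")"
    # The quoted heredoc keeps the shell from expanding anything, so the file
    # holds only the lookup, never the secret itself.
    cat > "$config_path" <<'EOF'
import os
# Resolved by Jupyter at server start, not at image build time.
c.ServerApp.token = os.environ["JUPYTER_PASSWORD"]
EOF
    chmod 600 "$config_path"
}
```

A startup script could call `write_jupyter_config /root/.jupyter/jupyter_lab_config.py` before launching Jupyter. The sketch deliberately leaves `allow_origin` unset; whether a wildcard origin is needed depends on how the server is exposed.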

'c.NotebookApp.password = ""' \
'c.NotebookApp.allow_remote_access = True' \
> /root/.jupyter/jupyter_lab_config.py && \
chmod 600 /root/.jupyter/jupyter_lab_config.py

# ===============================
# NCCL tests (build from source, force MPI=0 to avoid mpi.h missing)
@@ -145,31 +123,23 @@ RUN set -eux; \
ln -sf /opt/nccl-tests/build/* /usr/local/bin/; \
rm -rf /opt/nccl-tests/.git

# ===============================
# User
# ===============================
RUN useradd -ms /bin/bash ubuntu && \
usermod -aG sudo ubuntu && \
echo "ubuntu ALL=(ALL) NOPASSWD:ALL" > /etc/sudoers.d/ubuntu && \
echo "ubuntu:ubuntu" | chpasswd

# ===============================
# SSH config (start.sh handles sshd startup; this ensures password login is enabled)
# ===============================
RUN sed -i 's/#PasswordAuthentication yes/PasswordAuthentication yes/' /etc/ssh/sshd_config && \
sed -i 's/PermitRootLogin prohibit-password/PermitRootLogin yes/' /etc/ssh/sshd_config && \
rm -f /etc/ssh/ssh_host_*

# ===============================
# CUDA bin convenience
# ===============================
RUN ln -sf /usr/local/cuda/bin/* /usr/bin/ || true

# ===============================
# start.sh (from buildx bake context "scripts")
# Supervisor configuration
# ===============================
COPY --from=scripts start.sh /start.sh
RUN chmod 755 /start.sh
RUN mkdir -p /var/log/supervisor /usr/local/bin && \
chmod 777 /var/log/supervisor /workspace /var/run /var/lib/nginx && \
mkdir -p /run/sshd && \
chmod 700 /run/sshd

COPY --from=scripts start1.sh /start1.sh
RUN chmod 755 /start1.sh && \
sed -i 's/\r$//' /start1.sh
Comment on lines +140 to +142
Copilot AI Apr 13, 2026

COPY --from=scripts start1.sh /start1.sh references start1.sh, but the shared container-template/ directory currently contains only start.sh (no start1.sh). The build will fail unless start1.sh is added to the scripts context or the Dockerfile is switched back to /start.sh.
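
A pre-build check can surface this kind of mismatch before the bake runs. The sketch below is an assumption-laden illustration: `require_context_files` is a hypothetical helper, and the context directory passed to it must match whatever the `scripts` context resolves to in docker-bake.hcl.

```shell
# Hypothetical pre-build check (not in the PR): fail if any file that the
# Dockerfile copies is absent from the given build-context directory.
require_context_files() {
    context_dir=$1
    shift
    status=0
    for name in "$@"; do
        if [ ! -f "$context_dir/$name" ]; then
            echo "missing from context: $context_dir/$name" >&2
            status=1
        fi
    done
    # Report every missing file before returning the overall result.
    return "$status"
}
```

For example, `require_context_files ./container-template start1.sh` would return non-zero while only start.sh exists there; the directory path in that call is illustrative.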


# ===============================
# nginx / branding
@@ -187,8 +157,8 @@ RUN echo 'cat /etc/yotta.txt' >> /root/.bashrc
EXPOSE 22 80 8888

# ===============================
# Entrypoint: root runs start.sh directly (does not modify the shared start.sh)
# Entrypoint: root runs start1.sh with explicit bash (ensures bash syntax works)
# ===============================
USER root
WORKDIR /root
CMD ["/bin/bash", "-lc", "exec /start.sh"]
CMD ["/bin/bash", "-c", "exec /bin/bash /start1.sh"]
6 changes: 3 additions & 3 deletions official-templates/pytorch/docker-bake.hcl
@@ -1,5 +1,5 @@
variable "PUBLISHER" { default = "yottalabsai" }
variable "TAG_SUFFIX" { default = "2026010901" }
variable "TAG_SUFFIX" { default = "2026031701" }

group "default" {
targets = ["pytorch290"]
@@ -15,7 +15,7 @@ target "pytorch290" {
dockerfile = "Dockerfile"

tags = [
"${PUBLISHER}/pytorch:2.9.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04"
"${PUBLISHER}/pytorch:${TAG_SUFFIX}"
Copilot AI Apr 13, 2026

The image tag was changed to ${PUBLISHER}/pytorch:${TAG_SUFFIX}, which drops the descriptive version components (torch/cuda/python/ubuntu) used elsewhere in this repo’s template tags (e.g., ...:cuda12.8.1-ubuntu22.04-${TAG_SUFFIX}). Consider keeping the descriptive prefix and appending the date suffix to preserve discoverability and avoid ambiguity.

Suggested change
"${PUBLISHER}/pytorch:${TAG_SUFFIX}"
"${PUBLISHER}/pytorch:cuda12.8.1-ubuntu22.04-${TAG_SUFFIX}"

]
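
The tag convention raised in the review comment above (a descriptive prefix followed by the date suffix) could be enforced with a small check. This is a sketch only: `is_descriptive_tag` is a hypothetical helper, and the ten-digit suffix shape is an assumption inferred from the TAG_SUFFIX default of 2026031701.

```shell
# Hypothetical check (not in the PR): accept an image reference only if its
# tag has a non-empty descriptive prefix, a hyphen, and a ten-digit suffix.
is_descriptive_tag() {
    case $1 in
        *:?*-[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}
```

Under that assumption, `yottalabsai/pytorch:cuda12.8.1-ubuntu22.04-2026031701` passes and the bare `yottalabsai/pytorch:2026031701` is rejected.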

contexts = {
@@ -25,7 +25,7 @@ target "pytorch290" {
}

args = {
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"
Copilot AI Apr 13, 2026

There are trailing spaces after the BASE_IMAGE value. This is minor, but it creates noisy diffs and can fail stricter formatting checks. Please trim the whitespace.

Suggested change
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"

PYTHON_VERSION = "3.11.14"
TORCH = "torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128"
}