Feat/update unsloth template#12
Conversation
There was a problem hiding this comment.
Pull request overview
Updates the Unsloth official template (and aligns parts of the PyTorch template) by changing the base image/runtime setup, package installs, and startup/entrypoint behavior.
Changes:
- Reworked Unsloth Dockerfile to build Python from source, install torch/cu128 + unsloth tooling, and add nginx/Jupyter configuration.
- Updated both Unsloth and PyTorch templates to use a new
start1.shentrypoint and adjusted environment defaults. - Modified image tagging in both templates’
docker-bake.hcl.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| official-templates/unsloth/Dockerfile | Major rebuild of the Unsloth image (Python install approach, packages, Jupyter/nginx wiring, new entrypoint). |
| official-templates/unsloth/docker-bake.hcl | Tag updated for Unsloth publish target. |
| official-templates/pytorch/Dockerfile | Updates startup expectations (new entrypoint), Jupyter config, and removes conda/uv sections. |
| official-templates/pytorch/docker-bake.hcl | Updates TAG_SUFFIX and changes tag format for PyTorch image publishing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| FROM scratch AS logo | ||
| COPY container-template/yotta.txt yotta.txt | ||
|
|
||
| FROM scratch AS scripts | ||
| COPY container-template/start.sh start.sh | ||
| COPY container-template/start1.sh start1.sh | ||
|
|
||
| FROM scratch AS proxy | ||
| COPY container-template/proxy/nginx.conf nginx.conf | ||
| COPY container-template/proxy/readme.html readme.html |
There was a problem hiding this comment.
The scratch stages copy files from container-template/..., but that directory is not part of the official-templates/unsloth build context (and docker-bake.hcl already provides named contexts for scripts/proxy/logo). As written, this will fail to build unless the build context is changed; use COPY --from=<named-context> (or set context appropriately) instead of COPY container-template/....
Also, container-template/start1.sh does not exist in the repo, so the build will fail even if the context issue is fixed.
| # main stage | ||
| # =============================== | ||
| FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04 | ||
|
|
||
| ARG DEBIAN_FRONTEND=noninteractive | ||
| ENV TZ=UTC \ | ||
| # Re-declare ARGs after FROM | ||
| ARG PYTHON_VERSION="3.11.14" | ||
|
|
There was a problem hiding this comment.
docker-bake.hcl provides a BASE_IMAGE build arg for this template, but the Dockerfile hard-codes FROM nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04. This makes the bake arg ineffective and contradicts the template documentation that says it is built on unsloth/unsloth. Consider switching to ARG BASE_IMAGE + FROM ${BASE_IMAGE} (or removing the arg/docs if the base image is intentionally fixed).
| LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH \ | ||
| JUPYTER_PASSWORD=yotta |
There was a problem hiding this comment.
Setting JUPYTER_PASSWORD to a fixed default (yotta) changes the runtime behavior of the shared startup convention: Jupyter becomes enabled by default (in container-template/start.sh, Jupyter starts whenever JUPYTER_PASSWORD is set). This also makes every container share the same token unless users override it. Consider leaving it unset by default and documenting/passing it at runtime instead.
| LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH \ | |
| JUPYTER_PASSWORD=yotta | |
| LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH |
| 'c.ServerApp.token = "yotta"' \ | ||
| 'c.ServerApp.password = ""' \ | ||
| 'c.ServerApp.allow_remote_access = True' \ | ||
| 'c.ServerApp.allow_origin = "*"' \ | ||
| 'c.NotebookApp.token = "yotta"' \ | ||
| 'c.NotebookApp.password = ""' \ |
There was a problem hiding this comment.
The Jupyter config hard-codes a token (c.ServerApp.token = "yotta") and allows any origin. This prevents runtime configuration via JUPYTER_PASSWORD and increases exposure if the notebook server is reachable. Prefer sourcing the token from an env var at startup (or omit the config and let the startup script/CLI flags control auth/origin).
| 'c.ServerApp.token = "yotta"' \ | |
| 'c.ServerApp.password = ""' \ | |
| 'c.ServerApp.allow_remote_access = True' \ | |
| 'c.ServerApp.allow_origin = "*"' \ | |
| 'c.NotebookApp.token = "yotta"' \ | |
| 'c.NotebookApp.password = ""' \ | |
| 'c.ServerApp.allow_remote_access = True' \ |
| # =============================== | ||
| # Entrypoint: root runs start1.sh | ||
| # =============================== | ||
| USER root | ||
| WORKDIR /root | ||
| CMD ["/bin/bash", "-c", "exec /bin/bash /start1.sh"] |
There was a problem hiding this comment.
The image now runs /start1.sh, but the template docs (README) instruct users to run /start.sh, and the shared container-template only contains start.sh today. Either add/commit start1.sh (and update docs accordingly) or keep the entrypoint aligned with /start.sh to avoid breaking documented usage.
| @@ -22,13 +22,13 @@ ENV DEBIAN_FRONTEND=noninteractive \ | |||
| SHELL=/bin/bash \ | |||
| PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ | |||
There was a problem hiding this comment.
This Dockerfile removed the Miniconda installation, but PATH still prepends /opt/conda/bin. With conda no longer present, this is misleading and can mask issues (e.g., users expecting conda to exist). Consider removing /opt/conda/bin from PATH (or reintroducing conda if it’s still required).
| PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ | |
| PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ |
| # Configure JupyterLab: auto-login with token (no password prompt) | ||
| # =============================== | ||
| RUN mkdir -p /root/.jupyter && printf '%s\n' \ | ||
| 'c.ServerApp.token = "yotta"' \ | ||
| 'c.ServerApp.password = ""' \ | ||
| 'c.ServerApp.allow_remote_access = True' \ | ||
| 'c.ServerApp.allow_origin = "*"' \ | ||
| 'c.NotebookApp.token = "yotta"' \ |
There was a problem hiding this comment.
The Jupyter config hard-codes a token ("yotta") and allows any origin. This prevents controlling auth via the documented JUPYTER_PASSWORD env var and increases exposure if the notebook server is reachable. Prefer sourcing the token from an env var at startup (or omit this file and rely on the startup script/CLI flags).
| COPY --from=scripts start1.sh /start1.sh | ||
| RUN chmod 755 /start1.sh && \ | ||
| sed -i 's/\r$//' /start1.sh |
There was a problem hiding this comment.
COPY --from=scripts start1.sh /start1.sh references start1.sh, but the shared container-template/ currently only contains start.sh (no start1.sh). This will fail at build time unless start1.sh is added to the scripts context or the Dockerfile is switched back to /start.sh.
|
|
||
| tags = [ | ||
| "${PUBLISHER}/pytorch:2.9.0-py3.11-cuda12.8.1-cudnn-devel-ubuntu22.04" | ||
| "${PUBLISHER}/pytorch:${TAG_SUFFIX}" |
There was a problem hiding this comment.
The image tag was changed to ${PUBLISHER}/pytorch:${TAG_SUFFIX}, which drops the descriptive version components (torch/cuda/python/ubuntu) used elsewhere in this repo’s template tags (e.g., ...:cuda12.8.1-ubuntu22.04-${TAG_SUFFIX}). Consider keeping the descriptive prefix and appending the date suffix to preserve discoverability and avoid ambiguity.
| "${PUBLISHER}/pytorch:${TAG_SUFFIX}" | |
| "${PUBLISHER}/pytorch:cuda12.8.1-ubuntu22.04-${TAG_SUFFIX}" |
|
|
||
| args = { | ||
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" | ||
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" |
There was a problem hiding this comment.
There are trailing spaces after the BASE_IMAGE value. This is minor, but it creates noisy diffs and can fail stricter formatting checks—please trim the whitespace.
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" | |
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" |
|
PR Hygiene COPY . /build_context removed but no replacement for local notebooks — The old Dockerfile had a (clunky) mechanism to overlay local unsloth-notebooks/ from the build context. That's gone, but the git clone of the upstream notebooks repo remains. If the team had local notebook customizations in the build context, those are now silently dropped. SSH config not configured — Same issue across all PRs. The old sshd wrapper and user setup are removed. start1.sh calls service ssh start, but without PermitRootLogin yes in sshd_config, root SSH won't work. The old image ran SSH via a custom wrapper that stripped -D; that's also gone. vLLM added without documentation or exposed port — Dockerfile:93 installs vllm but the template has no README mention of it, no vLLM startup logic, and no port exposed for it (8000). If this is intentional (library use only), it's a very heavy dependency (~2GB+ in the image) for import-only usage. If it's meant to be served, it needs an entrypoint hook. Medium Unpinned notebook clone — Dockerfile:118. git clone --depth=1 of HEAD from unslothai/notebooks. Not reproducible. chmod 777 /workspace /root — Dockerfile:42. Root's home directory world-writable. Same issue in the "Supervisor dirs" block at line ~133 with chmod 777 /var/log/supervisor /workspace /var/run /var/lib/nginx. Duplicate directory creation — /var/run/sshd, /var/log/supervisor, and /workspace are created/chmod'd in multiple RUN layers (apt block, workspace block, supervisor block). --enable-optimizations removed from Python build — Recurring issue. Hardcoded Jupyter token "yotta" — Recurring issue across all templates. sed -i 's/\r$//' on start1.sh — Fix line endings at source. Low Build-time assertion doesn't import unsloth — Dockerfile:103. The comment says "unsloth requires GPU at import time" so it only checks importlib.metadata.version. Fair enough, but consider python -c "import unsloth" with a || true fallback so you at least see if it explodes at build time. Old entry.sh with Jupyter-as-PID1 removed — Good cleanup. But note that the old image ran Jupyter directly via entry.sh with --ServerApp.token="" (no auth). The new image uses start1.sh which sets --ServerApp.token=$JUPYTER_PASSWORD. This is a behavior change — existing users who expected no-auth Jupyter will now need the token. Worth documenting. Pytorch Dockerfile — Recurring Issues Positive Changes tini is gone — zombie processes won't be reaped |
Summary
Type of Change
container-template/)Checklist
Dockerfilebuilds successfully (docker buildx bake <target> --no-cache)README.mdupdated to reflect any interface or version changesdocker-bake.hcltarget name and image tag are correctTest Notes