I have been trying for a while to build a Docker container in which to run nanotron. So far I cannot even run the quickstart, even though I am following the installation guide in the README as closely as possible. This is the Dockerfile I am currently using, after many experiments:
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
ARG DEBIAN_FRONTEND=noninteractive
# System deps
RUN apt-get update && apt-get install -y --no-install-recommends \
    git git-lfs \
    build-essential \
    curl ca-certificates \
    python3 python3-venv python3-pip python3.11-dev \
    && rm -rf /var/lib/apt/lists/*
# uv
RUN curl -LsSf https://astral.sh/uv/install.sh | sh && \
    ln -s /root/.local/bin/uv /usr/local/bin/uv
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 1
ENV PIP_NO_CACHE_DIR=1
# Deps
RUN uv pip install --system torch --index-url https://download.pytorch.org/whl/cu124 && \
    uv pip install --system datasets transformers "datatrove[io]" numba wandb && \
    uv pip install --system ninja triton "flash_attn==2.7.4.post1" --no-build-isolation && \
    uv pip install --system git+https://github.com/huggingface/nanotron.git@nanotron-working-branch
# Extra dependency (psutil)
RUN uv pip install --system psutil
WORKDIR /app
COPY . /app
ENV PYTHONPATH=/app:${PYTHONPATH}
CMD ["/bin/bash"]
I have tried several variations of this: keeping flash-attn>=2.5.0 as in the guide, pinning torch to 2.4.x, and installing from the main branch, the smollm3 branch, and nanotron-working-branch.
These modifications got me past several earlier errors, and this Dockerfile has taken me the furthest, but I am now stuck on the following error when running run_train.py with the tiny llama config, as explained in the README:
AttributeError: 'ModelArgs' object has no attribute 'model'
Is there something wrong with the Docker container, or is the problem somewhere else?
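To help tell an environment problem apart from a config or code problem, the versions actually installed in the container can be listed with a short script. This is only a minimal sketch: the package list is copied from the Dockerfile above, and `installed_versions` is a hypothetical helper name, not part of nanotron.

```python
from importlib import metadata

# Distribution names taken from the Dockerfile above (an assumption about
# which packages matter for this error).
PACKAGES = ["torch", "flash_attn", "nanotron", "transformers", "datasets", "triton"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the container and pasting the output alongside the traceback should make it easier to see whether the installed nanotron build matches the branch whose README and example config are being followed.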