Feat/add hermes template#9
Pull request overview
This PR introduces a new Hermes Agent official template (vLLM OpenAI-compatible server + hermes-agent CLI) and also updates the existing PyTorch official template’s runtime/entrypoint and image tagging.
Changes:
- Added a new official-templates/hermes template with Dockerfile, bake target, and usage documentation.
- Updated official-templates/pytorch to change Jupyter defaults/config and switch to a new start script (start1.sh), plus adjusted bake tagging.
- Adjusted security-related aspects in PyTorch (removing the ubuntu user) and added additional runtime directory permission setup.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| official-templates/pytorch/Dockerfile | Changes entrypoint/start script, Jupyter config defaults, user handling, and permissions. |
| official-templates/pytorch/docker-bake.hcl | Updates tag strategy to use TAG_SUFFIX and bumps the suffix. |
| official-templates/hermes/Dockerfile | New image definition: installs PyTorch nightly + vLLM + hermes-agent; adds SSH/Jupyter and startup scripts. |
| official-templates/hermes/docker-bake.hcl | New build target/tags; introduces variables for versioning/model selection. |
| official-templates/hermes/README.md | New template documentation for deployment, env vars, endpoints, and troubleshooting. |
Comments suppressed due to low confidence (1)
official-templates/pytorch/docker-bake.hcl:6
- PR description marks this as “New template” only, but this PR also changes the existing official-templates/pytorch image (Dockerfile + bake tags/entrypoint). Please update the PR description/type-of-change checklist to reflect the pytorch modifications, or split the pytorch changes into a separate PR to keep review/release scope clear.
variable "PUBLISHER" { default = "yottalabsai" }
variable "TAG_SUFFIX" { default = "2026031701" }
group "default" {
targets = ["pytorch290"]
}
| COPY --from=scripts start1.sh /start1.sh | ||
| RUN chmod 755 /start1.sh && \ | ||
| sed -i 's/\r$//' /start1.sh |
COPY --from=scripts start1.sh /start1.sh will fail because the referenced build context (../../container-template) only contains start.sh (no start1.sh). This makes the image non-buildable with the current bake contexts. Either add start1.sh to container-template/ (and ensure it’s included in the scripts context) or switch back to copying start.sh (and update it to not assume an ubuntu user).
| # =============================== | ||
| WORKDIR / | ||
| RUN mkdir -p /workspace && chmod 777 /workspace | ||
| RUN mkdir -p /workspace && chmod 777 /workspace /root |
chmod 777 /root makes the root home directory world-writable, which is a significant security risk and can also cause OpenSSH StrictModes checks to reject key-based logins. Limit permissions to the specific writable paths needed (e.g., /workspace, /root/.jupyter, etc.) and keep /root non-writable by group/others.
| RUN mkdir -p /workspace && chmod 777 /workspace /root | |
| RUN mkdir -p /workspace && chmod 777 /workspace && chmod 700 /root |
| @@ -22,13 +22,13 @@ ENV DEBIAN_FRONTEND=noninteractive \ | |||
| SHELL=/bin/bash \ | |||
| PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ | |||
The image no longer installs Miniconda, but PATH still prepends /opt/conda/bin. This makes the environment misleading and can mask issues if a different python/tools are expected from conda. Consider removing /opt/conda/bin from PATH (or reintroduce conda if it’s still required).
| PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ | |
| PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ |
| RUN mkdir -p /var/log/supervisor /usr/local/bin && \ | ||
| chmod 777 /var/log/supervisor /workspace /var/run /var/lib/nginx && \ | ||
| mkdir -p /run/sshd && \ |
chmod 777 on /var/run and /var/lib/nginx is overly permissive (world-writable runtime/state directories). Since the container runs as root, this shouldn’t be necessary and increases the blast radius if any process is compromised. Prefer the minimum permissions needed (e.g., only chown/chmod the specific pid/log dirs that must be writable).
| RUN mkdir -p /var/log/supervisor /usr/local/bin && \ | |
| chmod 777 /var/log/supervisor /workspace /var/run /var/lib/nginx && \ | |
| mkdir -p /run/sshd && \ | |
| RUN mkdir -p /var/log/supervisor /var/lib/nginx /workspace /usr/local/bin /run/sshd && \ | |
| chmod 755 /var/log/supervisor /var/lib/nginx && \ | |
| chmod 777 /workspace && \ |
| args = { | ||
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" | ||
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" |
There’s trailing whitespace after the BASE_IMAGE value, which can cause noisy diffs and lint failures in some setups. Please remove the extra spaces at the end of the line.
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" | |
| BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04" |
| printenv | grep -E '^YOTTA_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' >> /etc/rp_environment | ||
| echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> /etc/rp_environment | ||
| echo 'source /etc/rp_environment' >> ~/.bashrc |
export_env_vars appends (>>) to /etc/rp_environment and ~/.bashrc every container start, so restarts will continually duplicate exports/source lines. Make this idempotent by overwriting /etc/rp_environment (use >), and only adding the source line to ~/.bashrc if it isn’t already present.
| printenv | grep -E '^YOTTA_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' >> /etc/rp_environment | |
| echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> /etc/rp_environment | |
| echo 'source /etc/rp_environment' >> ~/.bashrc | |
| printenv | grep -E '^YOTTA_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' > /etc/rp_environment | |
| echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> /etc/rp_environment | |
| grep -qxF 'source /etc/rp_environment' ~/.bashrc 2>/dev/null || echo 'source /etc/rp_environment' >> ~/.bashrc |
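The idempotency point can be sketched as a standalone function. This is a minimal sketch, not the template's actual start script: the two path parameters are introduced here for illustration, and the `awk` step is changed (an assumption, not part of the suggestion above) so that values containing `=` are not truncated. Values containing double quotes or `$` are not escaped.

```shell
#!/bin/sh
# Sketch only: env_file and rc_file are hypothetical parameters with defaults
# matching the paths used in the start script.
export_env_vars() {
    local env_file="${1:-/etc/rp_environment}"
    local rc_file="${2:-$HOME/.bashrc}"
    local source_line="source ${env_file}"

    # Overwrite (>) instead of append (>>) so every start yields the same file.
    # sub() replaces only the first "=", so values that contain "=" stay intact.
    printenv | grep -E '^YOTTA_|^PATH=|^_=' \
        | awk '{ sub(/=/, "=\""); print "export " $0 "\"" }' > "$env_file"
    echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> "$env_file"

    # Add the source line only when it is not already present.
    touch "$rc_file"
    grep -qxF "$source_line" "$rc_file" || echo "$source_line" >> "$rc_file"
}
```

Calling the function twice with the same arguments leaves one `source` line in the rc file and one export per variable in the environment file.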
| RUN sed -i "s/#PasswordAuthentication yes/PasswordAuthentication yes/" /etc/ssh/sshd_config && \ | ||
| sed -i "s/PasswordAuthentication no/PasswordAuthentication yes/" /etc/ssh/sshd_config && \ | ||
| sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config |
The SSH configuration here enables PasswordAuthentication yes and PermitRootLogin yes. Since the README describes SSH access via PUBLIC_KEY, enabling password auth/root login is unnecessary and increases exposure (brute force / misconfiguration risk). Prefer key-only auth (PasswordAuthentication no) and a stricter PermitRootLogin setting (e.g., prohibit-password), unless there’s a documented requirement for password login.
| RUN sed -i "s/#PasswordAuthentication yes/PasswordAuthentication yes/" /etc/ssh/sshd_config && \ | |
| sed -i "s/PasswordAuthentication no/PasswordAuthentication yes/" /etc/ssh/sshd_config && \ | |
| sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config | |
| RUN sed -i "s/#PasswordAuthentication yes/PasswordAuthentication no/" /etc/ssh/sshd_config && \ | |
| sed -i "s/PasswordAuthentication yes/PasswordAuthentication no/" /etc/ssh/sshd_config && \ | |
| sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin prohibit-password/" /etc/ssh/sshd_config |
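The `sed` substitutions above only take effect when the file contains exactly the expected text. A more defensive variant, sketched below, removes any existing line for a keyword (commented out or not) and appends the desired value. The function name `set_sshd_option` is hypothetical, and the sketch assumes the file has no `Match` blocks and that no earlier `Include`d file sets the same keyword, since sshd uses the first value it reads.

```shell
#!/bin/sh
# Sketch only: set_sshd_option is a hypothetical helper, not part of OpenSSH.
set_sshd_option() {
    local file="$1" key="$2" value="$3" tmp
    tmp="$(mktemp)"
    # Drop every existing line for this keyword, commented out or not.
    # grep -v exits non-zero when nothing is left, hence "|| true".
    grep -vE "^[[:space:]]*#?[[:space:]]*${key}[[:space:]]" "$file" > "$tmp" || true
    # Append the single setting we want.
    printf '%s %s\n' "$key" "$value" >> "$tmp"
    # Write back through the original path to keep its owner and mode.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Intended use in the image build (not executed here):
#   set_sshd_option /etc/ssh/sshd_config PasswordAuthentication no
#   set_sshd_option /etc/ssh/sshd_config PermitRootLogin prohibit-password
```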
| RUN set -eux; \ | ||
| git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /tmp/hermes-agent; \ | ||
| cd /tmp/hermes-agent; \ | ||
| pip install --no-cache-dir . ; \ | ||
| mkdir -p "$HERMES_HOME"; \ | ||
| chmod 777 "$HERMES_HOME"; \ | ||
| rm -rf /tmp/hermes-agent |
hermes-agent is installed from a shallow clone of the repo’s default branch, which makes builds non-reproducible and can break unexpectedly if upstream changes. Consider pinning to a specific tag/commit (or a published PyPI version if available) and using that in the build so images are repeatable.
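One way to pin the clone is to fetch a single commit by its full SHA and refuse anything else. The sketch below assumes the remote allows fetching a commit by SHA (GitHub does); `is_full_sha` and `clone_pinned` are hypothetical helper names, and the commit to pin is left for the author to choose.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; no commit SHA is suggested here.
is_full_sha() {
    # True only for a full 40-character lowercase hex commit id.
    printf '%s' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

clone_pinned() {
    local repo="$1" ref="$2" dest="$3"
    if ! is_full_sha "$ref"; then
        echo "refusing to build: '$ref' is not a full commit SHA" >&2
        return 1
    fi
    # Fetch exactly one commit instead of the tip of the default branch.
    git init -q "$dest"
    git -C "$dest" fetch -q --depth 1 "$repo" "$ref"
    git -C "$dest" checkout -q FETCH_HEAD
}

# Intended use in the Dockerfile RUN step (HERMES_REF would be a new build arg):
#   clone_pinned https://github.com/NousResearch/hermes-agent.git "$HERMES_REF" /tmp/hermes-agent
```

Rejecting branch names up front means a mistyped or missing pin fails the build rather than silently tracking upstream.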
| RUN mkdir -p /root/.jupyter && printf '%s\n' \ | ||
| 'c.ServerApp.token = "yotta"' \ | ||
| 'c.ServerApp.password = ""' \ | ||
| 'c.ServerApp.allow_remote_access = True' \ | ||
| 'c.ServerApp.allow_origin = "*"' \ | ||
| 'c.NotebookApp.token = "yotta"' \ | ||
| 'c.NotebookApp.password = ""' \ | ||
| 'c.NotebookApp.allow_remote_access = True' \ |
Jupyter is configured for remote access with a fixed, well-known token ("yotta") and an empty password. If this container can be reached from untrusted networks, this is easy to guess and increases risk of unauthorized access. Consider generating a random token when JUPYTER_PASSWORD isn’t explicitly set (or disabling Jupyter by default) and avoid allow_origin="*" unless required.
| # Configure JupyterLab: auto-login with token (no password prompt) | ||
| # =============================== | ||
| RUN mkdir -p /root/.jupyter && printf '%s\n' \ | ||
| 'c.ServerApp.token = "yotta"' \ | ||
| 'c.ServerApp.password = ""' \ | ||
| 'c.ServerApp.allow_remote_access = True' \ | ||
| 'c.ServerApp.allow_origin = "*"' \ | ||
| 'c.NotebookApp.token = "yotta"' \ |
Jupyter is configured for remote access with a fixed, well-known token ("yotta") and an empty password. If the container is reachable outside a trusted boundary, this makes unauthorized access much easier. Consider requiring an explicit JUPYTER_PASSWORD/token at runtime (no default) or generating a random token on first start, and avoid allow_origin="*" unless it’s required.
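Generating a token at start time could look roughly like the sketch below. It assumes the hardcoded `JUPYTER_PASSWORD=yotta` default is removed from the image so that an unset variable means "generate one"; `resolve_jupyter_token` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: resolve_jupyter_token is a hypothetical helper for the start script.
resolve_jupyter_token() {
    if [ -n "${JUPYTER_PASSWORD:-}" ]; then
        # An explicitly provided secret wins.
        printf '%s\n' "$JUPYTER_PASSWORD"
    else
        # 32 bytes from the kernel random source, hex-encoded to 64 characters.
        head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
        printf '\n'
    fi
}

# Intended use in the start script (not executed here):
#   JUPYTER_TOKEN="$(resolve_jupyter_token)"
#   jupyter lab --ServerApp.token="$JUPYTER_TOKEN" ...
```

The generated token would still need to be surfaced to the user, for example through the container log, which this sketch leaves open.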
dadaism left a comment
The hermes template is a solid foundation — the README is thorough and the vLLM startup/health-check logic is well done.
Main concerns are:
- the VLLM_MAX_MODEL_LEN inconsistency between Dockerfile and README
- world-readable secrets (chmod 777)
- unpinned dependencies (hermes-agent, vllm)
- the unused VLLM_VERSION variable
- the hadolint bypass.
The pytorch changes should be split into their own PR.
| CMAKE_CUDA_ARCHITECTURES="120" \ | ||
| PYTORCH_ALLOC_CONF="expandable_segments:True" \ | ||
| PYTHON_VERSION="3.11" \ | ||
| JUPYTER_PASSWORD=yotta \ |
Is it possible to generate random secrets instead of using a hardcoded one?
| CUDA_HOME=/usr/local/cuda \ | ||
| PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \ | ||
| LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH \ | ||
| TORCH_CUDA_ARCH_LIST="12.0" \ |
Only for Blackwell? Is it possible to support "8.0;9.0;12.0" as well?
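If the list is widened, `CMAKE_CUDA_ARCHITECTURES` (currently "120") would need to stay in step with `TORCH_CUDA_ARCH_LIST`. A minimal sketch of deriving one from the other, assuming plain `major.minor` entries with no `+PTX` suffix; the function name is hypothetical:

```shell
#!/bin/sh
# Sketch only: torch_to_cmake_archs is a hypothetical helper.
torch_to_cmake_archs() {
    # CMake expects the same compute capabilities without the dot,
    # so dropping "." turns each "major.minor" entry into "majorminor".
    printf '%s\n' "$1" | tr -d '.'
}

TORCH_CUDA_ARCH_LIST="8.0;9.0;12.0"
CMAKE_CUDA_ARCHITECTURES="$(torch_to_cmake_archs "$TORCH_CUDA_ARCH_LIST")"
echo "$CMAKE_CUDA_ARCHITECTURES"   # prints 80;90;120
```

Whether the installed PyTorch nightly and vLLM builds actually ship kernels for all three architectures would still need to be checked.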
| cd /tmp/hermes-agent; \ | ||
| pip install --no-cache-dir . ; \ | ||
| mkdir -p "$HERMES_HOME"; \ | ||
| chmod 777 "$HERMES_HOME"; \ |
| VLLM_LOG="/workspace/vllm.log" \ | ||
| OPENAI_BASE_URL=http://localhost:8000/v1 | ||
| RUN mkdir -p /workspace "$HF_HOME" && chmod -R 777 /workspace "$HF_HOME" |
| ENV HERMES_HOME=/root/.hermes | ||
| RUN set -eux; \ | ||
| git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /tmp/hermes-agent; \ |
Pin to a specific commit or release tag?
| # vLLM ecosystem (latest stable — same pattern as OpenClaw) | ||
| # ============================== | ||
| RUN python -m pip install --no-cache-dir \ | ||
| vllm transformers sentencepiece "huggingface_hub[cli]" hf_transfer einops \ |
Lock the version, e.g. vllm==${VLLM_VERSION}?
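A sketch of wiring the existing `VLLM_VERSION` variable into the install step, failing the build when it is empty instead of silently installing the latest release. `vllm_requirement` is a hypothetical helper, and no specific version is suggested here.

```shell
#!/bin/sh
# Sketch only: vllm_requirement is a hypothetical helper.
vllm_requirement() {
    local version="${1:-}"
    if [ -z "$version" ]; then
        echo "VLLM_VERSION must be set so the build is reproducible" >&2
        return 1
    fi
    # Build an exact-version pip requirement string.
    printf 'vllm==%s\n' "$version"
}

# Intended use in the Dockerfile RUN step (not executed here):
#   python -m pip install --no-cache-dir "$(vllm_requirement "$VLLM_VERSION")" transformers ...
```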
| && rm -f /tmp/Python.tgz \ | ||
| && cd /tmp/python-src \ | ||
| && ./configure --enable-optimizations --with-ensurepip=install \ | ||
| && ./configure --with-ensurepip=install \ |
Please add --enable-optimizations back.
Summary
Type of Change
container-template/)
Checklist
- Dockerfile builds successfully (docker buildx bake <target> --no-cache)
- README.md updated to reflect any interface or version changes
- docker-bake.hcl target name and image tag are correct
Test Notes
I uploaded the image to my personal Docker Hub repo and launched a pod to start its service. Everything went well.