Skip to content

Feat/add hermes template#9

Open
AanningZoeng wants to merge 4 commits intomainfrom
feat/add-hermes-template
Open

Feat/add hermes template#9
AanningZoeng wants to merge 4 commits intomainfrom
feat/add-hermes-template

Conversation

@AanningZoeng
Copy link
Copy Markdown
Collaborator

Summary

Type of Change

  • New template
  • Update to existing template (version bump, new packages, etc.)
  • Bug fix
  • Shared infrastructure change (container-template/)
  • Documentation only

Checklist

  • Dockerfile builds successfully (docker buildx bake <target> --no-cache)
  • Container starts and all services are reachable
  • README.md updated to reflect any interface or version changes
  • No secrets, tokens, or internal hostnames committed
  • docker-bake.hcl target name and image tag are correct

Test Notes

I uploaded it to my personal repo on dockerhub and launched a pod to start its service . Everything went well.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR introduces a new Hermes Agent official template (vLLM OpenAI-compatible server + hermes-agent CLI) and also updates the existing PyTorch official template’s runtime/entrypoint and image tagging.

Changes:

  • Added a new official-templates/hermes template with Dockerfile, bake target, and usage documentation.
  • Updated official-templates/pytorch to change Jupyter defaults/config and switch to a new start script (start1.sh), plus adjusted bake tagging.
  • Adjusted security-related aspects in PyTorch (removing the ubuntu user) and added additional runtime directory permission setup.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
official-templates/pytorch/Dockerfile Changes entrypoint/start script, Jupyter config defaults, user handling, and permissions.
official-templates/pytorch/docker-bake.hcl Updates tag strategy to use TAG_SUFFIX and bumps the suffix.
official-templates/hermes/Dockerfile New image definition: installs PyTorch nightly + vLLM + hermes-agent; adds SSH/Jupyter and startup scripts.
official-templates/hermes/docker-bake.hcl New build target/tags; introduces variables for versioning/model selection.
official-templates/hermes/README.md New template documentation for deployment, env vars, endpoints, and troubleshooting.
Comments suppressed due to low confidence (1)

official-templates/pytorch/docker-bake.hcl:6

  • PR description marks this as “New template” only, but this PR also changes the existing official-templates/pytorch image (Dockerfile + bake tags/entrypoint). Please update the PR description/type-of-change checklist to reflect the pytorch modifications, or split the pytorch changes into a separate PR to keep review/release scope clear.
variable "PUBLISHER"  { default = "yottalabsai" }
variable "TAG_SUFFIX" { default = "2026031701" }

group "default" {
  targets = ["pytorch290"]
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +140 to +142
COPY --from=scripts start1.sh /start1.sh
RUN chmod 755 /start1.sh && \
sed -i 's/\r$//' /start1.sh
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COPY --from=scripts start1.sh /start1.sh will fail because the referenced build context (../../container-template) only contains start.sh (no start1.sh). This makes the image non-buildable with the current bake contexts. Either add start1.sh to container-template/ (and ensure it’s included in the scripts context) or switch back to copying start.sh (and update it to not assume an ubuntu user).

Copilot uses AI. Check for mistakes.
# ===============================
WORKDIR /
RUN mkdir -p /workspace && chmod 777 /workspace
RUN mkdir -p /workspace && chmod 777 /workspace /root
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chmod 777 /root makes the root home directory world-writable, which is a significant security risk and can also cause OpenSSH StrictModes checks to reject key-based logins. Limit permissions to the specific writable paths needed (e.g., /workspace, /root/.jupyter, etc.) and keep /root non-writable by group/others.

Suggested change
RUN mkdir -p /workspace && chmod 777 /workspace /root
RUN mkdir -p /workspace && chmod 777 /workspace && chmod 700 /root

Copilot uses AI. Check for mistakes.
@@ -22,13 +22,13 @@ ENV DEBIAN_FRONTEND=noninteractive \
SHELL=/bin/bash \
PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The image no longer installs Miniconda, but PATH still prepends /opt/conda/bin. This makes the environment misleading and can mask issues if a different python/tools are expected from conda. Consider removing /opt/conda/bin from PATH (or reintroduce conda if it’s still required).

Suggested change
PATH=/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \

Copilot uses AI. Check for mistakes.
Comment on lines +135 to +137
RUN mkdir -p /var/log/supervisor /usr/local/bin && \
chmod 777 /var/log/supervisor /workspace /var/run /var/lib/nginx && \
mkdir -p /run/sshd && \
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chmod 777 on /var/run and /var/lib/nginx is overly permissive (world-writable runtime/state directories). Since the container runs as root, this shouldn’t be necessary and increases the blast radius if any process is compromised. Prefer the minimum permissions needed (e.g., only chown/chmod the specific pid/log dirs that must be writable).

Suggested change
RUN mkdir -p /var/log/supervisor /usr/local/bin && \
chmod 777 /var/log/supervisor /workspace /var/run /var/lib/nginx && \
mkdir -p /run/sshd && \
RUN mkdir -p /var/log/supervisor /var/lib/nginx /workspace /usr/local/bin /run/sshd && \
chmod 755 /var/log/supervisor /var/lib/nginx && \
chmod 777 /workspace && \

Copilot uses AI. Check for mistakes.

args = {
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There’s trailing whitespace after the BASE_IMAGE value, which can cause noisy diffs and lint failures in some setups. Please remove the extra spaces at the end of the line.

Suggested change
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"
BASE_IMAGE = "nvidia/cuda:12.8.1-cudnn-devel-ubuntu22.04"

Copilot uses AI. Check for mistakes.
Comment on lines +222 to +224
printenv | grep -E '^YOTTA_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' >> /etc/rp_environment
echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> /etc/rp_environment
echo 'source /etc/rp_environment' >> ~/.bashrc
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export_env_vars appends (>>) to /etc/rp_environment and ~/.bashrc every container start, so restarts will continually duplicate exports/source lines. Make this idempotent by overwriting /etc/rp_environment (use >), and only adding the source line to ~/.bashrc if it isn’t already present.

Suggested change
printenv | grep -E '^YOTTA_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' >> /etc/rp_environment
echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> /etc/rp_environment
echo 'source /etc/rp_environment' >> ~/.bashrc
printenv | grep -E '^YOTTA_|^PATH=|^_=' | awk -F = '{ print "export " $1 "=\"" $2 "\"" }' > /etc/rp_environment
echo 'export PATH=/usr/local/nvidia/bin:/usr/local/cuda-12.8/bin:~/.local/bin:$PATH' >> /etc/rp_environment
grep -qxF 'source /etc/rp_environment' ~/.bashrc 2>/dev/null || echo 'source /etc/rp_environment' >> ~/.bashrc

Copilot uses AI. Check for mistakes.
Comment on lines +85 to +87
RUN sed -i "s/#PasswordAuthentication yes/PasswordAuthentication yes/" /etc/ssh/sshd_config && \
sed -i "s/PasswordAuthentication no/PasswordAuthentication yes/" /etc/ssh/sshd_config && \
sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SSH hardening here enables PasswordAuthentication yes and PermitRootLogin yes. Since the README describes SSH access via PUBLIC_KEY, enabling password auth/root login is unnecessary and increases exposure (brute force / misconfiguration risk). Prefer key-only auth (PasswordAuthentication no) and a stricter PermitRootLogin setting (e.g., prohibit-password), unless there’s a documented requirement for password login.

Suggested change
RUN sed -i "s/#PasswordAuthentication yes/PasswordAuthentication yes/" /etc/ssh/sshd_config && \
sed -i "s/PasswordAuthentication no/PasswordAuthentication yes/" /etc/ssh/sshd_config && \
sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin yes/" /etc/ssh/sshd_config
RUN sed -i "s/#PasswordAuthentication yes/PasswordAuthentication no/" /etc/ssh/sshd_config && \
sed -i "s/PasswordAuthentication yes/PasswordAuthentication no/" /etc/ssh/sshd_config && \
sed -i "s/#PermitRootLogin prohibit-password/PermitRootLogin prohibit-password/" /etc/ssh/sshd_config

Copilot uses AI. Check for mistakes.
Comment on lines +149 to +155
RUN set -eux; \
git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /tmp/hermes-agent; \
cd /tmp/hermes-agent; \
pip install --no-cache-dir . ; \
mkdir -p "$HERMES_HOME"; \
chmod 777 "$HERMES_HOME"; \
rm -rf /tmp/hermes-agent
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hermes-agent is installed from a shallow clone of the repo’s default branch, which makes builds non-reproducible and can break unexpectedly if upstream changes. Consider pinning to a specific tag/commit (or a published PyPI version if available) and using that in the build so images are repeatable.

Copilot uses AI. Check for mistakes.
Comment on lines +133 to +140
RUN mkdir -p /root/.jupyter && printf '%s\n' \
'c.ServerApp.token = "yotta"' \
'c.ServerApp.password = ""' \
'c.ServerApp.allow_remote_access = True' \
'c.ServerApp.allow_origin = "*"' \
'c.NotebookApp.token = "yotta"' \
'c.NotebookApp.password = ""' \
'c.NotebookApp.allow_remote_access = True' \
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jupyter is configured for remote access with a fixed, well-known token ("yotta") and an empty password. If this container can be reached from untrusted networks, this is easy to guess and increases risk of unauthorized access. Consider generating a random token when JUPYTER_PASSWORD isn’t explicitly set (or disabling Jupyter by default) and avoid allow_origin="*" unless required.

Copilot uses AI. Check for mistakes.
Comment on lines +103 to +110
# Configure JupyterLab: auto-login with token (no password prompt)
# ===============================
RUN mkdir -p /root/.jupyter && printf '%s\n' \
'c.ServerApp.token = "yotta"' \
'c.ServerApp.password = ""' \
'c.ServerApp.allow_remote_access = True' \
'c.ServerApp.allow_origin = "*"' \
'c.NotebookApp.token = "yotta"' \
Copy link

Copilot AI Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jupyter is configured for remote access with a fixed, well-known token ("yotta") and an empty password. If the container is reachable outside a trusted boundary, this makes unauthorized access much easier. Consider requiring an explicit JUPYTER_PASSWORD/token at runtime (no default) or generating a random token on first start, and avoid allow_origin="*" unless it’s required.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@dadaism dadaism left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hermes template is a solid foundation — the README is thorough and the vLLM startup/health-check logic is well done.

Main concerns are:

  • the VLLM_MAX_MODEL_LEN inconsistency between Dockerfile and README
  • world-readable secrets (chmod 777)
  • unpinned dependencies (hermes-agent, vllm)
  • the unused VLLM_VERSION variable
  • the hadolint bypass.

The pytorch changes should be split into their own PR.

CMAKE_CUDA_ARCHITECTURES="120" \
PYTORCH_ALLOC_CONF="expandable_segments:True" \
PYTHON_VERSION="3.11" \
JUPYTER_PASSWORD=yotta \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to generate random secrets and don't use hardcoded one?

CUDA_HOME=/usr/local/cuda \
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/bin:$PATH \
LD_LIBRARY_PATH=/usr/local/nvidia/lib64:$LD_LIBRARY_PATH \
TORCH_CUDA_ARCH_LIST="12.0" \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only for Blackwell? Is it possible to support "8.0;9.0;12.0" as well

cd /tmp/hermes-agent; \
pip install --no-cache-dir . ; \
mkdir -p "$HERMES_HOME"; \
chmod 777 "$HERMES_HOME"; \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

700 is sufficient

VLLM_LOG="/workspace/vllm.log" \
OPENAI_BASE_URL=http://localhost:8000/v1

RUN mkdir -p /workspace "$HF_HOME" && chmod -R 777 /workspace "$HF_HOME"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use 700?

ENV HERMES_HOME=/root/.hermes

RUN set -eux; \
git clone --depth 1 https://github.com/NousResearch/hermes-agent.git /tmp/hermes-agent; \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pin to a specific commit or release tag?

# vLLM ecosystem (latest stable — same pattern as OpenClaw)
# ==============================
RUN python -m pip install --no-cache-dir \
vllm transformers sentencepiece "huggingface_hub[cli]" hf_transfer einops \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vllm==${VLLM_VERSION}
lock the version?

&& rm -f /tmp/Python.tgz \
&& cd /tmp/python-src \
&& ./configure --enable-optimizations --with-ensurepip=install \
&& ./configure --with-ensurepip=install \
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add --enable-optimizations back

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants