Skip to content

Latest commit

 

History

History
759 lines (576 loc) · 26.7 KB

File metadata and controls

759 lines (576 loc) · 26.7 KB
sidebar_position title description
2
Configuration
Configure Hermes Agent — config.yaml, providers, models, API keys, and more

Configuration

All settings are stored in the ~/.hermes/ directory for easy access.

Directory Structure

~/.hermes/
├── config.yaml     # Settings (model, terminal, TTS, compression, etc.)
├── .env            # API keys and secrets
├── auth.json       # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md         # Optional: global persona (agent embodies this personality)
├── memories/       # Persistent memory (MEMORY.md, USER.md)
├── skills/         # Agent-created skills (managed via skill_manage tool)
├── cron/           # Scheduled jobs
├── sessions/       # Gateway sessions
└── logs/           # Logs (errors.log, gateway.log — secrets auto-redacted)

Managing Configuration

hermes config              # View current configuration
hermes config edit         # Open config.yaml in your editor
hermes config set KEY VAL  # Set a specific value
hermes config check        # Check for missing options (after updates)
hermes config migrate      # Interactively add missing options

# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-...  # Saves to .env

:::tip The hermes config set command automatically routes values to the right file — API keys are saved to .env, everything else to config.yaml. :::

Configuration Precedence

Settings are resolved in this order (highest priority first):

  1. CLI arguments — e.g., hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
  2. ~/.hermes/config.yaml — the primary config file for all non-secret settings
  3. ~/.hermes/.env — fallback for env vars; required for secrets (API keys, tokens, passwords)
  4. Built-in defaults — hardcoded safe defaults when nothing else is set

:::info Rule of Thumb Secrets (API keys, bot tokens, passwords) go in .env. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in config.yaml. When both are set, config.yaml wins for non-secret settings. :::

Inference Providers

You need at least one way to connect to an LLM. Use hermes model to switch providers and models interactively, or configure directly:

Provider Setup
Nous Portal hermes model (OAuth, subscription-based)
OpenAI Codex hermes model (ChatGPT OAuth, uses Codex models)
OpenRouter OPENROUTER_API_KEY in ~/.hermes/.env
z.ai / GLM GLM_API_KEY in ~/.hermes/.env (provider: zai)
Kimi / Moonshot KIMI_API_KEY in ~/.hermes/.env (provider: kimi-coding)
MiniMax MINIMAX_API_KEY in ~/.hermes/.env (provider: minimax)
MiniMax China MINIMAX_CN_API_KEY in ~/.hermes/.env (provider: minimax-cn)
Custom Endpoint OPENAI_BASE_URL + OPENAI_API_KEY in ~/.hermes/.env

:::info Codex Note The OpenAI Codex provider authenticates via device code (open a URL, enter a code). Credentials are stored at ~/.codex/auth.json and auto-refresh. No Codex CLI installation required. :::

:::warning Even when using Nous Portal, Codex, or a custom endpoint, some tools (vision, web summarization, MoA) use a separate "auxiliary" model — by default Gemini Flash via OpenRouter. An OPENROUTER_API_KEY enables these tools automatically. You can also configure which model and provider these tools use — see Auxiliary Models below. :::

First-Class Chinese AI Providers

These providers have built-in support with dedicated provider IDs. Set the API key and use --provider to select:

# z.ai / ZhipuAI GLM
hermes chat --provider zai --model glm-4-plus
# Requires: GLM_API_KEY in ~/.hermes/.env

# Kimi / Moonshot AI
hermes chat --provider kimi-coding --model moonshot-v1-auto
# Requires: KIMI_API_KEY in ~/.hermes/.env

# MiniMax (global endpoint)
hermes chat --provider minimax --model MiniMax-Text-01
# Requires: MINIMAX_API_KEY in ~/.hermes/.env

# MiniMax (China endpoint)
hermes chat --provider minimax-cn --model MiniMax-Text-01
# Requires: MINIMAX_CN_API_KEY in ~/.hermes/.env

Or set the provider permanently in config.yaml:

model:
  provider: "zai"       # or: kimi-coding, minimax, minimax-cn
  default: "glm-4-plus"

Base URLs can be overridden with GLM_BASE_URL, KIMI_BASE_URL, MINIMAX_BASE_URL, or MINIMAX_CN_BASE_URL environment variables.

Custom & Self-Hosted LLM Providers

Hermes Agent works with any OpenAI-compatible API endpoint. If a server implements /v1/chat/completions, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.

General Setup

Two ways to configure a custom endpoint:

Interactive (recommended):

hermes model
# Select "Custom endpoint (self-hosted / VLLM / etc.)"
# Enter: API base URL, API key, Model name

Manual (.env file):

# Add to ~/.hermes/.env
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=your-key-or-dummy
LLM_MODEL=your-model-name

Everything below follows this same pattern — just change the URL, key, and model name.


Ollama — Local Models, Zero Config

Ollama runs open-weight models locally with one command. Best for: quick local experimentation, privacy-sensitive work, offline use.

# Install and run a model
ollama pull llama3.1:70b
ollama serve   # Starts on port 11434

# Configure Hermes
OPENAI_BASE_URL=http://localhost:11434/v1
OPENAI_API_KEY=ollama           # Any non-empty string
LLM_MODEL=llama3.1:70b

Ollama's OpenAI-compatible endpoint supports chat completions, streaming, and tool calling (for supported models). No GPU required for smaller models — Ollama handles CPU inference automatically.

:::tip List available models with ollama list. Pull any model from the Ollama library with ollama pull <model>. :::


vLLM — High-Performance GPU Inference

vLLM is the standard for production LLM serving. Best for: maximum throughput on GPU hardware, serving large models, continuous batching.

# Start vLLM server
pip install vllm
vllm serve meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --tensor-parallel-size 2    # Multi-GPU

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct

vLLM supports tool calling, structured output, and multi-modal models. Use --enable-auto-tool-choice and --tool-call-parser hermes for Hermes-format tool calling with NousResearch models.


SGLang — Fast Serving with RadixAttention

SGLang is an alternative to vLLM with RadixAttention for KV cache reuse. Best for: multi-turn conversations (prefix caching), constrained decoding, structured output.

# Start SGLang server
pip install sglang[all]
python -m sglang.launch_server \
  --model meta-llama/Llama-3.1-70B-Instruct \
  --port 8000 \
  --tp 2

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8000/v1
OPENAI_API_KEY=dummy
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct

llama.cpp / llama-server — CPU & Metal Inference

llama.cpp runs quantized models on CPU, Apple Silicon (Metal), and consumer GPUs. Best for: running models without a datacenter GPU, Mac users, edge deployment.

# Build and start llama-server
cmake -B build && cmake --build build --config Release
./build/bin/llama-server \
  -m models/llama-3.1-8b-instruct-Q4_K_M.gguf \
  --port 8080 --host 0.0.0.0

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8080/v1
OPENAI_API_KEY=dummy
LLM_MODEL=llama-3.1-8b-instruct

:::tip Download GGUF models from Hugging Face. Q4_K_M quantization offers the best balance of quality vs. memory usage. :::


LiteLLM Proxy — Multi-Provider Gateway

LiteLLM is an OpenAI-compatible proxy that unifies 100+ LLM providers behind a single API. Best for: switching between providers without config changes, load balancing, fallback chains, budget controls.

# Install and start
pip install litellm[proxy]
litellm --model anthropic/claude-sonnet-4 --port 4000

# Or with a config file for multiple models:
litellm --config litellm_config.yaml --port 4000

# Configure Hermes
OPENAI_BASE_URL=http://localhost:4000/v1
OPENAI_API_KEY=sk-your-litellm-key
LLM_MODEL=anthropic/claude-sonnet-4

Example litellm_config.yaml with fallback:

model_list:
  - model_name: "best"
    litellm_params:
      model: anthropic/claude-sonnet-4
      api_key: sk-ant-...
  - model_name: "best"
    litellm_params:
      model: openai/gpt-4o
      api_key: sk-...
router_settings:
  routing_strategy: "latency-based-routing"

ClawRouter — Cost-Optimized Routing

ClawRouter by BlockRunAI is a local routing proxy that auto-selects models based on query complexity. It classifies requests across 14 dimensions and routes to the cheapest model that can handle the task. Payment is via USDC cryptocurrency (no API keys).

# Install and start
npx @blockrun/clawrouter    # Starts on port 8402

# Configure Hermes
OPENAI_BASE_URL=http://localhost:8402/v1
OPENAI_API_KEY=dummy
LLM_MODEL=blockrun/auto     # or: blockrun/eco, blockrun/premium, blockrun/agentic

Routing profiles:

Profile Strategy Savings
blockrun/auto Balanced quality/cost 74-100%
blockrun/eco Cheapest possible 95-100%
blockrun/premium Best quality models 0%
blockrun/free Free models only 100%
blockrun/agentic Optimized for tool use varies

:::note ClawRouter requires a USDC-funded wallet on Base or Solana for payment. All requests route through BlockRun's backend API. Run npx @blockrun/clawrouter doctor to check wallet status. :::


Other Compatible Providers

Any service with an OpenAI-compatible API works. Some popular options:

Provider Base URL Notes
Together AI https://api.together.xyz/v1 Cloud-hosted open models
Groq https://api.groq.com/openai/v1 Ultra-fast inference
DeepSeek https://api.deepseek.com/v1 DeepSeek models
Fireworks AI https://api.fireworks.ai/inference/v1 Fast open model hosting
Cerebras https://api.cerebras.ai/v1 Wafer-scale chip inference
Mistral AI https://api.mistral.ai/v1 Mistral models
OpenAI https://api.openai.com/v1 Direct OpenAI access
Azure OpenAI https://YOUR.openai.azure.com/ Enterprise OpenAI
LocalAI http://localhost:8080/v1 Self-hosted, multi-model
Jan http://localhost:1337/v1 Desktop app with local models
# Example: Together AI
OPENAI_BASE_URL=https://api.together.xyz/v1
OPENAI_API_KEY=your-together-key
LLM_MODEL=meta-llama/Llama-3.1-70B-Instruct-Turbo

Choosing the Right Setup

Use Case Recommended
Just want it to work OpenRouter (default) or Nous Portal
Local models, easy setup Ollama
Production GPU serving vLLM or SGLang
Mac / no GPU Ollama or llama.cpp
Multi-provider routing LiteLLM Proxy or OpenRouter
Cost optimization ClawRouter or OpenRouter with sort: "price"
Maximum privacy Ollama, vLLM, or llama.cpp (fully local)
Enterprise / Azure Azure OpenAI with custom endpoint
Chinese AI models z.ai (GLM), Kimi/Moonshot, or MiniMax (first-class providers)

:::tip You can switch between providers at any time with hermes model — no restart required. Your conversation history, memory, and skills carry over regardless of which provider you use. :::

Optional API Keys

Feature Provider Env Variable
Web scraping Firecrawl FIRECRAWL_API_KEY
Browser automation Browserbase BROWSERBASE_API_KEY, BROWSERBASE_PROJECT_ID
Image generation FAL FAL_KEY
Premium TTS voices ElevenLabs ELEVENLABS_API_KEY
OpenAI TTS + voice transcription OpenAI VOICE_TOOLS_OPENAI_KEY
RL Training Tinker + WandB TINKER_API_KEY, WANDB_API_KEY
Cross-session user modeling Honcho HONCHO_API_KEY

Self-Hosting Firecrawl

By default, Hermes uses the Firecrawl cloud API for web search and scraping. If you prefer to run Firecrawl locally, you can point Hermes at a self-hosted instance instead.

What you get: No API key required, no rate limits, no per-page costs, full data sovereignty.

What you lose: The cloud version uses Firecrawl's proprietary "Fire-engine" for advanced anti-bot bypassing (Cloudflare, CAPTCHAs, IP rotation). Self-hosted uses basic fetch + Playwright, so some protected sites may fail. Search uses DuckDuckGo instead of Google.

Setup:

  1. Clone and start the Firecrawl Docker stack (5 containers: API, Playwright, Redis, RabbitMQ, PostgreSQL — requires ~4-8 GB RAM):

    git clone https://github.com/mendableai/firecrawl
    cd firecrawl
    # In .env, set: USE_DB_AUTHENTICATION=false
    docker compose up -d
  2. Point Hermes at your instance (no API key needed):

    hermes config set FIRECRAWL_API_URL http://localhost:3002

You can also set both FIRECRAWL_API_KEY and FIRECRAWL_API_URL if your self-hosted instance has authentication enabled.

OpenRouter Provider Routing

When using OpenRouter, you can control how requests are routed across providers. Add a provider_routing section to ~/.hermes/config.yaml:

provider_routing:
  sort: "throughput"          # "price" (default), "throughput", or "latency"
  # only: ["anthropic"]      # Only use these providers
  # ignore: ["deepinfra"]    # Skip these providers
  # order: ["anthropic", "google"]  # Try providers in this order
  # require_parameters: true  # Only use providers that support all request params
  # data_collection: "deny"   # Exclude providers that may store/train on data

Shortcuts: Append :nitro to any model name for throughput sorting (e.g., anthropic/claude-sonnet-4:nitro), or :floor for price sorting.

Terminal Backend Configuration

Configure which environment the agent uses for terminal commands:

terminal:
  backend: local    # or: docker, ssh, singularity, modal, daytona
  cwd: "."          # Working directory ("." = current dir)
  timeout: 180      # Command timeout in seconds

  # Docker-specific settings
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_volumes:                    # Share host directories with the container
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"     # :ro for read-only

  # Container resource limits (docker, singularity, modal, daytona)
  container_cpu: 1                   # CPU cores
  container_memory: 5120             # MB (default 5GB)
  container_disk: 51200              # MB (default 50GB)
  container_persistent: true         # Persist filesystem across sessions

Common Terminal Backend Issues

If terminal commands fail immediately or the terminal tool is reported as disabled, check the following:

  • Local backend

    • No special requirements. This is the safest default when you are just getting started.
  • Docker backend

    • Ensure Docker Desktop (or the Docker daemon) is installed and running.
    • The docker CLI must be available in your $PATH. Run:
      docker version
      If this fails, fix your Docker installation or switch back to the local backend:
      hermes config set terminal.backend local
  • SSH backend

    • Both TERMINAL_SSH_HOST and TERMINAL_SSH_USER must be set, for example:
      export TERMINAL_ENV=ssh
      export TERMINAL_SSH_HOST=my-server.example.com
      export TERMINAL_SSH_USER=ubuntu
    • If either value is missing, Hermes will log a clear error and refuse to use the SSH backend.
  • Modal backend

    • You need either a MODAL_TOKEN_ID environment variable or a ~/.modal.toml config file.
    • If neither is present, the backend check fails and Hermes will report that the Modal backend is not available.

When in doubt, set terminal.backend back to local and verify that commands run there first.

Docker Volume Mounts

When using the Docker backend, docker_volumes lets you share host directories with the container. Each entry uses standard Docker -v syntax: host_path:container_path[:options].

terminal:
  backend: docker
  docker_volumes:
    - "/home/user/projects:/workspace/projects"   # Read-write (default)
    - "/home/user/datasets:/data:ro"              # Read-only
    - "/home/user/outputs:/outputs"               # Agent writes, you read

This is useful for:

  • Providing files to the agent (datasets, configs, reference code)
  • Receiving files from the agent (generated code, reports, exports)
  • Shared workspaces where both you and the agent access the same files

Can also be set via environment variable: TERMINAL_DOCKER_VOLUMES='["/host:/container"]' (JSON array).

See Code Execution and the Terminal section of the README for details on each backend.

Memory Configuration

memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens

Git Worktree Isolation

Enable isolated git worktrees for running multiple agents in parallel on the same repo:

worktree: true    # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed

When enabled, each CLI session creates a fresh worktree under .worktrees/ with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.

You can also list gitignored files to copy into worktrees via .worktreeinclude in your repo root:

# .worktreeinclude
.env
.venv/
node_modules/

Context Compression

compression:
  enabled: true
  threshold: 0.85              # Compress at 85% of context limit
  summary_model: "google/gemini-3-flash-preview"   # Model for summarization
  # summary_provider: "auto"   # "auto", "openrouter", "nous", "main"

The summary_model must support a context length at least as large as your main model's, since it receives the full middle section of the conversation for compression.

Auxiliary Models

Hermes uses lightweight "auxiliary" models for side tasks like image analysis, web page summarization, and browser screenshot analysis. By default, these use Gemini Flash via OpenRouter or Nous Portal — you don't need to configure anything.

To use a different model, add an auxiliary section to ~/.hermes/config.yaml:

auxiliary:
  # Image analysis (vision_analyze tool + browser screenshots)
  vision:
    provider: "auto"           # "auto", "openrouter", "nous", "main"
    model: ""                  # e.g. "openai/gpt-4o", "google/gemini-2.5-flash"

  # Web page summarization + browser page text extraction
  web_extract:
    provider: "auto"
    model: ""                  # e.g. "google/gemini-2.5-flash"

Changing the Vision Model

To use GPT-4o instead of Gemini Flash for image analysis:

auxiliary:
  vision:
    model: "openai/gpt-4o"

Or via environment variable (in ~/.hermes/.env):

AUXILIARY_VISION_MODEL=openai/gpt-4o

Provider Options

Provider Description Requirements
"auto" Best available (default). Vision tries OpenRouter → Nous → Codex.
"openrouter" Force OpenRouter — routes to any model (Gemini, GPT-4o, Claude, etc.) OPENROUTER_API_KEY
"nous" Force Nous Portal hermes login
"codex" Force Codex OAuth (ChatGPT account). Supports vision (gpt-5.3-codex). hermes model → Codex
"main" Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY). Works with OpenAI, local models, or any OpenAI-compatible API. OPENAI_BASE_URL + OPENAI_API_KEY

Common Setups

Using OpenAI API key for vision:

# In ~/.hermes/.env:
# OPENAI_BASE_URL=https://api.openai.com/v1
# OPENAI_API_KEY=sk-...

auxiliary:
  vision:
    provider: "main"
    model: "gpt-4o"       # or "gpt-4o-mini" for cheaper

Using OpenRouter for vision (route to any model):

auxiliary:
  vision:
    provider: "openrouter"
    model: "openai/gpt-4o"      # or "google/gemini-2.5-flash", etc.

Using Codex OAuth (ChatGPT Pro/Plus account — no API key needed):

auxiliary:
  vision:
    provider: "codex"     # uses your ChatGPT OAuth token
    # model defaults to gpt-5.3-codex (supports vision)

Using a local/self-hosted model:

auxiliary:
  vision:
    provider: "main"      # uses your OPENAI_BASE_URL endpoint
    model: "my-local-model"

:::tip If you use Codex OAuth as your main model provider, vision works automatically — no extra configuration needed. Codex is included in the auto-detection chain for vision. :::

:::warning Vision requires a multimodal model. If you set provider: "main", make sure your endpoint supports multimodal/vision — otherwise image analysis will fail. :::

Environment Variables

You can also configure auxiliary models via environment variables instead of config.yaml:

Setting Environment Variable
Vision provider AUXILIARY_VISION_PROVIDER
Vision model AUXILIARY_VISION_MODEL
Web extract provider AUXILIARY_WEB_EXTRACT_PROVIDER
Web extract model AUXILIARY_WEB_EXTRACT_MODEL
Compression provider CONTEXT_COMPRESSION_PROVIDER
Compression model CONTEXT_COMPRESSION_MODEL

:::tip Run hermes config to see your current auxiliary model settings. Overrides only show up when they differ from the defaults. :::

Reasoning Effort

Control how much "thinking" the model does before responding:

agent:
  reasoning_effort: ""   # empty = medium (default). Options: xhigh (max), high, medium, low, minimal, none

When unset (default), reasoning effort defaults to "medium" — a balanced level that works well for most tasks. Setting a value overrides it — higher reasoning effort gives better results on complex tasks at the cost of more tokens and latency.

TTS Configuration

tts:
  provider: "edge"              # "edge" | "elevenlabs" | "openai"
  edge:
    voice: "en-US-AriaNeural"   # 322 voices, 74 languages
  elevenlabs:
    voice_id: "pNInz6obpgDQGcFmaJgB"
    model_id: "eleven_multilingual_v2"
  openai:
    model: "gpt-4o-mini-tts"
    voice: "alloy"              # alloy, echo, fable, onyx, nova, shimmer

Display Settings

display:
  tool_progress: all    # off | new | all | verbose
  personality: "kawaii"  # Default personality for the CLI
  compact: false         # Compact output mode (less whitespace)
  resume_display: full   # full (show previous messages on resume) | minimal (one-liner only)
  bell_on_complete: false  # Play terminal bell when agent finishes (great for long tasks)
Mode What you see
off Silent — just the final response
new Tool indicator only when the tool changes
all Every tool call with a short preview (default)
verbose Full args, results, and debug logs

Speech-to-Text (STT)

stt:
  provider: "openai"           # STT provider

Requires VOICE_TOOLS_OPENAI_KEY in .env for OpenAI STT.

Human Delay

Simulate human-like response pacing in messaging platforms:

human_delay:
  mode: "off"                  # off | natural | custom
  min_ms: 500                  # Minimum delay (custom mode)
  max_ms: 2000                 # Maximum delay (custom mode)

Code Execution

Configure the sandboxed Python code execution tool:

code_execution:
  timeout: 300                 # Max execution time in seconds
  max_tool_calls: 50           # Max tool calls within code execution

Browser

Configure browser automation behavior:

browser:
  inactivity_timeout: 120        # Seconds before auto-closing idle sessions
  record_sessions: false         # Auto-record browser sessions as WebM videos to ~/.hermes/browser_recordings/

Checkpoints

Automatic filesystem snapshots before destructive file operations. See the Checkpoints feature page for details.

checkpoints:
  enabled: false                 # Enable automatic checkpoints (also: hermes --checkpoints)
  max_snapshots: 50              # Max checkpoints to keep per directory

Delegation

Configure subagent behavior for the delegate tool:

delegation:
  max_iterations: 50           # Max iterations per subagent
  default_toolsets:             # Toolsets available to subagents
    - terminal
    - file
    - web

Clarify

Configure the clarification prompt behavior:

clarify:
  timeout: 120                 # Seconds to wait for user clarification response

Context Files (SOUL.md, AGENTS.md)

Drop these files in your project directory and the agent automatically picks them up:

File Purpose
AGENTS.md Project-specific instructions, coding conventions
SOUL.md Persona definition — the agent embodies this personality
.cursorrules Cursor IDE rules (also detected)
.cursor/rules/*.mdc Cursor rule files (also detected)
  • AGENTS.md is hierarchical: if subdirectories also have AGENTS.md, all are combined.
  • SOUL.md checks cwd first, then ~/.hermes/SOUL.md as a global fallback.
  • All context files are capped at 20,000 characters with smart truncation.

Working Directory

Context Default
CLI (hermes) Current directory where you run the command
Messaging gateway Home directory ~ (override with MESSAGING_CWD)
Docker / Singularity / Modal / SSH User's home directory inside the container or remote machine

Override the working directory:

# In ~/.hermes/.env or ~/.hermes/config.yaml:
MESSAGING_CWD=/home/myuser/projects    # Gateway sessions
TERMINAL_CWD=/workspace                # All terminal sessions