
ZeroClaw Providers Reference

This document maps provider IDs, aliases, and credential environment variables.

Last verified: March 1, 2026.

How to List Providers

zeroclaw providers

Credential Resolution Order

Runtime resolution order is:

  1. Explicit credential from config/CLI
  2. Provider-specific env var(s)
  3. Generic fallback env vars: ZEROCLAW_API_KEY then API_KEY

For resilient fallback chains (reliability.fallback_providers), each fallback provider resolves credentials independently. The primary provider's explicit credential is not reused for fallback providers.
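The resolution order above can be sketched in pure shell using parameter expansion. This is an illustrative model, not ZeroClaw's actual code; `EXPLICIT_KEY` is a hypothetical stand-in for a config/CLI-supplied credential, and openrouter is used as the example provider.

```shell
# Hypothetical sketch of the documented resolution order for one provider.
EXPLICIT_KEY=""                 # 1. explicit credential from config/CLI (empty here)
OPENROUTER_API_KEY=""           # 2. provider-specific env var (also empty)
ZEROCLAW_API_KEY="zc-generic"   # 3a. generic fallback
API_KEY="last-resort"           # 3b. final fallback

# ${var:-fallback} treats empty the same as unset, so the chain walks
# down until it finds a non-empty value.
resolved="${EXPLICIT_KEY:-${OPENROUTER_API_KEY:-${ZEROCLAW_API_KEY:-$API_KEY}}}"
echo "$resolved"
```

With the values above, the chain resolves to the generic `ZEROCLAW_API_KEY` value because both higher-priority sources are empty.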

Provider Catalog

| Canonical ID | Aliases | Local | Provider-specific env var(s) |
|---|---|---|---|
| openrouter | — | No | OPENROUTER_API_KEY |
| anthropic | — | No | ANTHROPIC_OAUTH_TOKEN, ANTHROPIC_API_KEY |
| openai | — | No | OPENAI_API_KEY |
| ollama | — | Yes | OLLAMA_API_KEY (optional) |
| gemini | google, google-gemini | No | GEMINI_API_KEY, GOOGLE_API_KEY |
| venice | — | No | VENICE_API_KEY |
| vercel | vercel-ai | No | VERCEL_API_KEY |
| cloudflare | cloudflare-ai | No | CLOUDFLARE_API_KEY |
| moonshot | kimi | No | MOONSHOT_API_KEY |
| stepfun | step, step-ai, step_ai | No | STEP_API_KEY, STEPFUN_API_KEY |
| kimi-code | kimi_coding, kimi_for_coding | No | KIMI_CODE_API_KEY, MOONSHOT_API_KEY |
| synthetic | — | No | SYNTHETIC_API_KEY |
| opencode | opencode-zen | No | OPENCODE_API_KEY |
| zai | z.ai | No | ZAI_API_KEY |
| glm | zhipu | No | GLM_API_KEY |
| minimax | minimax-intl, minimax-io, minimax-global, minimax-cn, minimaxi, minimax-oauth, minimax-oauth-cn, minimax-portal, minimax-portal-cn | No | MINIMAX_OAUTH_TOKEN, MINIMAX_API_KEY |
| bedrock | aws-bedrock | No | AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY (optional: AWS_REGION) |
| qianfan | baidu | No | QIANFAN_API_KEY |
| doubao | volcengine, ark, doubao-cn | No | ARK_API_KEY, DOUBAO_API_KEY |
| siliconflow | silicon-cloud, siliconcloud | No | SILICONFLOW_API_KEY |
| hunyuan | tencent | No | HUNYUAN_API_KEY |
| qwen | dashscope, qwen-intl, dashscope-intl, qwen-us, dashscope-us, qwen-code, qwen-oauth, qwen_oauth | No | QWEN_OAUTH_TOKEN, DASHSCOPE_API_KEY |
| groq | — | No | GROQ_API_KEY |
| mistral | — | No | MISTRAL_API_KEY |
| xai | grok | No | XAI_API_KEY |
| deepseek | — | No | DEEPSEEK_API_KEY |
| together | together-ai | No | TOGETHER_API_KEY |
| fireworks | fireworks-ai | No | FIREWORKS_API_KEY |
| novita | — | No | NOVITA_API_KEY |
| perplexity | — | No | PERPLEXITY_API_KEY |
| cohere | — | No | COHERE_API_KEY |
| copilot | github-copilot | No | (use config/API_KEY fallback with GitHub token) |
| cursor | — | Yes | (none; Cursor manages its own credentials) |
| lmstudio | lm-studio | Yes | (optional; local by default) |
| llamacpp | llama.cpp | Yes | LLAMACPP_API_KEY (optional; only if server auth is enabled) |
| sglang | — | Yes | SGLANG_API_KEY (optional) |
| vllm | — | Yes | VLLM_API_KEY (optional) |
| osaurus | — | Yes | OSAURUS_API_KEY (optional; defaults to "osaurus") |
| nvidia | nvidia-nim, build.nvidia.com | No | NVIDIA_API_KEY |

Cursor (Headless CLI) Notes

  • Provider ID: cursor
  • Invocation: cursor --headless [--model <model>] - (prompt is sent via stdin)
  • The cursor binary must be in PATH, or override its location with CURSOR_PATH env var.
  • Authentication is managed by Cursor itself (its own credential store); no API key is required.
  • The model argument is forwarded to cursor as-is; use "default" (or leave model empty) to let Cursor select the model.
  • This provider spawns a subprocess per request and is best suited for batch/script usage rather than high-throughput inference.
  • Limitations: Only the system prompt (if any) and the last user message are forwarded per request. Full multi-turn conversation history is not preserved because the headless CLI accepts a single prompt per invocation. Temperature control is not supported; non-default values return an explicit error.
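A minimal invocation sketch, assuming the `cursor` binary is installed and authenticated (the trailing `-` and stdin piping follow the invocation shape described above; the prompt text is a placeholder):

```shell
# Pipe a single prompt to headless Cursor; the model flag is optional.
printf 'Summarize this repository\n' | cursor --headless --model default -
```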

Vercel AI Gateway Notes

  • Provider ID: vercel (alias: vercel-ai)
  • Base API URL: https://ai-gateway.vercel.sh/v1
  • Authentication: VERCEL_API_KEY
  • Vercel AI Gateway usage does not require a project deployment.
  • If you see DEPLOYMENT_NOT_FOUND, verify the provider is targeting the gateway endpoint above instead of https://api.vercel.ai.
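A quick manual check against the gateway, assuming `curl` and an OpenAI-compatible model-listing route at `/v1/models` (an assumption based on the gateway's OpenAI-compatible base URL above):

```shell
# Should return a JSON model list; a DEPLOYMENT_NOT_FOUND error here
# usually means the wrong base URL (https://api.vercel.ai) is in use.
curl -s https://ai-gateway.vercel.sh/v1/models \
  -H "Authorization: Bearer $VERCEL_API_KEY"
```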

Gemini Notes

  • Provider ID: gemini (aliases: google, google-gemini)
  • Auth can come from GEMINI_API_KEY, GOOGLE_API_KEY, or Gemini CLI OAuth cache (~/.gemini/oauth_creds.json)
  • API key requests use generativelanguage.googleapis.com/v1beta
  • Gemini CLI OAuth requests use cloudcode-pa.googleapis.com/v1internal with Code Assist request envelope semantics
  • Thinking models (e.g. gemini-3-pro-preview) are supported β€” internal reasoning parts are automatically filtered from the response

Qwen (Alibaba Cloud) Notes

  • Provider IDs: qwen, qwen-code (OAuth), qwen-oauth, dashscope, qwen-intl, qwen-us
  • OAuth Free Tier: Use qwen-code or set api_key = "qwen-oauth" in config
    • Endpoint: portal.qwen.ai/v1
    • Credentials: ~/.qwen/oauth_creds.json (use qwen login to authenticate)
    • Daily quota: 1000 requests
    • Available model: qwen3-coder-plus (verified 2026-02-24)
    • Context window: ~32K tokens
  • API Key Access: Use qwen or dashscope provider with DASHSCOPE_API_KEY
    • Endpoint: dashscope.aliyuncs.com/compatible-mode/v1
    • Higher quotas and more models available with paid API key
  • Authentication: QWEN_OAUTH_TOKEN (for OAuth) or DASHSCOPE_API_KEY (for API key)
  • Recommended Model: qwen3-coder-plus - Optimized for coding tasks
  • Quota Tracking: zeroclaw providers-quota --provider qwen-code shows static quota info (?/1000 - unknown remaining, 1000/day total)
    • Qwen OAuth API does not return rate limit headers
    • Actual request counting requires local counter (not implemented)
    • Rate limit errors are detected and parsed for retry backoff
  • Limitations:
    • OAuth free tier limited to 1 model and 1000 requests/day
    • See test report: docs/qwen-provider-test-report.md

Volcengine ARK (Doubao) Notes

  • Runtime provider ID: doubao (aliases: volcengine, ark, doubao-cn)
  • Onboarding display/canonical name: volcengine
  • Base API URL: https://ark.cn-beijing.volces.com/api/v3
  • Chat endpoint: /chat/completions
  • Model discovery endpoint: /models
  • Authentication: ARK_API_KEY (fallback: DOUBAO_API_KEY)
  • Default model preset: doubao-1-5-pro-32k-250115

Minimal setup example:

export ARK_API_KEY="your-ark-api-key"
zeroclaw onboard --provider volcengine --api-key "$ARK_API_KEY" --model doubao-1-5-pro-32k-250115 --force

Quick validation:

zeroclaw models refresh --provider volcengine
zeroclaw agent --provider volcengine --model doubao-1-5-pro-32k-250115 -m "ping"

StepFun Notes

Minimal setup example:

export STEP_API_KEY="your-stepfun-api-key"
zeroclaw onboard --provider stepfun --api-key "$STEP_API_KEY" --model step-3.5-flash --force

Quick validation:

zeroclaw models refresh --provider stepfun
zeroclaw agent --provider stepfun --model step-3.5-flash -m "ping"

SiliconFlow Notes

  • Provider ID: siliconflow (aliases: silicon-cloud, siliconcloud)
  • Base API URL: https://api.siliconflow.cn/v1
  • Chat endpoint: /chat/completions
  • Model discovery endpoint: /models
  • Authentication: SILICONFLOW_API_KEY
  • Default model preset: Pro/zai-org/GLM-4.7

Minimal setup example:

export SILICONFLOW_API_KEY="your-siliconflow-api-key"
zeroclaw onboard --provider siliconflow --api-key "$SILICONFLOW_API_KEY" --model Pro/zai-org/GLM-4.7 --force

Quick validation:

zeroclaw models refresh --provider siliconflow
zeroclaw agent --provider siliconflow --model Pro/zai-org/GLM-4.7 -m "ping"

Ollama Vision Notes

  • Provider ID: ollama
  • Vision input is supported through user message image markers: [IMAGE:<source>].
  • After multimodal normalization, ZeroClaw sends image payloads through Ollama's native messages[].images field.
  • If a non-vision provider is selected, ZeroClaw returns a structured capability error instead of silently ignoring images.

Ollama Cloud Routing Notes

  • Use :cloud model suffix only with a remote Ollama endpoint.
  • Remote endpoint should be set in api_url (example: https://ollama.com).
  • ZeroClaw normalizes a trailing /api in api_url automatically.
  • If default_model ends with :cloud while api_url is local or unset, config validation fails early with an actionable error.
  • Local Ollama model discovery intentionally excludes :cloud entries to avoid selecting cloud-only models in local mode.
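An illustrative config.toml fragment for cloud routing; the model name is a placeholder, so substitute a `:cloud` model actually available on your account:

```toml
default_provider = "ollama"
default_model = "some-model:cloud"   # placeholder; must be a real :cloud model
api_url = "https://ollama.com"       # a trailing /api is normalized away
```

With a local or unset `api_url`, this same `:cloud` model name would fail config validation early, as described above.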

Hunyuan Notes

  • Provider ID: hunyuan (alias: tencent)
  • Base API URL: https://api.hunyuan.cloud.tencent.com/v1
  • Authentication: HUNYUAN_API_KEY (obtain from Tencent Cloud console)
  • Recommended models: hunyuan-t1-latest (deep reasoning), hunyuan-turbo-latest (fast), hunyuan-pro (high quality)
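A minimal setup example in the same style as the doubao/stepfun/siliconflow sections; the `onboard` flags are assumed to match those examples:

```shell
export HUNYUAN_API_KEY="your-hunyuan-api-key"
zeroclaw onboard --provider hunyuan --api-key "$HUNYUAN_API_KEY" --model hunyuan-turbo-latest --force
zeroclaw models refresh --provider hunyuan
```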

llama.cpp Server Notes

  • Provider ID: llamacpp (alias: llama.cpp)
  • Default endpoint: http://localhost:8080/v1
  • API key is optional by default; set LLAMACPP_API_KEY only when llama-server is started with --api-key.
  • Model discovery: zeroclaw models refresh --provider llamacpp
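A sketch of the authenticated setup; the model path is a placeholder, and `--api-key` is the llama-server flag referenced above:

```shell
# Start llama-server with auth enabled, then give ZeroClaw the same key.
llama-server -m ./your-model.gguf --port 8080 --api-key "my-local-key"
export LLAMACPP_API_KEY="my-local-key"
zeroclaw models refresh --provider llamacpp
```

Without `--api-key`, skip the export entirely; the key is optional by default.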

SGLang Server Notes

  • Provider ID: sglang
  • Default endpoint: http://localhost:30000/v1
  • API key is optional by default; set SGLANG_API_KEY only when the server requires authentication.
  • Tool calling requires launching SGLang with --tool-call-parser (e.g. hermes, llama3, qwen25).
  • Model discovery: zeroclaw models refresh --provider sglang
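A launch sketch with tool calling enabled; the model path is a placeholder, and the parser value should match your model family (one of the parsers listed above):

```shell
# Launch SGLang with a tool-call parser so tool calling works end to end.
python -m sglang.launch_server --model-path Qwen/Qwen2.5-7B-Instruct \
  --port 30000 --tool-call-parser qwen25
zeroclaw models refresh --provider sglang
```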

vLLM Server Notes

  • Provider ID: vllm
  • Default endpoint: http://localhost:8000/v1
  • API key is optional by default; set VLLM_API_KEY only when the server requires authentication.
  • Model discovery: zeroclaw models refresh --provider vllm
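A serving sketch using vLLM's OpenAI-compatible server; the model ID is a placeholder, and `--api-key` is only needed when you want authenticated access:

```shell
# Serve a model; the key here must then match VLLM_API_KEY on the client side.
vllm serve Qwen/Qwen2.5-7B-Instruct --port 8000 --api-key "my-vllm-key"
export VLLM_API_KEY="my-vllm-key"
zeroclaw models refresh --provider vllm
```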

Osaurus Server Notes

  • Provider ID: osaurus
  • Default endpoint: http://localhost:1337/v1
  • API key defaults to "osaurus" but is optional; set OSAURUS_API_KEY to override or leave unset for keyless access.
  • Model discovery: zeroclaw models refresh --provider osaurus
  • Osaurus is a unified AI edge runtime for macOS (Apple Silicon) that combines local MLX inference with cloud provider proxying through a single endpoint.
  • Supports multiple API formats simultaneously: OpenAI-compatible (/v1/chat/completions), Anthropic (/messages), Ollama (/chat), and Open Responses (/v1/responses).
  • Built-in MCP (Model Context Protocol) support for tool and context server connectivity.
  • Local models run via MLX (Llama, Qwen, Gemma, GLM, Phi, Nemotron, and others); cloud models are proxied transparently.

Bedrock Notes

  • Provider ID: bedrock (alias: aws-bedrock)
  • API: Converse API
  • Authentication: AWS AKSK (not a single API key). Set AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY environment variables.
  • Optional: AWS_SESSION_TOKEN for temporary/STS credentials, AWS_REGION or AWS_DEFAULT_REGION (default: us-east-1).
  • Default onboarding model: anthropic.claude-sonnet-4-5-20250929-v1:0
  • Supports native tool calling and prompt caching (cachePoint).
  • Cross-region inference profiles supported (e.g., us.anthropic.claude-*).
  • Model IDs use Bedrock format: anthropic.claude-sonnet-4-6, anthropic.claude-opus-4-6-v1, etc.
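A minimal setup example in the style of the other provider sections; note there is no single API key here, so only the AWS credential env vars are exported (the `onboard` flags are assumed to match the earlier examples):

```shell
export AWS_ACCESS_KEY_ID="your-access-key-id"
export AWS_SECRET_ACCESS_KEY="your-secret-access-key"
export AWS_REGION="us-east-1"   # optional; us-east-1 is the default anyway
zeroclaw onboard --provider bedrock --model anthropic.claude-sonnet-4-5-20250929-v1:0 --force
```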

Ollama Reasoning Toggle

You can control Ollama reasoning/thinking behavior from config.toml:

[runtime]
reasoning_enabled = false

Behavior:

  • false: sends think: false to Ollama /api/chat requests.
  • true: sends think: true.
  • Unset: omits think and keeps Ollama/model defaults.

Ollama Vision Override

Some Ollama models support vision (e.g. llava, llama3.2-vision) while others do not. Since ZeroClaw cannot auto-detect this, you can override it in config.toml:

default_provider = "ollama"
default_model = "llava"
model_support_vision = true

Behavior:

  • true: enables image attachment processing in the agent loop.
  • false: disables vision even if the provider reports support.
  • Unset: uses the provider's built-in default.

Environment override: ZEROCLAW_MODEL_SUPPORT_VISION=true

OpenAI Codex Reasoning Level

You can control OpenAI Codex reasoning effort from config.toml:

[provider]
reasoning_level = "high"

Behavior:

  • Supported values: minimal, low, medium, high, xhigh (case-insensitive).
  • When set, overrides ZEROCLAW_CODEX_REASONING_EFFORT.
  • Unset falls back to ZEROCLAW_CODEX_REASONING_EFFORT if present, otherwise defaults to xhigh.
  • Legacy compatibility: runtime.reasoning_level is accepted but deprecated; prefer provider.reasoning_level.
  • If both provider.reasoning_level and runtime.reasoning_level are set, provider-level value wins.
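The precedence rules above can be modeled in a few lines of shell. This is an illustrative sketch, not ZeroClaw's code; `CONFIG_REASONING_LEVEL` is a hypothetical stand-in for `provider.reasoning_level`:

```shell
# provider.reasoning_level > ZEROCLAW_CODEX_REASONING_EFFORT > "xhigh"
CONFIG_REASONING_LEVEL=""               # unset in config for this example
ZEROCLAW_CODEX_REASONING_EFFORT="HIGH"  # env var fallback

effort="${CONFIG_REASONING_LEVEL:-${ZEROCLAW_CODEX_REASONING_EFFORT:-xhigh}}"
# Lowercase to mirror the documented case-insensitive matching.
effort="$(printf '%s' "$effort" | tr '[:upper:]' '[:lower:]')"
echo "$effort"
```

Here the config value is empty, so the env var wins and is normalized to `high`; with both unset, the chain would bottom out at `xhigh`.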

Kimi Code Notes

  • Provider ID: kimi-code
  • Endpoint: https://api.kimi.com/coding/v1
  • Default onboarding model: kimi-for-coding (alternative: kimi-k2.5)
  • Runtime auto-adds User-Agent: KimiCLI/0.77 for compatibility.

NVIDIA NIM Notes

  • Canonical provider ID: nvidia
  • Aliases: nvidia-nim, build.nvidia.com
  • Base API URL: https://integrate.api.nvidia.com/v1
  • Model discovery: zeroclaw models refresh --provider nvidia

Recommended starter model IDs (verified against NVIDIA API catalog on February 18, 2026):

  • meta/llama-3.3-70b-instruct
  • deepseek-ai/deepseek-v3.2
  • nvidia/llama-3.3-nemotron-super-49b-v1.5
  • nvidia/llama-3.1-nemotron-ultra-253b-v1

Custom Endpoints

  • OpenAI-compatible endpoint:
default_provider = "custom:https://your-api.example.com"
  • Anthropic-compatible endpoint:
default_provider = "anthropic-custom:https://your-api.example.com"

MiniMax OAuth Setup (config.toml)

Set the MiniMax provider and OAuth placeholder in config:

default_provider = "minimax-oauth"
api_key = "minimax-oauth"

Then provide one of the following credentials via environment variables:

  • MINIMAX_OAUTH_TOKEN (preferred, direct access token)
  • MINIMAX_API_KEY (legacy/static token)
  • MINIMAX_OAUTH_REFRESH_TOKEN (auto-refreshes access token at startup)

Optional:

  • MINIMAX_OAUTH_REGION=global or MINIMAX_OAUTH_REGION=cn (the default depends on the provider alias)
  • MINIMAX_OAUTH_CLIENT_ID to override the default OAuth client id

Channel compatibility note:

  • For MiniMax-backed channel conversations, runtime history is normalized to keep valid user/assistant turn order.
  • Channel-specific delivery guidance (for example Telegram attachment markers) is merged into the leading system prompt instead of being appended as a trailing system turn.

Qwen Code OAuth Setup (config.toml)

Set Qwen Code OAuth mode in config:

default_provider = "qwen-code"
api_key = "qwen-oauth"

Credential resolution for qwen-code:

  1. Explicit api_key value (if not the placeholder qwen-oauth)
  2. QWEN_OAUTH_TOKEN
  3. ~/.qwen/oauth_creds.json (reuses Qwen Code cached OAuth credentials)
  4. Optional refresh via QWEN_OAUTH_REFRESH_TOKEN (or cached refresh token)
  5. If the qwen-oauth placeholder is not used, DASHSCOPE_API_KEY can still serve as a fallback

Optional endpoint override:

  • QWEN_OAUTH_RESOURCE_URL (normalized to https://.../v1 if needed)
  • If unset, resource_url from cached OAuth credentials is used when available
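The "normalized to https://.../v1" behavior can be approximated in shell. This `normalize` function is illustrative only, assuming the normalization adds a missing scheme and ensures a single trailing `/v1`:

```shell
# Hypothetical sketch of endpoint normalization, not ZeroClaw's code.
normalize() {
  url="$1"
  case "$url" in
    http://*|https://*) ;;       # scheme already present
    *) url="https://$url" ;;     # assume https when the scheme is missing
  esac
  url="${url%/}"                 # drop any trailing slash
  case "$url" in
    */v1) ;;                     # already ends in /v1
    *) url="$url/v1" ;;
  esac
  printf '%s\n' "$url"
}

normalize "portal.qwen.ai"
```

A bare host like `portal.qwen.ai` comes out as `https://portal.qwen.ai/v1`, while an already-normalized URL passes through unchanged.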

Model Routing (hint:<name>)

You can route model calls by hint using [[model_routes]]:

[[model_routes]]
hint = "reasoning"
provider = "openrouter"
model = "anthropic/claude-opus-4-20250514"
max_tokens = 8192

[[model_routes]]
hint = "fast"
provider = "groq"
model = "llama-3.3-70b-versatile"

Then call with a hint model name (for example from tool or integration paths):

hint:reasoning

Embedding Routing (hint:<name>)

You can route embedding calls with the same hint pattern using [[embedding_routes]]. Set [memory].embedding_model to a hint:<name> value to activate routing.

[memory]
embedding_model = "hint:semantic"

[[embedding_routes]]
hint = "semantic"
provider = "openai"
model = "text-embedding-3-small"
dimensions = 1536

[[embedding_routes]]
hint = "archive"
provider = "custom:https://embed.example.com/v1"
model = "your-embedding-model-id"
dimensions = 1024

Supported embedding providers:

  • none
  • openai
  • custom:<url> (OpenAI-compatible embeddings endpoint)

Optional per-route key override:

[[embedding_routes]]
hint = "semantic"
provider = "openai"
model = "text-embedding-3-small"
api_key = "sk-route-specific"

Upgrading Models Safely

Use stable hints and update only route targets when providers deprecate model IDs.

Recommended workflow:

  1. Keep call sites stable (hint:reasoning, hint:semantic).
  2. Change only the target model under [[model_routes]] or [[embedding_routes]].
  3. Run:
    • zeroclaw doctor
    • zeroclaw status
  4. Smoke test one representative flow (chat + memory retrieval) before rollout.

This minimizes breakage because integrations and prompts do not need to change when model IDs are upgraded.