feat: OpenAI-compatible API server + streaming support #956

Open
teknium1 wants to merge 10 commits into main from hermes/hermes-106e92b2
Conversation

@teknium1
Contributor

Summary

Rebased and improved version of PR #828. Adds an OpenAI-compatible HTTP API server as a gateway platform adapter, plus streaming support across all platforms.

What this enables

Any OpenAI-compatible frontend — Open WebUI (126k★), LobeChat (73k★), LibreChat (34k★), AnythingLLM (56k★), NextChat (87k★), ChatBox (39k★), etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/chat/completions | OpenAI Chat Completions API (stateless) |
| POST | /v1/responses | OpenAI Responses API (stateful via previous_response_id or conversation) |
| GET | /v1/responses/{id} | Retrieve a stored response |
| DELETE | /v1/responses/{id} | Delete a stored response |
| GET | /v1/models | Lists hermes-agent as available model |
| GET | /health | Health check |
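As a minimal sketch of what connecting a client looks like, here is a Chat Completions request built against the endpoints above. The base URL, port, and model name come from this PR description; the payload fields follow the standard OpenAI Chat Completions schema, and the bearer key value is a placeholder:

```python
import json
import urllib.request

# Build a Chat Completions request against the local gateway.
# "hermes-agent" is the model name that GET /v1/models advertises.
payload = {
    "model": "hermes-agent",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8642/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Only needed when API_SERVER_KEY is set (optional bearer auth).
        "Authorization": "Bearer YOUR_API_SERVER_KEY",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the completion JSON when the
# server is running; omitted here so the sketch stays self-contained.
```

Any of the OpenAI-compatible frontends listed above send an equivalent request when pointed at the same base URL.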

Key features

  • Chat Completions: Full conversation in each request, returns final agent response
  • Responses API: Server-side conversation state, named conversations via conversation parameter
  • Streaming: Real SSE streaming for API server + progressive message editing for Telegram/Discord/Slack
  • System prompt layering: Frontend system messages layered ON TOP of core prompt
  • Bearer token auth: Optional via API_SERVER_KEY env var
  • CORS support: Browser-based frontends can connect directly
  • Usage tracking: Real token counts in responses
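On the consumer side, the SSE stream can be drained with a few lines. This is a sketch of a generic client, not code from this PR; the `data:` line framing is standard SSE and the chunk shape follows the OpenAI streaming format:

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from an OpenAI-style SSE stream.

    `lines` is any iterable of decoded text lines, e.g. an HTTP
    response body. Stops at the standard "[DONE]" sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example with a canned stream:
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(stream)))  # prints "Hello"
```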

Improvements over PR #828

  • Removed dead code: Unused _write_sse_chat_completion pseudo-streaming method deleted
  • Deduplicated model resolution: Extracted _resolve_model() helper in gateway/run.py — API server now imports it instead of duplicating the YAML parsing
  • Cached streaming config: Streaming config is loaded once at GatewayRunner.__init__ instead of parsing config.yaml on every message
  • Setup integration: API_SERVER_ENABLED, API_SERVER_KEY, API_SERVER_PORT, API_SERVER_HOST registered in OPTIONAL_ENV_VARS so hermes setup prompts for them
  • Security docs: Added prominent warning about network exposure when binding to 0.0.0.0 without auth
  • Rebased onto current main: Resolved conflicts with EMAIL platform, last_prompt_tokens tracking, and docs
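The streaming-config caching change follows a common pattern: parse once in the constructor, read the cached value per message. A minimal sketch (method and field names are hypothetical, and the stand-in loader returns defaults instead of actually parsing config.yaml):

```python
class GatewayRunner:
    """Illustrative sketch of the caching change, not the real class."""

    def __init__(self, config_path="config.yaml"):
        # Before: streaming settings were re-parsed from config.yaml on
        # every message. After: parsed once here and reused.
        self._streaming_config = self._load_streaming_config(config_path)

    @staticmethod
    def _load_streaming_config(path):
        # Stand-in for the real YAML parse; returns defaults so the
        # sketch runs without a config file on disk.
        return {"enabled": True, "edit_interval_s": 1.0}

    def handle_message(self, text):
        cfg = self._streaming_config  # no per-message file I/O
        return ("streaming" if cfg["enabled"] else "buffered", text)
```

The trade-off is that config edits require a restart to take effect, which is usually acceptable for a long-running gateway process.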

Documentation

  • API server guide (features/api-server.md)
  • Streaming guide (features/streaming.md)
  • Open WebUI integration guide with Docker Compose (messaging/open-webui.md)

Test results

  • 82 new tests (51 API server + 31 streaming), all passing
  • 1 skipped (gateway streaming config test — optional helper)

Cherry-picked from PR #828, rebased onto current main with conflict resolution.
Cherry-picked from PR #828, resolved conflicts with main.
… WebUI

Cherry-picked from PR #828, resolved conflicts with main.
…tion, cache streaming config, add setup integration and security docs

- Remove unused _write_sse_chat_completion pseudo-streaming method (dead code)
- Extract _resolve_model() helper in gateway/run.py, use from api_server
- Cache streaming config at GatewayRunner init instead of YAML parsing per-message
- Add API_SERVER_* env vars to OPTIONAL_ENV_VARS for hermes setup integration
- Add security warning about network exposure without API_SERVER_KEY
Add configurable reply_to_mode for Telegram multi-chunk replies:
- off: never thread replies to original message
- first: only first chunk threads (default, preserves current behavior)
- all: all chunks thread to original message

Configurable via reply_to_mode in platform config or TELEGRAM_REPLY_TO_MODE
env var.
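The lookup described above can be sketched as follows. The helper name is hypothetical, and the precedence (platform config over env var, with "first" as the fallback default) is an assumption since the PR text does not state which source wins:

```python
import os

VALID_MODES = {"off", "first", "all"}

def resolve_reply_to_mode(platform_config, env=os.environ):
    """Pick reply_to_mode: platform config, then TELEGRAM_REPLY_TO_MODE,
    then the 'first' default that preserves current behavior.
    (Precedence order is an assumption for this sketch.)"""
    mode = (
        platform_config.get("reply_to_mode")
        or env.get("TELEGRAM_REPLY_TO_MODE")
        or "first"
    )
    if mode not in VALID_MODES:
        raise ValueError(f"invalid reply_to_mode: {mode!r}")
    return mode

print(resolve_reply_to_mode({}, env={}))                        # prints "first"
print(resolve_reply_to_mode({"reply_to_mode": "all"}, env={}))  # prints "all"
```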

Cherry-picked from PR #855 by raulvidis, rebased onto current main.
Dropped asyncio_mode=auto pyproject.toml change, added @pytest.mark.asyncio
decorators, fixed test IDs to use numeric strings.

Co-authored-by: Raul <77628552+raulvidis@users.noreply.github.com>
