feat: OpenAI-compatible API server + streaming support #956

Open
teknium1 wants to merge 10 commits into main from hermes/hermes-106e92b2
Conversation

@teknium1
Contributor

Summary

Rebased and improved version of PR #828. Adds an OpenAI-compatible HTTP API server as a gateway platform adapter, plus streaming support across all platforms.

What this enables

Any OpenAI-compatible frontend — Open WebUI (126k★), LobeChat (73k★), LibreChat (34k★), AnythingLLM (56k★), NextChat (87k★), ChatBox (39k★), etc. — can connect to hermes-agent by pointing at http://localhost:8642/v1.

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/chat/completions | OpenAI Chat Completions API (stateless) |
| POST | /v1/responses | OpenAI Responses API (stateful via previous_response_id or conversation) |
| GET | /v1/responses/{id} | Retrieve a stored response |
| DELETE | /v1/responses/{id} | Delete a stored response |
| GET | /v1/models | Lists hermes-agent as available model |
| GET | /health | Health check |
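As a minimal sketch of what connecting a client looks like, here is a Chat Completions request built against the endpoints above. The base URL, port, and model name come from this PR description; the payload fields follow the standard OpenAI Chat Completions schema, and the bearer key value is a placeholder:

```python
import json
import urllib.request

# Build a Chat Completions request against the local gateway.
# "hermes-agent" is the model name that GET /v1/models advertises.
payload = {
    "model": "hermes-agent",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": False,
}

req = urllib.request.Request(
    "http://localhost:8642/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        # Only needed when API_SERVER_KEY is set (optional bearer auth).
        "Authorization": "Bearer YOUR_API_SERVER_KEY",
    },
    method="POST",
)
# urllib.request.urlopen(req) would return the completion JSON when the
# server is running; omitted here so the sketch stays self-contained.
```

Any of the OpenAI-compatible frontends listed above send an equivalent request when pointed at the same base URL.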

Key features

  • Chat Completions: Full conversation in each request, returns final agent response
  • Responses API: Server-side conversation state, named conversations via conversation parameter
  • Streaming: Real SSE streaming for API server + progressive message editing for Telegram/Discord/Slack
  • System prompt layering: Frontend system messages layered ON TOP of core prompt
  • Bearer token auth: Optional via API_SERVER_KEY env var
  • CORS support: Browser-based frontends can connect directly
  • Usage tracking: Real token counts in responses
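On the consumer side, the SSE stream can be drained with a few lines. This is a sketch of a generic client, not code from this PR; the `data:` line framing is standard SSE and the chunk shape follows the OpenAI streaming format:

```python
import json

def iter_sse_deltas(lines):
    """Yield content deltas from an OpenAI-style SSE stream.

    `lines` is any iterable of decoded text lines, e.g. an HTTP
    response body. Stops at the standard "[DONE]" sentinel.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Example with a canned stream:
stream = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(iter_sse_deltas(stream)))  # prints "Hello"
```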

Improvements over PR #828

  • Removed dead code: Unused _write_sse_chat_completion pseudo-streaming method deleted
  • Deduplicated model resolution: Extracted _resolve_model() helper in gateway/run.py — API server now imports it instead of duplicating the YAML parsing
  • Cached streaming config: Streaming config is loaded once at GatewayRunner.__init__ instead of parsing config.yaml on every message
  • Setup integration: API_SERVER_ENABLED, API_SERVER_KEY, API_SERVER_PORT, API_SERVER_HOST registered in OPTIONAL_ENV_VARS so hermes setup prompts for them
  • Security docs: Added prominent warning about network exposure when binding to 0.0.0.0 without auth
  • Rebased onto current main: Resolved conflicts with EMAIL platform, last_prompt_tokens tracking, and docs
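The streaming-config caching change follows a common pattern: parse once in the constructor, read the cached value per message. A minimal sketch (method and field names are hypothetical, and the stand-in loader returns defaults instead of actually parsing config.yaml):

```python
class GatewayRunner:
    """Illustrative sketch of the caching change, not the real class."""

    def __init__(self, config_path="config.yaml"):
        # Before: streaming settings were re-parsed from config.yaml on
        # every message. After: parsed once here and reused.
        self._streaming_config = self._load_streaming_config(config_path)

    @staticmethod
    def _load_streaming_config(path):
        # Stand-in for the real YAML parse; returns defaults so the
        # sketch runs without a config file on disk.
        return {"enabled": True, "edit_interval_s": 1.0}

    def handle_message(self, text):
        cfg = self._streaming_config  # no per-message file I/O
        return ("streaming" if cfg["enabled"] else "buffered", text)
```

The trade-off is that config edits require a restart to take effect, which is usually acceptable for a long-running gateway process.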

Documentation

  • API server guide (features/api-server.md)
  • Streaming guide (features/streaming.md)
  • Open WebUI integration guide with Docker Compose (messaging/open-webui.md)

Test results

  • 82 new tests (51 API server + 31 streaming), all passing
  • 1 skipped (gateway streaming config test — optional helper)

Cherry-picked from PR #828, rebased onto current main with conflict resolution.
Cherry-picked from PR #828, resolved conflicts with main.
… WebUI

Cherry-picked from PR #828, resolved conflicts with main.
…tion, cache streaming config, add setup integration and security docs

- Remove unused _write_sse_chat_completion pseudo-streaming method (dead code)
- Extract _resolve_model() helper in gateway/run.py, use from api_server
- Cache streaming config at GatewayRunner init instead of YAML parsing per-message
- Add API_SERVER_* env vars to OPTIONAL_ENV_VARS for hermes setup integration
- Add security warning about network exposure without API_SERVER_KEY
Add configurable reply_to_mode for Telegram multi-chunk replies:
- off: never thread replies to original message
- first: only first chunk threads (default, preserves current behavior)
- all: all chunks thread to original message

Configurable via reply_to_mode in platform config or TELEGRAM_REPLY_TO_MODE
env var.
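The lookup described above can be sketched as follows. The helper name is hypothetical, and the precedence (platform config over env var, with "first" as the fallback default) is an assumption since the PR text does not state which source wins:

```python
import os

VALID_MODES = {"off", "first", "all"}

def resolve_reply_to_mode(platform_config, env=os.environ):
    """Pick reply_to_mode: platform config, then TELEGRAM_REPLY_TO_MODE,
    then the 'first' default that preserves current behavior.
    (Precedence order is an assumption for this sketch.)"""
    mode = (
        platform_config.get("reply_to_mode")
        or env.get("TELEGRAM_REPLY_TO_MODE")
        or "first"
    )
    if mode not in VALID_MODES:
        raise ValueError(f"invalid reply_to_mode: {mode!r}")
    return mode

print(resolve_reply_to_mode({}, env={}))                        # prints "first"
print(resolve_reply_to_mode({"reply_to_mode": "all"}, env={}))  # prints "all"
```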

Cherry-picked from PR #855 by raulvidis, rebased onto current main.
Dropped asyncio_mode=auto pyproject.toml change, added @pytest.mark.asyncio
decorators, fixed test IDs to use numeric strings.

Co-authored-by: Raul <77628552+raulvidis@users.noreply.github.com>
