Commit cceb86f

feat(agent-server): add deferred-init / dormant mode (#3287)
Co-authored-by: openhands <openhands@all-hands.dev>
1 parent 4de2db9 commit cceb86f

7 files changed

Lines changed: 969 additions & 28 deletions

AGENTS.md

Lines changed: 1 addition & 0 deletions
@@ -132,6 +132,7 @@ When reviewing code, provide constructive feedback:
 - Agent-server Docker publish tags are defined centrally in `openhands-agent-server/openhands/agent_server/docker/build.py`; keep `server.yml` manifest publication derived from the emitted per-arch tags so SHA/branch/git-tag aliases stay in sync, while preserving the legacy `latest-<variant>` alias used by workspace defaults.
 - The published agent-server Docker images in `.github/workflows/server.yml` must pass `OPENHANDS_BUILD_GIT_SHA` and `OPENHANDS_BUILD_GIT_REF` as explicit `docker/build-push-action` build args; the workflow only uses `docker/build.py` for context/tag generation, so those runtime env vars are otherwise left at the Dockerfile `unknown` defaults.
 - The PyInstaller agent-server binary should copy OpenHands distribution metadata (`openhands-agent-server`, `openhands-sdk`, `openhands-tools`, `openhands-workspace`) in `agent-server.spec`, otherwise `/server_info` version lookups via `importlib.metadata` can fall back to `unknown` inside published binary images.
+- Agent-server deferred init (warm-pool / dormant mode) is driven by `Config.deferred_init` (env `OH_DEFERRED_INIT`). The `InitService` in `openhands-agent-server/openhands/agent_server/init_router.py` owns the dormant→initializing→ready transition and is registered on `app.state.init_service` only when `deferred_init=True`; the `require_initialized` dependency, added to the `/api/*` router, returns 503 while not `ready`. Bootstrap auth for `POST /api/init` uses the existing `secret_key` (`X-Init-API-Key` header) — the orchestrator already holds this key for encryption, and it is overwritten when the per-user runtime config arrives in the init body. The agent-server's 5xx exception handler rewrites `detail` on 503s, so warm-pool orchestrators should rely on the HTTP status code (not the body) when probing dormant state.


 - Auto-title generation should not re-read `ConversationState.events` from a background task triggered by a freshly received `MessageEvent`; extract message text synchronously from the incoming event and then reuse shared title helpers (`extract_message_text`, `generate_title_from_message`) to avoid persistence-order races.
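
The dormant→initializing→ready transition and the 503 gate described in the new AGENTS.md bullet can be pictured with a small standard-library sketch. This is not the `InitService` from `init_router.py`: `InitState`, `InitGate`, `begin`, `finish`, `fail`, and `status_code` are hypothetical names, and falling back to dormant after a failed init is an assumption that the commit text does not confirm.

```python
import enum
import threading


class InitState(str, enum.Enum):
    DORMANT = "dormant"
    INITIALIZING = "initializing"
    READY = "ready"


class InitGate:
    """Hypothetical stand-in for the state an init service would own."""

    def __init__(self) -> None:
        self._state = InitState.DORMANT
        self._lock = threading.Lock()

    @property
    def state(self) -> InitState:
        return self._state

    def begin(self) -> bool:
        """Claim dormant -> initializing; return False if already claimed."""
        with self._lock:
            if self._state is not InitState.DORMANT:
                return False
            self._state = InitState.INITIALIZING
            return True

    def finish(self) -> None:
        """Complete initializing -> ready."""
        with self._lock:
            if self._state is not InitState.INITIALIZING:
                raise RuntimeError(f"cannot finish from {self._state.value}")
            self._state = InitState.READY

    def fail(self) -> None:
        """Assumed behaviour: a failed init returns the gate to dormant."""
        with self._lock:
            if self._state is InitState.INITIALIZING:
                self._state = InitState.DORMANT

    def status_code(self) -> int:
        """HTTP status a gated /api/* route would answer with."""
        return 200 if self._state is InitState.READY else 503
```

Because the gate answers 503 in both the dormant and initializing states, a probe built on it would key off the status code alone, which matches the advice in the bullet about not relying on the response body.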
Lines changed: 192 additions & 0 deletions
@@ -0,0 +1,192 @@
+"""Example demonstrating deferred-init (warm-pool) mode for the agent server.
+
+In warm-pool deployments, server pods are pre-warmed before a user is matched
+to one. The pod boots with ``OH_DEFERRED_INIT=true``: stateless services
+(VSCode, tool preload, etc.) start as normal, but all ``/api/*`` routes return
+503 until ``POST /api/init`` delivers the runtime configuration (credentials,
+workspace paths, session keys).
+
+The orchestrator authenticates the init call with the server's bootstrap secret
+key (``OH_SECRET_KEY`` / ``X-Init-API-Key``), which it already holds for
+encryption purposes.
+
+Lifecycle demonstrated here:
+1. Server starts in dormant mode.
+2. ``GET /api/init`` reports state=dormant.
+3. ``GET /api/conversations`` returns 503 (dormant gate is active).
+4. ``POST /api/init`` delivers runtime config → server transitions to ready.
+5. ``GET /api/init`` reports state=ready.
+6. A conversation runs normally on the now-ready server.
+"""
+
+import os
+import tempfile
+import time
+from uuid import UUID
+
+import httpx
+from scripts.utils import ManagedAPIServer
+
+from openhands.sdk import get_logger
+
+
+logger = get_logger(__name__)
+
+# ── LLM config ──────────────────────────────────────────────────────────────
+
+api_key = os.getenv("LLM_API_KEY")
+assert api_key is not None, "LLM_API_KEY environment variable is not set."
+llm_model = os.getenv("LLM_MODEL", "gpt-5.5")
+llm_base_url = os.getenv("LLM_BASE_URL")
+
+# The orchestrator knows this key before the pod is matched to a user.
+# It's used to authenticate POST /api/init and as the encryption secret.
+BOOTSTRAP_SECRET_KEY = "demo-warm-pool-bootstrap-key-32b!"
+
+# ── Server lifecycle ─────────────────────────────────────────────────────────
+
+with ManagedAPIServer(
+    port=8003,
+    extra_env={
+        "OH_DEFERRED_INIT": "true",
+        "OH_SECRET_KEY": BOOTSTRAP_SECRET_KEY,
+        "TMUX_TMPDIR": "/tmp/oh-tmux-deferred",
+    },
+) as server:
+    client = httpx.Client(base_url=server.base_url, timeout=120.0)
+
+    try:
+        # ── 1. Confirm dormant state ─────────────────────────────────────────
+        logger.info("\n" + "=" * 60)
+        logger.info("📊 Step 1: checking initial (dormant) state")
+        logger.info("=" * 60)
+
+        resp = client.get("/api/init")
+        assert resp.status_code == 200, f"GET /api/init failed: {resp.text}"
+        init_status = resp.json()
+        assert init_status["state"] == "dormant", (
+            f"Expected dormant, got: {init_status['state']}"
+        )
+        logger.info(f"✅ Server is dormant — {init_status}")
+
+        # ── 2. Verify the dormant gate blocks /api/* ─────────────────────────
+        logger.info("\n" + "=" * 60)
+        logger.info("🚧 Step 2: dormant gate returns 503 on /api/conversations")
+        logger.info("=" * 60)
+
+        resp = client.get("/api/conversations")
+        assert resp.status_code == 503, (
+            f"Expected 503 from dormant gate, got {resp.status_code}"
+        )
+        logger.info("✅ /api/conversations correctly returns 503 while dormant")
+
+        # ── 3. Activate via POST /api/init ───────────────────────────────────
+        logger.info("\n" + "=" * 60)
+        logger.info("🚀 Step 3: activating server via POST /api/init")
+        logger.info("=" * 60)
+
+        temp_workspace_dir = tempfile.mkdtemp(prefix="deferred_init_demo_")
+
+        # In a real warm-pool deployment, credentials that the server shouldn't
+        # have at cold-start (e.g., the user's LLM API key) would arrive here.
+        llm_env: dict[str, str] = {"LLM_API_KEY": api_key}
+        if llm_base_url:
+            llm_env["LLM_BASE_URL"] = llm_base_url
+
+        init_body: dict = {
+            # Pass user credentials into the server's environment.
+            "env": llm_env,
+        }
+
+        resp = client.post(
+            "/api/init",
+            json=init_body,
+            headers={"X-Init-API-Key": BOOTSTRAP_SECRET_KEY},
+        )
+        assert resp.status_code == 200, f"POST /api/init failed: {resp.text}"
+        init_status = resp.json()
+        assert init_status["state"] == "ready", (
+            f"Expected ready after init, got: {init_status['state']}"
+        )
+        logger.info(f"✅ Server is now ready — {init_status}")
+
+        # ── 4. Confirm ready via GET /api/init ───────────────────────────────
+        resp = client.get("/api/init")
+        assert resp.status_code == 200
+        assert resp.json()["state"] == "ready"
+        logger.info("✅ GET /api/init confirms ready state")
+
+        # ── 5. Run a conversation on the now-ready server ────────────────────
+        logger.info("\n" + "=" * 60)
+        logger.info("🤖 Step 5: running a conversation on the ready server")
+        logger.info("=" * 60)
+
+        llm_config: dict[str, str] = {"model": llm_model, "api_key": api_key}
+        if llm_base_url:
+            llm_config["base_url"] = llm_base_url
+
+        start_request: dict = {
+            "agent": {
+                "kind": "Agent",
+                "llm": llm_config,
+                "tools": [],
+            },
+            "workspace": {"working_dir": temp_workspace_dir},
+            "initial_message": {
+                "role": "user",
+                "content": [{"type": "text", "text": "Reply with just the number 42."}],
+                "run": True,
+            },
+        }
+
+        resp = client.post("/api/conversations", json=start_request)
+        assert resp.status_code == 201, f"Start conversation failed: {resp.text}"
+        conversation_id = UUID(resp.json()["id"])
+        logger.info(f"✅ Conversation started: {conversation_id}")
+
+        # Poll until the agent finishes.
+        max_wait = 120
+        elapsed = 0
+        execution_status = "unknown"
+        while elapsed < max_wait:
+            resp = client.get(f"/api/conversations/{conversation_id}")
+            assert resp.status_code == 200
+            data = resp.json()
+            execution_status = data.get("execution_status", "unknown")
+            if execution_status in ("stopped", "paused", "error"):
+                break
+            logger.info(f" status: {execution_status} ({elapsed}s elapsed)")
+            time.sleep(2)
+            elapsed += 2
+
+        logger.info(f"✅ Conversation finished — status: {execution_status}")
+        assert execution_status in ("stopped", "paused"), (
+            f"Unexpected final status: {execution_status}"
+        )
+
+        resp = client.get(f"/api/conversations/{conversation_id}/agent_final_response")
+        if resp.status_code == 200:
+            agent_response = resp.json().get("response", "")
+            logger.info(f" Agent response: {agent_response!r}")
+
+        # Collect cost metrics.
+        accumulated_cost = 0.0
+        resp = client.get(f"/api/conversations/{conversation_id}")
+        if resp.status_code == 200:
+            stats = resp.json().get("stats") or {}
+            usage_to_metrics = stats.get("usage_to_metrics") or {}
+            accumulated_cost = sum(
+                m.get("accumulated_cost", 0.0) for m in usage_to_metrics.values()
+            )
+
+        client.delete(f"/api/conversations/{conversation_id}")
+        logger.info(" Conversation deleted")
+
+        logger.info("\n" + "=" * 60)
+        logger.info("🎉 Deferred-init example completed successfully!")
+        logger.info("=" * 60)
+
+        print(f"EXAMPLE_COST: {accumulated_cost}")
+
+    finally:
+        client.close()
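
The polling loop in the example tracks elapsed time by adding the sleep interval on each pass, which does not account for how long each request takes. One alternative is a deadline-based loop, sketched here with the standard library only. `poll_until` and its parameters are hypothetical names that are not part of the example or the SDK; `fetch_status` stands in for the `GET /api/conversations/{conversation_id}` call.

```python
import time
from collections.abc import Callable, Collection


def poll_until(
    fetch_status: Callable[[], str],
    terminal: Collection[str],
    timeout: float = 120.0,
    interval: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch_status until it returns a terminal value.

    Returns the terminal status; raises TimeoutError if the deadline
    passes first. clock and sleep are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout}s")
        sleep(interval)
```

Raising on timeout, rather than falling through with a stale status, keeps a later "finished" log line from firing when the conversation never reached a terminal state.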

openhands-agent-server/openhands/agent_server/api.py

Lines changed: 60 additions & 24 deletions
@@ -37,6 +37,11 @@
 from openhands.agent_server.file_router import file_router
 from openhands.agent_server.git_router import git_router
 from openhands.agent_server.hooks_router import hooks_router
+from openhands.agent_server.init_router import (
+    InitService,
+    init_router,
+    require_initialized,
+)
 from openhands.agent_server.llm_router import llm_router
 from openhands.agent_server.mcp_router import mcp_router
 from openhands.agent_server.middleware import CORSDispatcher
@@ -123,7 +128,8 @@ async def api_lifespan(api: FastAPI) -> AsyncIterator[None]:
         # Clean up stale tmux sessions from previous server runs
         _cleanup_stale_tmux_sessions()

-        service = get_default_conversation_service()
+        config: Config = api.state.config
+        deferred = config.deferred_init
         vscode_service = get_vscode_service()
         desktop_service = get_desktop_service()
         tool_preload_service = get_tool_preload_service()
@@ -184,13 +190,50 @@ async def start_tool_preload_service():
                 f"Server initialization failed with {len(exceptions)} exception(s)"
             ) from exceptions[0]

-        # Mark initialization as complete - now the /ready endpoint will return 200
-        # and Kubernetes readiness probes will pass
+        async def stop_stateless_services():
+            async def stop_vscode_service():
+                if vscode_service is not None:
+                    await vscode_service.stop()
+
+            async def stop_desktop_service():
+                if desktop_service is not None:
+                    await desktop_service.stop()
+
+            async def stop_tool_preload_service():
+                if tool_preload_service is not None:
+                    await tool_preload_service.stop()
+
+            await asyncio.gather(
+                stop_vscode_service(),
+                stop_desktop_service(),
+                stop_tool_preload_service(),
+                return_exceptions=True,
+            )
+
+        # In deferred-init mode the conversation service is *not* entered
+        # here — that happens later, when POST /api/init delivers the runtime
+        # config. We still mark the /ready endpoint as ready so a warm-pool
+        # orchestrator can tell the pod has finished booting and is
+        # available to receive its /api/init payload.
+        if deferred:
+            init_service = InitService(api, base_config=config)
+            api.state.init_service = init_service
+            mark_initialization_complete()
+            logger.info("Server started in deferred-init mode; awaiting POST /api/init")
+            try:
+                yield
+            finally:
+                await init_service.teardown()
+                await stop_stateless_services()
+            return
+
+        # Non-deferred (legacy) path: build and enter the conversation
+        # service as part of the lifespan, exactly as before.
+        service = get_default_conversation_service()
         mark_initialization_complete()
         logger.info("Server initialization complete - ready to serve requests")

         async with service:
-            # Store the initialized service in app state for dependency injection
             api.state.conversation_service = service

             config = api.state.config
@@ -214,26 +257,7 @@ async def start_tool_preload_service():
                     with suppress(asyncio.CancelledError):
                         await retention_task

-                # Define async functions for stopping each service
-                async def stop_vscode_service():
-                    if vscode_service is not None:
-                        await vscode_service.stop()
-
-                async def stop_desktop_service():
-                    if desktop_service is not None:
-                        await desktop_service.stop()
-
-                async def stop_tool_preload_service():
-                    if tool_preload_service is not None:
-                        await tool_preload_service.stop()
-
-                # Stop all services concurrently
-                await asyncio.gather(
-                    stop_vscode_service(),
-                    stop_desktop_service(),
-                    stop_tool_preload_service(),
-                    return_exceptions=True,
-                )
+                await stop_stateless_services()
     finally:
         if tmux_tmpdir_was_defaulted and os.environ.get("TMUX_TMPDIR") == str(
             tmux_tmpdir
@@ -293,12 +317,24 @@ def _add_api_routes(app: FastAPI, config: Config) -> None:
     """
     app.include_router(server_details_router)

+    # The /api/init endpoint bypasses both the session-key auth and the
+    # dormant gate. It has its own X-Init-API-Key auth. When
+    # ``deferred_init`` is False the endpoints are still mounted but return
+    # 404 because no InitService is registered on app.state — see
+    # ``get_init_service``.
+    init_api_router = APIRouter(prefix="/api")
+    init_api_router.include_router(init_router)
+    app.include_router(init_api_router)
+
     # Header-only auth: applied to every /api/* route EXCEPT the workspace
     # static-file routes (handled separately below). Cookies are NOT honored
     # here so that we don't expand the CSRF surface across the whole API.
     dependencies = []
     if config.session_api_keys:
         dependencies.append(Depends(create_session_api_key_dependency(config)))
+    # Dormant gate: when ``deferred_init`` is True this 503s every /api/*
+    # route until POST /api/init completes. No-op for non-deferred deployments.
+    dependencies.append(Depends(require_initialized))

     api_router = APIRouter(prefix="/api", dependencies=dependencies)
     api_router.include_router(event_router)
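
The deferred branch of `api_lifespan` relies on a property of generator-based context managers: a branch may `yield` inside its own `try`/`finally` and then `return`, so the code below it never runs for that mode. The standard-library sketch below isolates that control flow. `lifespan`, `run`, and the `events` list are illustrative only; the strings merely label the points where the real code registers `InitService`, calls `init_service.teardown()` and `stop_stateless_services()`, or enters the conversation service.

```python
import asyncio
from collections.abc import AsyncIterator
from contextlib import asynccontextmanager


@asynccontextmanager
async def lifespan(deferred: bool, events: list[str]) -> AsyncIterator[None]:
    events.append("start stateless services")
    if deferred:
        events.append("register init service")
        try:
            yield  # dormant: the app serves, conversation service not entered
        finally:
            events.append("teardown init service")
            events.append("stop stateless services")
        return  # skip the legacy path entirely

    events.append("enter conversation service")
    try:
        yield
    finally:
        events.append("stop stateless services")


async def run(deferred: bool) -> list[str]:
    events: list[str] = []
    async with lifespan(deferred, events):
        events.append("serving")
    return events


deferred_events = asyncio.run(run(True))
legacy_events = asyncio.run(run(False))
```

Each call executes exactly one `yield`, which is what `asynccontextmanager` requires; the early `return` is what keeps the second `yield` out of reach in deferred mode.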

openhands-agent-server/openhands/agent_server/config.py

Lines changed: 11 additions & 0 deletions
@@ -236,6 +236,17 @@ class Config(BaseModel):
             "The URL where this agent server instance is available externally"
         ),
     )
+    deferred_init: bool = Field(
+        default=False,
+        description=(
+            "When True, the server starts in dormant mode. Stateless services "
+            "(VSCode, tool preload, etc.) start as usual, but the conversation, "
+            "event, and bash routers return 503 until POST /api/init is called with "
+            "the runtime configuration. This is intended for warm-pool deployments "
+            "where pods are pre-warmed before a user is matched and per-user "
+            "configuration is delivered later."
+        ),
+    )
     model_config: ClassVar[ConfigDict] = {"frozen": True}

     @property
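
This diff does not show how `OH_DEFERRED_INIT` is parsed into `Config.deferred_init`. Purely as an illustration of the idea, the sketch below reads a boolean flag from an environment mapping into a frozen object. A standard-library dataclass is used in place of the pydantic `Config` model, and `DemoConfig`, `parse_bool`, `from_env`, and the accepted spellings are all assumptions rather than the project's behaviour.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

# Assumed spellings; the real loader may accept a different set.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off", ""}


def parse_bool(raw: str) -> bool:
    """Map a flag string to a bool, rejecting anything unrecognised."""
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a boolean: {raw!r}")


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical, immutable stand-in for the server config."""

    deferred_init: bool = False

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DemoConfig":
        return cls(deferred_init=parse_bool(env.get("OH_DEFERRED_INIT", "false")))
```

Keeping the object frozen mirrors the `frozen` model config visible in the diff context: the flag is fixed at boot, and per-user settings arrive later through `POST /api/init` rather than by mutating the boot-time config.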
