
bug: apply_template async endpoint makes blocking urllib calls — event loop stalls for up to 300s #311

@yasinBursali

Bug Report: apply_template async endpoint makes blocking urllib calls — event loop stalls for up to 300s

Severity: High
Category: Error Handling / Blocking I/O
Platform: All (macOS, Linux, Windows/WSL2)
Confidence: Confirmed

Description

async def apply_template in routers/templates.py makes multiple synchronous urllib.request.urlopen calls directly inside the async handler body. Because these calls block with timeouts of up to 300 seconds, they stall the asyncio event loop, and no other request in the process is served until they return. For a template that enables many services, the stall is cumulative: if the agent is unresponsive, each service can incur three calls with a 300 s timeout plus one with a 30 s timeout, so the worst case grows linearly with the number of services.
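As a rough upper bound, the cumulative stall can be estimated from the timeouts quoted in this report. The sketch below assumes the per-service call pattern shown in the Evidence excerpt (two hooks and one start call at 300 s, one cache invalidation at 30 s), that every call runs to its full timeout, and that the loop continues after each failure; the function name is illustrative and not part of the codebase.

```python
AGENT_TIMEOUT_S = 300      # timeout quoted in the report for hook and start calls
AGENT_LOG_TIMEOUT_S = 30   # timeout quoted for the compose-cache invalidation call


def worst_case_stall_seconds(num_services: int) -> int:
    """Upper bound on the event-loop stall if every agent call times out."""
    # Per service: pre_start hook + start + post_start hook, then one invalidation.
    per_service = 3 * AGENT_TIMEOUT_S + AGENT_LOG_TIMEOUT_S
    return num_services * per_service


for n in (1, 3, 5):
    print(f"{n} service(s): up to {worst_case_stall_seconds(n)} s")
```

Under these assumptions a three-service template could block the loop for up to 2790 s (about 46 minutes).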

Affected File(s)

  • dream-server/extensions/services/dashboard-api/routers/templates.py (L97–226, async def apply_template)

Root Cause

apply_template is declared async def, so FastAPI executes it directly in the event loop. Inside the handler, _call_agent_hook(), _call_agent(), and _call_agent_invalidate_compose_cache() — all imported from routers/extensions.py — use urllib.request.urlopen with _AGENT_TIMEOUT=300 or _AGENT_LOG_TIMEOUT=30. These are synchronous blocking calls. Blocking the event loop means no other async endpoint in the process responds until the urllib call returns.
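The stall described above can be reproduced without FastAPI or the agent. In this minimal sketch, time.sleep stands in for the blocking urllib.request.urlopen call, and a 10 ms heartbeat task stands in for every other request the event loop should be serving. All names are hypothetical.

```python
import asyncio
import time


def blocking_agent_call(seconds: float) -> None:
    """Stand-in for a urllib.request.urlopen call that blocks the calling thread."""
    time.sleep(seconds)


async def handler_blocking(seconds: float) -> None:
    # The blocking call runs directly on the event loop thread, as in the report.
    blocking_agent_call(seconds)


async def max_heartbeat_gap(handler, seconds: float) -> float:
    """Run `handler` alongside a 10 ms heartbeat; return the longest gap between ticks."""
    gaps = []
    stop = asyncio.Event()

    async def heartbeat():
        last = time.perf_counter()
        while not stop.is_set():
            await asyncio.sleep(0.01)
            now = time.perf_counter()
            gaps.append(now - last)
            last = now

    hb = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)   # let the heartbeat start ticking
    await handler(seconds)      # the loop cannot tick while this blocks
    await asyncio.sleep(0.05)   # let the heartbeat record the gap
    stop.set()
    await hb
    return max(gaps)


gap = asyncio.run(max_heartbeat_gap(handler_blocking, 0.3))
print(f"longest heartbeat gap: {gap:.2f}s")
```

The longest gap comes out at roughly the full blocking duration instead of about 10 ms, which is the same effect a 300 s urlopen timeout would have on every other endpoint.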

Evidence

# routers/templates.py L97 — async handler
async def apply_template(template_id: str, api_key: str = Depends(verify_api_key)):
    ...
    for svc_id in template.get("services", []):
        ...
        # _call_agent_hook uses urllib.request.urlopen with _AGENT_TIMEOUT=300:
        _call_agent_hook(svc_id, "pre_start")
        start_ok = _call_agent("start", svc_id)   # also 300s timeout
        _call_agent_hook(svc_id, "post_start")
        _call_agent_invalidate_compose_cache()     # 30s timeout, inside lock

Platform Analysis

  • macOS: Affected. FastAPI's async event loop runs the same way on macOS. Hooks add bash-version check overhead on first call.
  • Linux: Affected. Same event loop model; the host agent on port 7710 is local, but timeouts still block if the agent is slow or briefly down.
  • Windows/WSL2: Affected. Same event loop model; host.docker.internal adds latency between the Docker Desktop VM and WSL2.

Reproduction

  1. Start dashboard API.
  2. Call POST /api/templates/{template_id}/apply for a template with 3+ services.
  3. Simultaneously call GET /health from another client.
  4. Expected: health check responds immediately.
  5. Actual: health check hangs for the full duration of the template apply (potentially minutes if agent is slow).
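The steps above can be scripted roughly as follows. This is a sketch using only the standard library. The base URL, the template ID, and the authentication headers are not given in this report, so run_repro takes them as parameters that the caller must supply. The helper names are hypothetical.

```python
import threading
import time
import urllib.request


def timed(fn):
    """Call fn() and return (result_or_exception, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        result = fn()
    except Exception as exc:   # keep the timing even when the call fails
        result = exc
    return result, time.perf_counter() - start


def probe_during(slow_call, fast_call, head_start=0.5):
    """Start slow_call in a background thread, then time fast_call while it runs."""
    worker = threading.Thread(target=slow_call, daemon=True)
    worker.start()
    time.sleep(head_start)     # give the slow request time to reach the server
    return timed(fast_call)


def http_call(url, method="GET", headers=None, timeout=600):
    """Build a zero-argument callable that performs one HTTP request."""
    def call():
        req = urllib.request.Request(url, method=method, headers=headers or {})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    return call


def run_repro(base_url, template_id, headers):
    """Time GET /health while POST .../apply is in flight (all arguments caller-supplied)."""
    apply_call = http_call(f"{base_url}/api/templates/{template_id}/apply",
                           method="POST", headers=headers)
    health_call = http_call(f"{base_url}/health", timeout=30)
    return probe_during(apply_call, health_call)
```

Calling run_repro against a running dashboard API returns the health check's result and its latency. A latency close to the duration of the apply request, rather than a few milliseconds, would confirm the stall.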

Impact

Entire dashboard API becomes unresponsive for all users during template apply. Status polling, log streaming, and GPU metric refreshes all freeze until the blocking call chain completes. The preview_template endpoint has the same class of issue with lighter calls.

Suggested Approach

Convert apply_template (and preview_template) to a regular def (non-async) so FastAPI routes them through the thread-pool executor, which isolates the blocking I/O from the event loop. This is the minimal-change fix and consistent with how the other mutation endpoints (enable_extension, disable_extension) are already defined.
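FastAPI runs plain def endpoints in a worker thread, which is what the conversion above relies on. If a handler has to stay async def for other reasons, a comparable isolation can be sketched with the standard library's asyncio.to_thread. The example below is a standalone illustration with hypothetical names and with time.sleep standing in for the agent calls; it is not the project's code.

```python
import asyncio
import time


def apply_template_sync(service_ids, call_seconds=0.1):
    """Hypothetical synchronous body: one blocking call per service."""
    started = []
    for svc_id in service_ids:
        time.sleep(call_seconds)   # stand-in for the blocking urllib calls
        started.append(svc_id)
    return started


async def apply_template_offloaded(service_ids):
    # asyncio.to_thread runs the sync function in a worker thread,
    # so the event loop stays free while that thread blocks.
    return await asyncio.to_thread(apply_template_sync, service_ids)


async def demo():
    ticks = 0

    async def heartbeat():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    hb = asyncio.create_task(heartbeat())
    result = await apply_template_offloaded(["svc-a", "svc-b", "svc-c"])
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return result, ticks


result, ticks = asyncio.run(demo())
print(result, ticks)
```

The heartbeat keeps ticking while the blocking work runs in the worker thread. One point to check before either change: the Evidence excerpt mentions a lock around the cache invalidation, and this report does not say what kind of lock it is. An asyncio lock could not be acquired from a plain def handler or a worker thread without further changes.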


Filed by automated Python auditor after full-sweep review of Python changes merged 2026-04-06 → 2026-04-11 on upstream/main @ c0600ca.
