fix(dashboard-api): async hygiene in routers/extensions.py #1022

Merged
Lightheartdevs merged 1 commit into Light-Heart-Labs:main from yasinBursali:fix/extensions-async-hygiene on Apr 27, 2026

@yasinBursali
Contributor

What

Fixes three async-hygiene defects in routers/extensions.py: blocking urllib calls on the event-loop thread, over-broad except Exception catches that swallow programmer errors, and a fire-and-forget executor future that silently discards cleanup failures.

Why

  • extension_logs and extensions_catalog called urllib.request.urlopen directly on the async event loop. The Console modal polls every 2 s and the agent timeout is 30 s, so a single slow host-agent response could block the entire dashboard-api event loop for up to 30 s.
  • _call_agent, _call_agent_invalidate_compose_cache, and _call_agent_compose_rename caught except Exception, meaning AttributeError, TypeError, and other programmer bugs were silently swallowed and logged as "host agent unreachable" — masking real bugs.
  • _cleanup_stale_progress was dispatched via run_in_executor(None, ...) with the returned Future immediately discarded, so any unhandled exception inside the cleanup only produced an opaque "Future exception was never retrieved" warning in stderr with no context.
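
The first point above can be illustrated with a minimal sketch. It is not the code from routers/extensions.py: `fetch_logs_blocking` is a hypothetical stand-in that uses `time.sleep` in place of the real urllib call, and `heartbeat` is an illustrative coroutine that only exists to show the event loop staying responsive while the blocking work runs in a worker thread.

```python
import asyncio
import threading
import time


def fetch_logs_blocking(delay: float) -> int:
    """Hypothetical stand-in for a blocking urllib call.

    Sleeps to simulate a slow host-agent response, then returns the id of
    the thread it ran on.
    """
    time.sleep(delay)
    return threading.get_ident()


async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    """Append a timestamp every 10 ms for as long as the loop is free to run."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def demo() -> tuple[int, int, int]:
    loop_thread = threading.get_ident()
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))

    # Offloaded: the blocking function runs in a worker thread, so the
    # heartbeat task keeps ticking while we await the result.
    worker_thread = await asyncio.to_thread(fetch_logs_blocking, 0.2)

    stop.set()
    await hb
    return loop_thread, worker_thread, len(ticks)


if __name__ == "__main__":
    loop_thread, worker_thread, tick_count = asyncio.run(demo())
    print("ran off the loop thread:", loop_thread != worker_thread)
    print("heartbeat ticks while waiting:", tick_count)
```

Calling `fetch_logs_blocking(0.2)` directly inside `demo` instead of going through `asyncio.to_thread` would keep the heartbeat from ticking for the whole 0.2 s, which is the stall described above at a smaller scale.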

How

  • Extracted _fetch_agent_logs(url, headers, data, timeout) -> str — a plain synchronous function that can be safely offloaded via asyncio.to_thread.
  • extension_logs: replaced the inline urllib.request.urlopen block with await asyncio.to_thread(_fetch_agent_logs, ...), matching the asyncio.to_thread pattern already used in main.py's api_settings_env_save.
  • extensions_catalog: replaced direct _check_agent_health() call with await asyncio.to_thread(_check_agent_health).
  • _call_agent, _call_agent_invalidate_compose_cache, _call_agent_compose_rename, _check_agent_health: narrowed except Exception to except (urllib.error.URLError, urllib.error.HTTPError, OSError, TimeoutError). Non-network exceptions now propagate with a full stack trace instead of a misleading warning.
  • extensions_catalog: retained the run_in_executor Future in _cleanup_future and attached _log_cleanup_error as a done_callback to log any exception at ERROR level with full context.
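
The narrowed exception handling can be sketched roughly as below. This is an illustration under assumptions, not the PR's implementation: `call_agent` takes a hypothetical `send` callable in place of the real request logic, and the logger name and `NETWORK_ERRORS` constant are invented for the example.

```python
import logging
import urllib.error

logger = logging.getLogger("extensions_sketch")  # illustrative logger name

# The tuple named in the PR description. URLError and TimeoutError are both
# OSError subclasses, and HTTPError subclasses URLError, so the extra names
# mainly document intent.
NETWORK_ERRORS = (
    urllib.error.URLError,
    urllib.error.HTTPError,
    OSError,
    TimeoutError,
)


def call_agent(send) -> bool:
    """Return True on success and False when the host agent is unreachable.

    `send` is a hypothetical zero-argument callable that performs the HTTP
    request. Anything that is not a network error propagates to the caller.
    """
    try:
        send()
        return True
    except NETWORK_ERRORS as exc:
        # Log the actual exception rather than a generic message.
        logger.warning("host agent unreachable: %s", exc)
        return False


def refuse() -> None:
    """Simulates a refused connection."""
    raise urllib.error.URLError("connection refused")


if __name__ == "__main__":
    print(call_agent(refuse))        # network failure is swallowed
    print(call_agent(lambda: None))  # success path
```

With the old `except Exception`, a typo such as an attribute access on `None` inside `send` would have been reported as an unreachable agent; with the narrowed tuple it surfaces as an `AttributeError` with its traceback.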

Testing

  • Automated: pytest 140/140 pass (137 pre-existing + 3 new); ruff clean.
  • New tests (tests/test_extensions.py):
    • test_call_agent_returns_false_on_urlerror: network errors return False and log a warning.
    • test_call_agent_reraises_non_network_errors: AttributeError propagates (behaviour change from old broad catch).
    • test_catalog_logs_when_cleanup_future_fails: cleanup RuntimeError is logged at ERROR level, catalog endpoint still returns 200.
  • Manual: Start dashboard-api with a non-responsive host agent; verify /api/extensions/catalog and /api/extensions/{id}/logs return promptly without blocking other requests.
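
For readers who want a feel for the first two new tests, here is a hedged sketch of how such tests might look. It does not reproduce tests/test_extensions.py: the `call_agent` helper, the URL, and the test function names are assumptions made for the example, and the tests patch `urllib.request.urlopen` with `unittest.mock` so no network access is needed.

```python
import urllib.error
import urllib.request
from unittest import mock

AGENT_URL = "http://agent.invalid/health"  # hypothetical URL, never contacted


def call_agent(url: str) -> bool:
    """Simplified, hypothetical helper with the narrowed except clause."""
    try:
        with urllib.request.urlopen(url, timeout=5):
            return True
    except (urllib.error.URLError, OSError, TimeoutError):
        return False


def test_returns_false_on_urlerror() -> None:
    # A network-level failure should be reported as False, not raised.
    err = urllib.error.URLError("connection refused")
    with mock.patch("urllib.request.urlopen", side_effect=err):
        assert call_agent(AGENT_URL) is False


def test_reraises_non_network_errors() -> None:
    # A programmer error must escape instead of being logged as "unreachable".
    with mock.patch("urllib.request.urlopen", side_effect=AttributeError("bug")):
        try:
            call_agent(AGENT_URL)
        except AttributeError:
            return
        raise AssertionError("AttributeError was swallowed")


if __name__ == "__main__":
    test_returns_false_on_urlerror()
    test_reraises_non_network_errors()
    print("ok")
```

Under pytest the second test would more idiomatically use `pytest.raises(AttributeError)`; the plain try/except form is used here only to keep the sketch free of third-party imports.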

Platform Impact

  • macOS: Affected — dashboard-api runs in Docker on all platforms; async event-loop fix is platform-neutral.
  • Linux: Affected — same fix applies to NVIDIA/AMD deployments.
  • Windows (WSL2): Affected — same.

Known Considerations

Non-blocking follow-ups:

  1. Add symmetric tests for _call_agent_invalidate_compose_cache and _call_agent_compose_rename (the other two narrowed helpers).
  2. Hoist _log_cleanup_error out of the extensions_catalog function body to module level for reuse.
  3. Audit remaining except Exception: blocks in main.py.

Three async-hygiene defects in routers/extensions.py:
- extension_logs and extensions_catalog called blocking urllib.request.urlopen
  directly on the event-loop thread. With the Console modal polling
  every 2s and a 30s agent timeout, one slow host-agent response
  could stall the dashboard-api for up to 30s at a time.
- _call_agent, _call_agent_invalidate_compose_cache, and
  _call_agent_compose_rename caught `except Exception`, swallowing
  non-network programmer errors with a misleading "host agent
  unreachable" log. Narrow to (URLError, HTTPError, OSError,
  TimeoutError) and log the actual exception.
- _cleanup_stale_progress was dispatched via run_in_executor with a
  discarded Future, so failures surfaced only as "Future exception
  was never retrieved" warnings in stderr.

Offload blocking urllib calls via asyncio.to_thread (matching the
existing pattern in main.py's api_settings_env_save). Attach a
log-on-exception done-callback to the cleanup future.

Tests cover the URLError swallow path, the new re-raise behaviour on
non-network errors, and the cleanup callback logging.
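
The log-on-exception done-callback mentioned above could look roughly like the following sketch. The function names mirror those in the description, but the bodies are assumptions: `cleanup_stale_progress` simply raises to simulate a failure, and `schedule_cleanup` is a hypothetical helper introduced so the future can be inspected.

```python
import asyncio
import logging

logger = logging.getLogger("cleanup_sketch")  # illustrative logger name


def cleanup_stale_progress() -> None:
    """Stand-in for the real cleanup; always fails to show the error path."""
    raise RuntimeError("simulated cleanup failure")


def log_cleanup_error(future: asyncio.Future) -> None:
    """Done-callback: log any exception from the cleanup at ERROR level."""
    if future.cancelled():
        return
    # Calling exception() also marks the exception as retrieved, so asyncio
    # does not emit "Future exception was never retrieved" later.
    exc = future.exception()
    if exc is not None:
        logger.error("stale-progress cleanup failed", exc_info=exc)


def schedule_cleanup() -> asyncio.Future:
    """Dispatch the cleanup to the default executor and keep the Future."""
    loop = asyncio.get_running_loop()
    future = loop.run_in_executor(None, cleanup_stale_progress)
    future.add_done_callback(log_cleanup_error)
    return future


async def catalog() -> dict:
    """Hypothetical endpoint body: cleanup failures never affect the response."""
    schedule_cleanup()
    return {"extensions": []}
```

Because the callback runs on the event loop once the executor job finishes, the endpoint itself returns without waiting for the cleanup, and a failure shows up as a normal ERROR log record with the traceback attached.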

Lightheartdevs merged commit 9055007 into Light-Heart-Labs:main on Apr 27, 2026
27 of 28 checks passed