Context
Two tests in `dream-server/extensions/services/dashboard-api/tests/test_updates.py` fail intermittently in CI. Observed failing once on upstream PR Light-Heart-Labs#901 (`fix/opencode-json-sed`, commit `b40378d2`) while passing on 19 other recent Dashboard workflow runs across main and other PRs.
CI log showed:
```
FAILED tests/test_updates.py::test_get_version_with_mock_github
AssertionError: assert None == '2.0.0'
FAILED tests/test_updates.py::test_get_releases_manifest_authenticated
AttributeError: 'str' object has no attribute 'get'
Both failures are **real bugs**, not noise. They are latent and only surface when the test environment lands in a particular state (test order, GitHub rate limiting, or an empty module-level cache).
---
## Bug 1 — \`test_get_version_with_mock_github\` mocks the wrong library
### The test
\`tests/test_updates.py:26\`
```python
def test_get_version_with_mock_github(test_client):
mock_response = MagicMock()
mock_response.read.return_value = b'{\"tag_name\": \"v2.0.0\", \"html_url\": \"https://github.com/test\"}'
mock_response.__enter__ = lambda self: self
mock_response.__exit__ = lambda self, *args: None
with patch(\"urllib.request.urlopen\", return_value=mock_response):
resp = test_client.get(\"/api/version\", headers=test_client.auth_headers)
assert resp.status_code == 200
data = resp.json()
assert data[\"latest\"] == \"2.0.0\"
The router
`routers/updates.py:109-158` — the `/api/version` endpoint does NOT use `urllib.request`. It uses `httpx.AsyncClient`:
async def _refresh_release_cache() -> Optional[dict]:
global _version_cache
try:
async with httpx.AsyncClient(timeout=5.0) as client:
response = await client.get(
\"https://api.github.com/repos/Light-Heart-Labs/DreamServer/releases/latest\",
headers=_GITHUB_HEADERS,
)
data = response.json()
...
So `patch("urllib.request.urlopen", ...)` does nothing. The mock never intercepts anything.
Why the test sometimes passes
`_version_cache` at module level (`routers/updates.py:27`) is shared across all tests in the same pytest process. Other tests (e.g. `test_releases_manifest_with_mocked_github` at L78) correctly patch `routers.updates.httpx.AsyncClient` and mutate the global cache as a side effect. Depending on pytest test ordering, `_version_cache` may already contain `payload = {"latest": "2.0.0", ...}` by the time this test runs, and `_get_cached_release_payload()` returns it — bypassing the network call entirely. The test then "passes" by accident.
If this test happens to run early in the suite (before `_version_cache` is populated), the router takes the fallback path at L153-158:
try:
payload = await asyncio.wait_for(asyncio.shield(refresh_task), timeout=1.25)
return _build_version_result(current, payload)
except asyncio.TimeoutError:
return _build_version_result(current, None)
The 1.25-second timeout fires before the (real) httpx call to GitHub completes in a loaded CI runner, `_build_version_result` is called with `None`, and returns `{"latest": None, ...}`. Assertion `data["latest"] == "2.0.0"` becomes `assert None == '2.0.0'` — the exact error seen in CI.
Root-cause fix
- Change the patch target to the actual library: `with patch("routers.updates.httpx.AsyncClient", return_value=mock_async_client):` with a properly-constructed `AsyncMock` (see `test_releases_manifest_with_mocked_github` at L99-111 for the correct pattern).
- Add a pytest fixture that resets `routers.updates._version_cache` and `routers.updates._version_refresh_task` before every test in this module:
@pytest.fixture(autouse=True)
def reset_version_cache():
import routers.updates as u
u._version_cache = {\"expires_at\": 0.0, \"payload\": None}
if u._version_refresh_task is not None and not u._version_refresh_task.done():
u._version_refresh_task.cancel()
u._version_refresh_task = None
yield
- Optionally make `_build_version_result` treat `None` payload as 500 (instead of silently returning a half-populated dict), so the failure mode is explicit rather than silent.
Bug 2 — `test_get_releases_manifest_authenticated` makes real network calls + router does not validate response shape
The test
`tests/test_updates.py:47`
def test_get_releases_manifest_authenticated(test_client):
resp = test_client.get(\"/api/releases/manifest\", headers=test_client.auth_headers)
assert resp.status_code == 200
data = resp.json()
assert \"releases\" in data
assert \"checked_at\" in data
assert isinstance(data[\"releases\"], list)
There are no mocks. The test calls the endpoint and expects it to work. The endpoint hits GitHub:
The router
`routers/updates.py:161-184`
@router.get(\"/api/releases/manifest\", dependencies=[Depends(verify_api_key)])
async def get_release_manifest():
try:
async with httpx.AsyncClient(timeout=5.0) as client:
resp = await client.get(
\"https://api.github.com/repos/Light-Heart-Labs/DreamServer/releases?per_page=5\",
headers=_GITHUB_HEADERS,
)
releases = resp.json()
return {
\"releases\": [
{\"version\": r.get(\"tag_name\", \"\").lstrip(\"v\"), ...}
for r in releases
],
\"checked_at\": datetime.now(timezone.utc).isoformat() + \"Z\"
}
except (httpx.HTTPError, httpx.TimeoutException, json.JSONDecodeError, OSError):
...
The failure mode
GitHub's unauthenticated API allows 60 requests/hour per source IP. When shared CI runners (GitHub Actions with ephemeral public IPs) exceed that limit — which happens regularly on busy days — GitHub returns:
{
\"message\": \"API rate limit exceeded for ...\",
\"documentation_url\": \"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting\"
}
`resp.json()` returns a dict, not a list. The `except` clause on L178 catches `HTTPError`, `TimeoutException`, `JSONDecodeError`, `OSError` — but not the case where the response is valid JSON but the wrong shape. The list comprehension `[r.get("tag_name", ...) for r in releases]` then iterates the dict's keys (strings like `"message"`), and calling `.get("tag_name", "")` on a string produces the exact error from CI:
AttributeError: 'str' object has no attribute 'get'
Root-cause fix
- Mock the test. Either redirect `test_get_releases_manifest_authenticated` to use the same `patch("routers.updates.httpx.AsyncClient", ...)` pattern that `test_releases_manifest_with_mocked_github` (L78) already uses, or delete this test since the mocked version already covers the success path. No Dream Server test should hit the live GitHub API.
- Harden the router. In `get_release_manifest`, validate `releases` is a list before iterating:
releases = resp.json()
if not isinstance(releases, list):
# GitHub rate-limit dict, error blob, or any unexpected shape
raise ValueError(f\"Unexpected GitHub response shape: {type(releases).__name__}\")
The existing `except` block already catches `ValueError` (line 178 includes `json.JSONDecodeError` which is a `ValueError` subclass — adding a bare `ValueError` widens it), so this falls through cleanly to the local-version fallback.
- Optional: pipe a `GITHUB_TOKEN` through `_GITHUB_HEADERS` in CI runs so the 60/hr limit becomes 5000/hr and the symptom disappears even if both root causes stay latent. Workflow already has `secrets.GITHUB_TOKEN`; just add `if token: headers["Authorization"] = f"Bearer {token}"` at the router top.
Why this wasn't caught before
- Bug 1 silently passes when `_version_cache` is pre-warmed by an earlier test in the module. Most pytest runs happen to order tests such that this is the case.
- Bug 2 passes whenever the CI runner's IP hasn't hit the 60/hr GitHub rate limit for the past hour, which is most of the time but not reliably.
- Neither bug touches typical development workflows — they only surface in loaded CI on specific test orderings or rate-limited IP pools.
Both are old defects, not regressions from any recent PR. PR Light-Heart-Labs#901 (`fix/opencode-json-sed`) exposed bug 2 purely because its CI run happened to land on a rate-limited runner.
Scope
This is out of scope for any current Batch A / Batch B PR. Filing as a standalone bug for a future cleanup.
Context
Two tests in `dream-server/extensions/services/dashboard-api/tests/test_updates.py` fail intermittently in CI. Observed failing once on upstream PR Light-Heart-Labs#901 (`fix/opencode-json-sed`, commit `b40378d2`) while passing on 19 other recent Dashboard workflow runs across main and other PRs.
CI log showed:
```
FAILED tests/test_updates.py::test_get_version_with_mock_github
AssertionError: assert None == '2.0.0'
FAILED tests/test_updates.py::test_get_releases_manifest_authenticated
AttributeError: 'str' object has no attribute 'get'
The router
`routers/updates.py:109-158` — the `/api/version` endpoint does NOT use `urllib.request`. It uses `httpx.AsyncClient`:
So `patch("urllib.request.urlopen", ...)` does nothing. The mock never intercepts anything.
Why the test sometimes passes
`_version_cache` at module level (`routers/updates.py:27`) is shared across all tests in the same pytest process. Other tests (e.g. `test_releases_manifest_with_mocked_github` at L78) correctly patch `routers.updates.httpx.AsyncClient` and mutate the global cache as a side effect. Depending on pytest test ordering, `_version_cache` may already contain `payload = {"latest": "2.0.0", ...}` by the time this test runs, and `_get_cached_release_payload()` returns it — bypassing the network call entirely. The test then "passes" by accident.
Why it fails on PR Light-Heart-Labs#901
If this test happens to run early in the suite (before `_version_cache` is populated), the router takes the fallback path at L153-158:
The 1.25-second timeout fires before the (real) httpx call to GitHub completes in a loaded CI runner, `_build_version_result` is called with `None`, and returns `{"latest": None, ...}`. Assertion `data["latest"] == "2.0.0"` becomes `assert None == '2.0.0'` — the exact error seen in CI.
Root-cause fix
Bug 2 — `test_get_releases_manifest_authenticated` makes real network calls + router does not validate response shape
The test
`tests/test_updates.py:47`
There are no mocks. The test calls the endpoint and expects it to work. The endpoint hits GitHub:
The router
`routers/updates.py:161-184`
The failure mode
GitHub's unauthenticated API allows 60 requests/hour per source IP. When shared CI runners (GitHub Actions with ephemeral public IPs) exceed that limit — which happens regularly on busy days — GitHub returns:
{ \"message\": \"API rate limit exceeded for ...\", \"documentation_url\": \"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting\" }`resp.json()` returns a dict, not a list. The `except` clause on L178 catches `HTTPError`, `TimeoutException`, `JSONDecodeError`, `OSError` — but not the case where the response is valid JSON but the wrong shape. The list comprehension `[r.get("tag_name", ...) for r in releases]` then iterates the dict's keys (strings like `"message"`), and calling `.get("tag_name", "")` on a string produces the exact error from CI:
Root-cause fix
Why this wasn't caught before
Both are old defects, not regressions from any recent PR. PR Light-Heart-Labs#901 (`fix/opencode-json-sed`) exposed bug 2 purely because its CI run happened to land on a rate-limited runner.
Scope
This is out of scope for any current Batch A / Batch B PR. Filing as a standalone bug for a future cleanup.