Skip to content

flaky: test_updates.py — test_get_version_with_mock_github mocks wrong library; test_get_releases_manifest_authenticated hits live GitHub API unmocked #326

@yasinBursali

Description

@yasinBursali

Context

Two tests in `dream-server/extensions/services/dashboard-api/tests/test_updates.py` fail intermittently in CI. Observed failing once on upstream PR Light-Heart-Labs#901 (`fix/opencode-json-sed`, commit `b40378d2`) while passing on 19 other recent Dashboard workflow runs across main and other PRs.

CI log showed:
```
FAILED tests/test_updates.py::test_get_version_with_mock_github
AssertionError: assert None == '2.0.0'
FAILED tests/test_updates.py::test_get_releases_manifest_authenticated
AttributeError: 'str' object has no attribute 'get'


Both failures are **real bugs**, not noise. They are latent and only surface when the test environment lands in a particular state (test order, GitHub rate limiting, or an empty module-level cache).

---

## Bug 1 — \`test_get_version_with_mock_github\` mocks the wrong library

### The test
\`tests/test_updates.py:26\`

```python
def test_get_version_with_mock_github(test_client):
    mock_response = MagicMock()
    mock_response.read.return_value = b'{\"tag_name\": \"v2.0.0\", \"html_url\": \"https://github.com/test\"}'
    mock_response.__enter__ = lambda self: self
    mock_response.__exit__ = lambda self, *args: None

    with patch(\"urllib.request.urlopen\", return_value=mock_response):
        resp = test_client.get(\"/api/version\", headers=test_client.auth_headers)
        assert resp.status_code == 200
        data = resp.json()
        assert data[\"latest\"] == \"2.0.0\"

The router

`routers/updates.py:109-158` — the `/api/version` endpoint does NOT use `urllib.request`. It uses `httpx.AsyncClient`:

async def _refresh_release_cache() -> Optional[dict]:
    global _version_cache
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            response = await client.get(
                \"https://api.github.com/repos/Light-Heart-Labs/DreamServer/releases/latest\",
                headers=_GITHUB_HEADERS,
            )
        data = response.json()
        ...

So `patch("urllib.request.urlopen", ...)` does nothing. The mock never intercepts anything.

Why the test sometimes passes

`_version_cache` at module level (`routers/updates.py:27`) is shared across all tests in the same pytest process. Other tests (e.g. `test_releases_manifest_with_mocked_github` at L78) correctly patch `routers.updates.httpx.AsyncClient` and mutate the global cache as a side effect. Depending on pytest test ordering, `_version_cache` may already contain `payload = {"latest": "2.0.0", ...}` by the time this test runs, and `_get_cached_release_payload()` returns it — bypassing the network call entirely. The test then "passes" by accident.

Why it fails on PR Light-Heart-Labs#901

If this test happens to run early in the suite (before `_version_cache` is populated), the router takes the fallback path at L153-158:

try:
    payload = await asyncio.wait_for(asyncio.shield(refresh_task), timeout=1.25)
    return _build_version_result(current, payload)
except asyncio.TimeoutError:
    return _build_version_result(current, None)

The 1.25-second timeout fires before the (real) httpx call to GitHub completes in a loaded CI runner, `_build_version_result` is called with `None`, and returns `{"latest": None, ...}`. Assertion `data["latest"] == "2.0.0"` becomes `assert None == '2.0.0'` — the exact error seen in CI.

Root-cause fix

  1. Change the patch target to the actual library: `with patch("routers.updates.httpx.AsyncClient", return_value=mock_async_client):` with a properly-constructed `AsyncMock` (see `test_releases_manifest_with_mocked_github` at L99-111 for the correct pattern).
  2. Add a pytest fixture that resets `routers.updates._version_cache` and `routers.updates._version_refresh_task` before every test in this module:
    @pytest.fixture(autouse=True)
    def reset_version_cache():
        import routers.updates as u
        u._version_cache = {\"expires_at\": 0.0, \"payload\": None}
        if u._version_refresh_task is not None and not u._version_refresh_task.done():
            u._version_refresh_task.cancel()
        u._version_refresh_task = None
        yield
  3. Optionally make `_build_version_result` treat `None` payload as 500 (instead of silently returning a half-populated dict), so the failure mode is explicit rather than silent.

Bug 2 — `test_get_releases_manifest_authenticated` makes real network calls + router does not validate response shape

The test

`tests/test_updates.py:47`

def test_get_releases_manifest_authenticated(test_client):
    resp = test_client.get(\"/api/releases/manifest\", headers=test_client.auth_headers)
    assert resp.status_code == 200
    data = resp.json()
    assert \"releases\" in data
    assert \"checked_at\" in data
    assert isinstance(data[\"releases\"], list)

There are no mocks. The test calls the endpoint and expects it to work. The endpoint hits GitHub:

The router

`routers/updates.py:161-184`

@router.get(\"/api/releases/manifest\", dependencies=[Depends(verify_api_key)])
async def get_release_manifest():
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            resp = await client.get(
                \"https://api.github.com/repos/Light-Heart-Labs/DreamServer/releases?per_page=5\",
                headers=_GITHUB_HEADERS,
            )
        releases = resp.json()
        return {
            \"releases\": [
                {\"version\": r.get(\"tag_name\", \"\").lstrip(\"v\"), ...}
                for r in releases
            ],
            \"checked_at\": datetime.now(timezone.utc).isoformat() + \"Z\"
        }
    except (httpx.HTTPError, httpx.TimeoutException, json.JSONDecodeError, OSError):
        ...

The failure mode

GitHub's unauthenticated API allows 60 requests/hour per source IP. When shared CI runners (GitHub Actions with ephemeral public IPs) exceed that limit — which happens regularly on busy days — GitHub returns:

{
  \"message\": \"API rate limit exceeded for ...\",
  \"documentation_url\": \"https://docs.github.com/rest/overview/resources-in-the-rest-api#rate-limiting\"
}

`resp.json()` returns a dict, not a list. The `except` clause on L178 catches `HTTPError`, `TimeoutException`, `JSONDecodeError`, `OSError` — but not the case where the response is valid JSON but the wrong shape. The list comprehension `[r.get("tag_name", ...) for r in releases]` then iterates the dict's keys (strings like `"message"`), and calling `.get("tag_name", "")` on a string produces the exact error from CI:

AttributeError: 'str' object has no attribute 'get'

Root-cause fix

  1. Mock the test. Either redirect `test_get_releases_manifest_authenticated` to use the same `patch("routers.updates.httpx.AsyncClient", ...)` pattern that `test_releases_manifest_with_mocked_github` (L78) already uses, or delete this test since the mocked version already covers the success path. No Dream Server test should hit the live GitHub API.
  2. Harden the router. In `get_release_manifest`, validate `releases` is a list before iterating:
    releases = resp.json()
    if not isinstance(releases, list):
        # GitHub rate-limit dict, error blob, or any unexpected shape
        raise ValueError(f\"Unexpected GitHub response shape: {type(releases).__name__}\")
    The existing `except` block already catches `ValueError` (line 178 includes `json.JSONDecodeError` which is a `ValueError` subclass — adding a bare `ValueError` widens it), so this falls through cleanly to the local-version fallback.
  3. Optional: pipe a `GITHUB_TOKEN` through `_GITHUB_HEADERS` in CI runs so the 60/hr limit becomes 5000/hr and the symptom disappears even if both root causes stay latent. Workflow already has `secrets.GITHUB_TOKEN`; just add `if token: headers["Authorization"] = f"Bearer {token}"` at the router top.

Why this wasn't caught before

  • Bug 1 silently passes when `_version_cache` is pre-warmed by an earlier test in the module. Most pytest runs happen to order tests such that this is the case.
  • Bug 2 passes whenever the CI runner's IP hasn't hit the 60/hr GitHub rate limit for the past hour, which is most of the time but not reliably.
  • Neither bug touches typical development workflows — they only surface in loaded CI on specific test orderings or rate-limited IP pools.

Both are old defects, not regressions from any recent PR. PR Light-Heart-Labs#901 (`fix/opencode-json-sed`) exposed bug 2 purely because its CI run happened to land on a rate-limited runner.

Scope

This is out of scope for any current Batch A / Batch B PR. Filing as a standalone bug for a future cleanup.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions