Commit 11563cb

Patch76 and claude authored

test(e2e): generalize readiness-gate diagnostics helper (closes #1267) (#1271)

* test(e2e): generalize readiness-gate diagnostics helper (closes #1267)

Rename _dump_ha_mcp_tools_diagnostics → _dump_ha_readiness_diagnostics and add optional service_domain + config_entry_domain kwargs. Call from all five readiness-gate failure/warning branches in ha_container_with_fresh_config so the CI artifact carries HA-side context for any gate timeout, not just ha_mcp_tools.

Sites added (label):
* _wait_for_ha_api_ready fail → api-not-ready
* STABILIZATION_TIMEOUT pytest.fail → stabilization-timeout
* ENTITY_STABILIZATION_TIMEOUT pytest.fail → entity-registration-timeout
* INPUT_BOOLEAN_WAIT logger.warning → input-boolean-warn
* SUN_WAIT logger.warning → sun-wait-warn

Existing ha_mcp_tools site updated to pass service_domain + config_entry_domain so its domain-specific presence-check output is preserved (label: ha-mcp-tools-timeout).

Without service_domain/config_entry_domain, the dump emits aggregate /api/services domain list + config_entries total — enough context to distinguish "HA never finished starting" from "HA started but a specific domain regressed".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): adopt Gemini review feedback for readiness diagnostics

* Config_entries dump emits domain list in both branches (G2 + G1 docstring follow-up).
* input_boolean gate passes service_domain="input_boolean" (G3, partial G4).
* sun gate passes config_entry_domain="sun" (partial G5).

Declined:
G4 config_entry_domain="input_boolean" — input_boolean is a HA built-in helper without a config entry; passing it would emit misleading "NO entry visible" output.
G4 + G5 label renames to *-debug — the label is a call-site descriptor, not a log-level directive; #1270's styleguide rule (debug-level inside polling try/except) applies to expected transients mid-poll, not to post-timeout diagnostic dumps.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(tests): correct config_entries endpoint path in docstring + log messages

The docstring and 3 log messages in _dump_ha_readiness_diagnostics referenced /api/config/config_entries while the actual GET targets the more specific /api/config/config_entries/entry endpoint. Inconsistency surfaced by Gemini on the #1271 re-review. Sibling sweep added one more reference in the ha_mcp_tools call-site comment block that had the same shorthand.

No behaviour change — cosmetic / accuracy only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
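To make the effect of the optional service_domain kwarg concrete, here is a minimal sketch of the /api/services branching the message describes, written as a pure function. The name summarize_service_domains and the returned list of lines are assumptions for illustration only; the helper in this commit fetches the payload itself and emits the lines through logger.warning.

```python
from __future__ import annotations


def summarize_service_domains(
    services: list[dict],
    service_domain: str | None = None,
) -> list[str]:
    """Return the lines a dump could emit for an /api/services payload.

    Hypothetical stand-in for the logging done inside the real helper.
    """
    # Deduplicate and sort the domains, skipping entries without one.
    domains = sorted({s.get("domain") for s in services if s.get("domain")})

    if service_domain:
        # Domain-specific mode: call out presence or absence explicitly.
        present = service_domain in domains
        lines = [
            f"/api/services: {len(domains)} domains; "
            f"{service_domain}={'present' if present else 'absent'}"
        ]
        if not present:
            # List neighbours so a casing or typo mismatch is easy to spot.
            lines.append(f"/api/services domains: {domains}")
        return lines

    # Aggregate mode: no domain requested, so show the full list.
    return [f"/api/services: {len(domains)} domains: {domains}"]
```

Called without service_domain the function reports the full domain list; called with a domain it reports presence, and falls back to the full list only when the domain is absent.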
1 parent 2137cb3 commit 11563cb

1 file changed

Lines changed: 96 additions & 42 deletions


tests/src/e2e/conftest.py

@@ -543,29 +543,39 @@ def _wait_for_ha_mcp_tools_services(
     return False
 
 
-def _dump_ha_mcp_tools_diagnostics(
+def _dump_ha_readiness_diagnostics(
     container: DockerContainer,
     base_url: str,
     headers: dict[str, str],
     label: str,
+    *,
+    service_domain: str | None = None,
+    config_entry_domain: str | None = None,
 ) -> None:
-    """Emit HA-side diagnostics for the ``ha_mcp_tools`` readiness gap.
-
-    Called from the ``HA_MCP_TOOLS_WAIT`` timeout branch before each retry and
-    before the final ``pytest.fail`` so the CI artifact carries enough context
-    to disambiguate the root-cause class (import error / dependency-setup
-    delay / late event-driven registration / resource starvation) without
-    requiring a local reproduction of the flake.
-
-    Best-effort: each capture is wrapped in its own try/except so a single
-    failure (container already exited, HA API gone) does not lose the other
-    captures. Surfaced at WARNING level so CI logs keep the lines visible
-    even with default filtering.
+    """Emit HA-side diagnostics for any readiness-gate failure.
+
+    Generic best-effort dump used by every readiness gate in
+    ``ha_container_with_fresh_config``. The optional ``service_domain``
+    and ``config_entry_domain`` arguments let the caller surface
+    domain-specific presence/absence information (e.g.
+    ``service_domain="ha_mcp_tools"`` makes the dump call out whether
+    that specific domain is missing from ``/api/services``) instead of
+    only the generic counts.
+
+    Without those arguments, the dump shows aggregate ``/api/services``
+    and ``/api/config/config_entries/entry`` domain lists — enough
+    context to distinguish "HA never finished starting" from "HA started
+    but a specific domain regressed".
+
+    Each capture is wrapped in its own try/except so a single failure
+    (container already exited, HA API gone) does not lose the other
+    captures. Surfaced at WARNING level so CI logs keep the lines
+    visible even with default filtering.
     """
     import docker as _docker
     import requests as _requests
 
-    logger.warning(f"📋 ha_mcp_tools diagnostics dump ({label}):")
+    logger.warning(f"📋 readiness diagnostics dump ({label}):")
 
     # /api/services snapshot — distinguishes "domain absent" from
     # "request errored at timeout edge".
@@ -575,15 +585,18 @@ def _dump_ha_mcp_tools_diagnostics(
             domains = sorted(
                 {s.get("domain") for s in svc_resp.json() if s.get("domain")}
             )
-            ha_mcp_present = "ha_mcp_tools" in domains
-            logger.warning(
-                f" /api/services: {len(domains)} domains; "
-                f"ha_mcp_tools={'present' if ha_mcp_present else 'absent'}"
-            )
-            if not ha_mcp_present:
-                # Surface adjacent domains so a reader can rule out a
-                # regex / casing / typo class mismatch.
-                logger.warning(f" /api/services domains: {domains}")
+            if service_domain:
+                present = service_domain in domains
+                logger.warning(
+                    f" /api/services: {len(domains)} domains; "
+                    f"{service_domain}={'present' if present else 'absent'}"
+                )
+                if not present:
+                    # Surface adjacent domains so a reader can rule out a
+                    # regex / casing / typo class mismatch.
+                    logger.warning(f" /api/services domains: {domains}")
+            else:
+                logger.warning(f" /api/services: {len(domains)} domains: {domains}")
         else:
             logger.warning(
                 f" /api/services: HTTP {svc_resp.status_code} {svc_resp.text[:200]}"
@@ -607,36 +620,49 @@ def _dump_ha_mcp_tools_diagnostics(
             headers=headers,
         )
         if entries_resp.status_code == 200:
-            ha_mcp_entries = [
-                e for e in entries_resp.json() if e.get("domain") == "ha_mcp_tools"
-            ]
-            if ha_mcp_entries:
-                for entry in ha_mcp_entries:
+            entries = entries_resp.json()
+            entry_domains = sorted(
+                {e.get("domain") for e in entries if e.get("domain")}
+            )
+            if config_entry_domain:
+                matching = [
+                    e for e in entries if e.get("domain") == config_entry_domain
+                ]
+                if matching:
+                    for entry in matching:
+                        logger.warning(
+                            f" config_entry[{config_entry_domain}]: "
+                            f"id={entry.get('entry_id')} "
+                            f"state={entry.get('state')} "
+                            f"reason={entry.get('reason')} "
+                            f"source={entry.get('source')}"
+                        )
+                else:
                     logger.warning(
-                        f" config_entry: id={entry.get('entry_id')} "
-                        f"state={entry.get('state')} "
-                        f"reason={entry.get('reason')} "
-                        f"source={entry.get('source')}"
+                        f" config_entry[{config_entry_domain}]: NO entry "
+                        f"visible in HA's config_entries (available: "
+                        f"{entry_domains}) — install step may have written "
+                        "to .storage but HA did not pick it up"
                     )
             else:
                 logger.warning(
-                    " config_entry: NO ha_mcp_tools entry visible in HA's "
-                    "config_entries — install step wrote to .storage but HA "
-                    "did not pick it up"
+                    f" /api/config/config_entries/entry: {len(entries)} total: "
+                    f"{entry_domains}"
                 )
         else:
             logger.warning(
-                f" /api/config/config_entries: HTTP {entries_resp.status_code}"
+                f" /api/config/config_entries/entry: HTTP {entries_resp.status_code}"
             )
     except Exception as exc:
         # Same broad-catch rationale as the /api/services dump above.
         logger.warning(
-            f" /api/config/config_entries: request failed: {type(exc).__name__}: {exc}"
+            f" /api/config/config_entries/entry: request failed: {type(exc).__name__}: {exc}"
         )
 
     # docker logs --tail 100 + container state. The early ``tail=20`` grab
-    # at line ~625 fires immediately after container start and so does not
-    # cover the custom-component lifecycle that produces the symptom.
+    # inside ``ha_container_with_fresh_config`` fires immediately after
+    # container start and so does not cover the custom-component lifecycle
+    # that produces the symptom.
     try:
         docker_client = _docker.from_env()
         docker_container = docker_client.containers.get(
@@ -966,6 +992,9 @@ def ha_container_with_fresh_config(_blueprint_http_server):
 
     logger.info("🔄 Waiting for Home Assistant API to become ready...")
     if not _wait_for_ha_api_ready(base_url, headers, timeout=60):
+        _dump_ha_readiness_diagnostics(
+            container, base_url, headers, label="api-not-ready"
+        )
         pytest.fail(
             f"Home Assistant API at {base_url} did not become ready within 60 seconds.\n"
             "The container may have failed to start. Check Docker logs for details."
@@ -1010,6 +1039,9 @@ def ha_container_with_fresh_config(_blueprint_http_server):
             logger.debug(f"Stabilization check failed: {exc}")
         time.sleep(1)
     else:
+        _dump_ha_readiness_diagnostics(
+            container, base_url, headers, label="stabilization-timeout"
+        )
         pytest.fail(
             f"Home Assistant component stabilization timed out after {STABILIZATION_TIMEOUT}s. "
             f"Only {last_count} components loaded (minimum: {MIN_COMPONENTS}). "
@@ -1053,6 +1085,9 @@ def ha_container_with_fresh_config(_blueprint_http_server):
             logger.debug(f"Entity registration check failed: {exc}")
         time.sleep(1)
     else:
+        _dump_ha_readiness_diagnostics(
+            container, base_url, headers, label="entity-registration-timeout"
+        )
         pytest.fail(
             f"Entity registration timed out after "
             f"{ENTITY_STABILIZATION_TIMEOUT}s. "
@@ -1085,6 +1120,13 @@ def ha_container_with_fresh_config(_blueprint_http_server):
             logger.debug(f"Service check failed: {exc}")
         time.sleep(1)
     else:
+        _dump_ha_readiness_diagnostics(
+            container,
+            base_url,
+            headers,
+            label="input-boolean-warn",
+            service_domain="input_boolean",
+        )
         logger.warning(
             f"⚠️ input_boolean service not registered after {INPUT_BOOLEAN_WAIT}s "
             f"— helper/automation tests may be flaky"
@@ -1098,7 +1140,7 @@ def ha_container_with_fresh_config(_blueprint_http_server):
     # test.
     #
     # On timeout the branch dumps HA-side diagnostics (``/api/services``
-    # snapshot, ``/api/config/config_entries`` entry state, ``docker
+    # snapshot, ``/api/config/config_entries/entry`` state, ``docker
     # logs --tail 100``, container state) and then fails fast. The
     # container-restart retry path that originally sat here added a
     # ~3-minute slow-failure penalty (matching the second readiness
@@ -1116,8 +1158,13 @@ def ha_container_with_fresh_config(_blueprint_http_server):
     if not _wait_for_ha_mcp_tools_services(
         base_url, headers, HA_MCP_TOOLS_WAIT
     ):
-        _dump_ha_mcp_tools_diagnostics(
-            container, base_url, headers, label="timeout"
+        _dump_ha_readiness_diagnostics(
+            container,
+            base_url,
+            headers,
+            label="ha-mcp-tools-timeout",
+            service_domain="ha_mcp_tools",
+            config_entry_domain="ha_mcp_tools",
         )
         pytest.fail(
             f"ha_mcp_tools services not registered after "
@@ -1150,6 +1197,13 @@ def ha_container_with_fresh_config(_blueprint_http_server):
             logger.debug(f"sun.sun check failed: {exc}")
         time.sleep(1)
     else:
+        _dump_ha_readiness_diagnostics(
+            container,
+            base_url,
+            headers,
+            label="sun-wait-warn",
+            config_entry_domain="sun",
+        )
         logger.warning(
             f"⚠️ sun.sun still 'unknown' after {SUN_WAIT}s — template tests may fail"
         )
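
The polling gates touched in the hunks above appear to share one shape: poll once per second, swallow transient errors, and run the diagnostics dump in the loop's else branch once the wait runs out. The following is a minimal sketch of that shape, assuming a while/else loop. wait_for_gate, check, dump_diagnostics and poll_interval_s are hypothetical names that do not appear in conftest.py, and the real fixture inlines each loop rather than sharing a helper.

```python
import time
from typing import Callable


def wait_for_gate(
    check: Callable[[], bool],
    timeout_s: float,
    dump_diagnostics: Callable[[str], None],
    label: str,
    poll_interval_s: float = 1.0,
) -> bool:
    """Poll ``check`` until it passes; dump diagnostics once on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if check():
                break  # gate passed; this skips the else branch below
        except Exception:
            # Treated as an expected transient while the service starts.
            pass
        time.sleep(poll_interval_s)
    else:
        # Reached only when the loop condition became false without a break,
        # i.e. the deadline passed. The label identifies the call site.
        dump_diagnostics(label)
        return False
    return True
```

A caller would then decide what to do with a False result: the hard gates in the diff follow the dump with pytest.fail, while the input_boolean and sun gates only log a warning and continue.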
