feat: add FastMCP 3.1 Code Mode transform#11
Conversation
Code Mode replaces all 80+ tools with meta-tools (search, get_schema, execute) so the LLM discovers tools on demand and chains calls in a Python sandbox — reducing context bloat and round-trip overhead. Controlled via two new env vars (both default to false): - ENABLE_CODE_MODE: enables the 3-stage flow (search → get_schema → execute) - ENABLE_CODE_MODE_LIST_TOOLS: adds ListTools for full catalog discovery Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces an experimental "Code Mode" feature from FastMCP 3.1, designed to optimize how Large Language Models interact with available tools. By transforming a large set of individual tools into a few meta-tools, it aims to significantly reduce context bloat and enable more dynamic, chained tool execution within a Python sandbox, improving efficiency and flexibility for LLM-driven applications. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request adds an experimental 'Code Mode' feature, which is a significant architectural change. The implementation is clean with opt-in configuration and graceful error handling. However, this change modifies core functionality in src/ha_mcp/server.py without adding corresponding automated tests, which is a high-severity issue according to the repository's style guide (line 26). Automated tests are essential to prevent regressions for such a critical feature. I've also added a couple of medium-severity comments to improve code style and clarity.
| return | ||
|
|
||
| # Build discovery tools list | ||
| discovery_tools: list = [Search(), GetSchemas()] |
There was a problem hiding this comment.
The type hint list is too generic. For better type safety and code clarity, please use a more specific type hint like list[Any]. This makes it clearer that the list is expected to contain tool instances.
| discovery_tools: list = [Search(), GetSchemas()] | |
| discovery_tools: list[Any] = [Search(), GetSchemas()] |
| logger.info( | ||
| "Code Mode enabled (discovery: %s + execute)", | ||
| ", ".join(tool_names), | ||
| ) |
There was a problem hiding this comment.
This logging call uses C-style % formatting. The project's linting configuration uses ruff with the LOG rules, which favor f-strings for logging for improved readability and type safety. Please update this to use an f-string.
logger.info(
f"Code Mode enabled (discovery: {', '.join(tool_names)} + execute)"
)Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FastMCP 3.1.0's MontySandboxProvider passes external_functions to the Monty() constructor which doesn't accept it (fixed on FastMCP main but not released). Added _PatchedMontySandboxProvider with the fix from upstream main branch. Also manually updated uv.lock to include pydantic-monty (code-mode extra). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move [package.optional-dependencies] after sdist/wheels — TOML table headers must come after inline keys in the same package block. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Manual uv.lock edits don't match uv's internal format expectations. Instead, revert to upstream uv.lock and remove --locked from Dockerfile so uv resolves the code-mode extra (pydantic-monty) at build time. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Search returns "detailed" results (param names/types inline) so the LLM can skip GetSchemas in most cases (2-stage instead of 3) - Limit search to top 10 results (was unlimited, returned 31+) - Add 30s execution timeout to sandbox Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep search at "brief" (default) — names + one-line descriptions only. LLM calls GetSchemas for just the tools it needs rather than getting full param info for all 10 results upfront. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Apply search keyword boosts and description overrides from PR homeassistant-ai#727 before CodeMode indexes tools. Narrows ha_deep_search description so it only ranks for YAML config searches, and boosts common tools like ha_search_entities and ha_config_get_automation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…homeassistant-ai#728, #11, dev45 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…d findings (homeassistant-ai#1164) Round-3 review pass landed 13 verified findings; the cosmetic "persisted-value visible in UI after F5" item was explicitly skipped per user direction (UI shows post-gate value when master off; user accepts this since beta tools are actually disabled at runtime — the preserve-across-master-cycle UX is at the data layer, not the visual). Source fixes: - **Save button copy is now source-blind** — the previous "your feature-flag toggles already saved on click" claim only held when ``saveFeatureFlag`` raised ``restartNotice``; tool-config pin saves, backup-config saves, and cross-tab ``restart-required`` broadcasts also raise it. New copy: "a restart is pending. Click Restart above to apply your prior changes." (#2) - **Stale F.37 test docstring** describing the deleted server-side cascade rewritten. (#3) - **Beta-gate INFO log noise** — cascade-clear removal meant the gate could fire its "forcing %s=False" line every Settings rebuild, spamming addon logs once a user had truthy sub-flags persisted. Dedup via ``_BETA_GATE_LOGGED`` set per process, cleared on ``_reset_global_settings``. (#9) - **Lazy-lock docstring** updated to reflect Python 3.13 semantics (``asyncio.Lock()`` no longer takes a loop arg; the lazy pattern still serves test fixtures and single-loop deployment, with the invariant documented). (#10) - **Addon-mode carve-out comment** clarified to distinguish dev (master in schema) from stable (master web-UI-only). (homeassistant-ai#14) - **probe-div null branch** in F.37 now writes ``data-error`` so a failing test points at "selector missed" vs "value flipped" unambiguously. (homeassistant-ai#17) Tests added: - ``test_translations_cover_every_schema_key`` — parity check that every ``schema:`` key has a non-empty translation ``name`` and ``description``. Parameterised across stable + dev addons. Pins the class of silent gap that this PR's ``advanced_debug_logging`` fix addressed. (#4) - ``test_save_features_acquires_override_file_lock`` + ``test_save_advanced_acquires_override_file_lock`` — counting-lock wrapper asserts ``async with _get_override_file_lock()`` runs exactly once in each file-mode write path. Pin against a regression that silently bypasses concurrent-save serialisation. (#5) - ``test_dual_save_buttons_mirror_disabled_and_status_on_post_failure`` — exercises the 500-response branch of the dual-save mirror so a regression that broke ``_setAdvSaveStatus``/``_setAdvSaveDisabled`` for error paths only would still fail. (#6) - ``test_save_features_master_on_restores_subflag_values_in_addon_mode`` — addon-mode round-trip mirror of the existing standalone restore test; asserts the Supervisor merge-and-post call carries only the master flip-on and never zeroes out sub-flag values. (#7) - ``test_save_button_nothing_to_save_when_no_dirty_and_no_restart`` + ``test_save_button_restart_pending_hint_when_dirty_empty_but_restart_showing`` — both branches of the empty-dirty Save click are exercised; the restart-pending branch asserts the copy is source-blind. (#8) Deferred per user direction: - #1 (file-vs-Settings visual after F5): user accepts the current behavior (UI shows post-gate value; runtime tools actually disabled when master off; data-layer preserve still works end-to-end). - #11/#12/homeassistant-ai#13 (code-simplifier helper extractions): skipped as complicated to implement without behavior risk. - homeassistant-ai#15 (pre-homeassistant-ai#1164 users with already-cleared sub-flags): release-note concern, not a code change. - homeassistant-ai#16 (lock fragility under future thread-pool dispatch): speculative future-risk; not actionable today.
…ant-ai#1164) (homeassistant-ai#1431) * feat(policy): scaffold policy package for per-tool approval (#966) * feat(policy): add Predicate/Rule/Policy data models (#966) * feat(addon): expose enable_per_tool_approval option (#966) * feat(config): add enable_per_tool_approval setting (#966) * feat(policy): atomic load/save for tool_policy.json (#966) * feat(addon): wire enable_per_tool_approval through start.py + docs (#966) * feat(policy): args-hash + remember-cache for approval queue (#966) * feat(policy): predicate evaluator (eq/in/regex/exists/...) (#966) * feat(policy): pending entries with TTL, decisions, and event signalling (#966) * feat(policy): PolicyMiddleware happy-path branches (#966) * test(policy): cover block/deny/timeout/recall/remember branches (#966) * feat(policy): /api/policy/* Starlette handlers (#966) * fix(policy): wrap ValidationError, scope contains op, regex doc, test bind (#966) * feat(policy): Policies tab in web UI + sidecar route wiring (#966) * feat(policy): register PolicyMiddleware on the FastMCP server (#966) * fix(policy): return 400 on malformed approve/deny bodies (#966) * feat(toolsearch): unpin yaml-edit and code-mode tools, gated by approval middleware (#966) * chore: ruff format + lint cleanup for policy package (#966) - ruff format reflow on PR-touched files (case statements split to two lines, function signatures, line continuations). - UP042: Verdict now inherits from StrEnum instead of (str, Enum). - E402: hoist `import anyio` to the top of test_approval_queue.py. - I001: sort imports in test_evaluator.py and test_model.py. No behavior change. * fix(policy): mypy narrowing for evaluator comparisons (#966) `Predicate.value` is `Any | None` and `extract_path` returns `Any`, so `val == pv`, `val > pv`, etc. inherit `Any` and trip the project's `warn_return_any` mypy setting on functions declared `-> bool`. Wrap the comparison branches in `bool(...)` to make the narrowing explicit. Also guard the `regex` branch with `isinstance(pv, str)` so `re.search` receives a definite `str` instead of `Any | None`; a non-string regex value now returns False instead of raising TypeError at evaluation time, which is the only sensible behavior for a malformed pattern. No change to any test's expected outcome. * docs: credit @L1AD and PolicyLayer for #966 inspiration * docs(addon): fix wrong YAML example in enable_per_tool_approval section (#966) * fix(policy): CI green + critical bugs from reviewer cycle (#966) - expires_in_seconds: use time-remaining not total TTL - middleware: fail-closed on corrupt policy load (was crashing all gated calls) - handlers: 500-with-corrupt-flag on get_config when policy invalid - approval URL: wire secret_prefix via lazy getattr so HTTP-standalone emits a usable absolute path - approve/deny: return bool, 409 on already-decided - Predicate: field validators for op/value compatibility (Gemini) - persistence: explicit UTF-8 encoding (Gemini) - middleware: reuse tools/helpers safe_progress - handlers comment: fix wrong "next call" claim - test_middleware: unwrap ExceptionGroup for pytest.raises (CI fix) - test_stdio_settings_sidecar: include new policy_* handler keys (CI fix) * refactor(policy): drop default_action + tighten Rule.tool_name (#966) - Policy schema simplified: no default_action field. System is always "allow unless a rule matches; rule = require approval". The previous default_action='require_approval' option was a bricked-config trap since rules can't grant allow-overrides. - Rule.tool_name now rejects empty string; wildcard '*' documented in the docstring. - Evaluator simplified to match. - Tests updated/added. * refactor(policy): encapsulate decision state + clean naming (#966) - PendingApproval.decide() encapsulates the decision/event coupling; property guards read-only access to decision. - __post_init__ validates expires_at > created_at. - ApprovalQueue docstring spells out single-process scope and restart-loses-tokens semantics. - Rename args_preview -> args throughout (it was always the full unmodified args; "preview" was misleading). - Remove on_policy_change dead parameter from build_policy_handlers. * fix(toolsearch): default-pinned tools should be user-unpinnable (#966) - Computed pinned set now respects tool_config.json — explicit "enabled" state removes from defaults. - Remove ha_restart / ha_reload_core from DEFAULT_PINNED_TOOLS (recovery actions, low frequency, low value in default LLM tool surface). - Add ha_manage_backup to MANDATORY_TOOLS (operational essential). - Server now declares _settings_secret_prefix on __init__ for pyright. * refactor(policy): rename feature to "Tool Security Policies" (#966) User-facing rename: addon config option, env var, Settings attribute, UI tab label, addon DOCS sections, translations, server method. Internal Python naming (policy/ package, tool_policy.json, /api/policy/* routes, class names like PolicyMiddleware/ApprovalQueue) unchanged for less churn. * docs: small comment polish for policy review nits (#966) * feat(ui): per-tool security-gated toggle in Tools tab (#966) * test(policy): integration + timing-isolation coverage gaps (#966) - test_server_policy_wiring: assert _apply_tool_security_policies attaches middleware + approval_queue when enabled, neither when disabled. - test_settings_ui_handler_selection: parametrize the live-vs-stub branch in build_settings_handlers (sidecar / no server / no queue / live). - test_middleware wait-loop timing: assert event-wake exit, not polling. - test_middleware multi-rule precedence: first-match wins for remember_minutes. * feat(ui): rewrite Tool Security Policies tab — per-tool cards + predicate editor (#966) * feat(config): expose enable_tool_security_policies as a feature flag (#966) Wires the new addon-config toggle into FEATURE_FLAG_FIELDS so it appears in the Server Settings tab and rides PR #1420's _save_feature_flags + _supervisor_merge_and_post_options + _schedule_supervisor_self_restart flow when toggled in addon mode. * fix(policy): address all verified review findings + CI failures (#966) CI fixes: - ruff: drop dead noqa: SLF001 suppression - unit test: test_missing_path_never_matches_except_exists uses op-compatible values so Predicate field_validator doesn't reject Review findings: - FEATURE_META entry for enable_tool_security_policies (toggle now renders in Server Settings tab) - Approval URL points to /settings?tab=tool-security-policies (was POST-only /api/policy/approve which 405'd on browser open) - policyDecide surfaces network errors + 409 current_decision - Policy gains version field for optimistic concurrency; PUT 409s on version mismatch; client surfaces 'reload before saving' - _apply_tool_security_policies failure logs spell out security impact (TOOL SECURITY GATING IS NOT ACTIVE) and include data_dir/env-var context - Validator rejects value on op='exists'; gt/lt TypeError degrades to False - ToolVisibilityResult -> UserToolStateOverrides, fields are frozenset, disjointness asserted - PendingApproval.event private; expose async wait() - _SupervisorOptionsError gains transport()/validation() classmethods encoding kind->status_code pairing - Wiring test binds queue identity; handler-selection covers all 3 live routes - Comment + doc polish (audit-trail claim, e2e docstring path, internal Task references) * fix(policy): CI green + real e2e test for the approval flow (#966) - ruff format: tests/src/unit/test_settings_ui.py - test_save_and_roundtrip: account for save_policy version bump - test_serialized_shape_is_stable: include 'version' in expected keys - test_addon_save_returns_500_when_server_is_none: guard server._settings_secret_prefix assignment with None check (regression from #4's secret-prefix wiring) - tests/src/e2e/policy/test_approval_flow.py: real e2e exercising block -> approve -> re-call cycle with strict args-binding rejection on mutated args. Skip-stub replaced with real test driving the live middleware via mcp_client + /api/policy/* HTTP. * fix(ui): broken quote escaping in predicate-form placeholder breaks JS parse (#966) The Python source `'placeholder=\\'\"lock\"...\\\\'>'` rendered as JS `'placeholder='\"lock\"..'>'` — the single quote inside the HTML attribute value closed the outer JS string literal, and subsequent tokens (\"lock\", 'or', '[', ...) broke parsing. With a syntax error in the inline <script>, the browser stopped executing — Tools tab stuck on 'Loading...', tabs unclickable. Switched to a JS-safe double-quoted attribute with " for the embedded double quotes in the placeholder hint. * fix(ui): gated toggle reads addon-config flag, not Policy.enabled (#966) The per-tool 'security gated' toggle was grayed out even when the user had enable_tool_security_policies turned ON in the addon config + the Server Settings tab toggle, because the JS was reading Policy.enabled (the file field) instead of the addon-config feature flag — which is the single source of truth for whether the middleware is active. loadPolicyState now reads enable_tool_security_policies from /api/settings/features (same place renderFeatureFlags consumes from). * fix(policy): Policy.extra=ignore so old persisted files load cleanly (#966) Persisted tool_policy.json files from an earlier revision of this PR carry default_action (since dropped) and rejected with ValidationError on load — surfacing as 'Could not load policy: 500' when the user clicked the per-tool gated toggle. Predicate/Rule keep extra=forbid (typo catching at construction). * fix(policy): drop Policy.enabled — addon-config flag is the sole switch (#966) The middleware's server-side gate was checking `policy.enabled` (a file field with no UI surface), so it returned ALLOW on every call regardless of rules. The addon-config flag (`enable_tool_security_policies`) was supposed to be the only switch — and the middleware is only registered when that flag is true — so the inner `policy.enabled` check was both redundant and broken. Remove the field, remove both server-side checks (middleware + evaluator), update tests, and refresh the JS comment that referred to it. * fix(policy): drop approve_url, instruct LLM to send user to settings page (#966) The relative-path approve_url doesn't resolve cleanly through cloudflared or other reverse-proxy deployment modes — the LLM can't safely hand it to the user. The user already knows where the Tool Security Policies tab is (they set the rule from it), and that page lists all pending approvals, so a per-request URL is unnecessary noise. - Drop approve_url from USER_APPROVAL_REQUIRED context; keep `token` so a caller could correlate but the user doesn't need to act on it. - Update message + progress text to instruct the LLM to tell the user to open the settings UI Tool Security Policies tab. - Drop the now-unused approval_url_builder param + the _settings_secret_prefix plumbing in server.py / settings_ui.py. Also fix the failing test_defaults (asserted dropped Policy.enabled field) and the e2e test PUT body that still carried `"enabled": True`. * feat(policy): schema-driven condition builder for write/destructive tools (#966) The previous "Add predicate" UX required users to type both the dotted arg path (e.g. `args.domain`) and the value as JSON. Two problems: 1. They need to know what fields each tool takes. 2. They need to know what values are legal (which HA domains exist, which entities, etc.). Replace the free-text path input with a dropdown sourced from the tool's JSON schema, and replace the free-text value input with a (multi-)select sourced from HA when the path has a known value source (domain, service, entity_id today; trivially extensible). Free-text is still available via an "(other — type a path)" escape hatch and as the automatic fallback for ops that don't pair with a registry (regex / contains / gt / lt). Server: - New `/api/policy/tool-schema?name=...` returns `{paths: [...], value_sources: {path: source_key}}`. Read-only tools return empty paths so the UI falls back to free-text (gating those is low-value but still permitted manually). - New `/api/policy/value-source?source=...` resolves a source key to a live list of choices. In-process 30s TTL cache avoids hammering HA when the user explores paths. - value_sources.py registry maps (tool_name, arg_path) → source_key for the common write/destructive surface (call_service, set_entity, set_integration_enabled, get_history, etc.). New mappings are one dict entry plus, if a new source key, one fetcher. - Both endpoints mount in addon + secret-prefix routes. Sidecar serves 503 stubs (no FastMCP registry / HA client in that process). UI: rename user-facing "predicate" → "condition" (CS jargon → SQL/JIRA terminology users actually recognise; internal Pydantic class stays `Predicate` so the wire format is unchanged). Form fetches the schema lazily on first open, caches it on the card, refetches value choices when path/op changes. Includes test_schema_handlers.py covering: missing-name 400, sidecar 503, unknown-tool 404, read-only empty-paths, write-tool paths + registry, JSON-schema enum passthrough, value-source 400 paths, both HA-services payload shapes, domain filtering for entities/services, and upstream-fetch 502 mapping. * test: include new policy handler keys in sidecar all-keys assertion (#966) * fix(ui): clearer condition-builder labels, optional value, bareword input (#966) User feedback on the new form was: 1. "args.foo" path placeholder is gibberish; no real label on path/value 2. value box should not be mandatory for ops where backend allows None 3. typing `lock` into the value box errored with "Invalid JSON" — every normal-looking input has to be quoted 4. for ha_call_service `data` was the only arg without an obvious meaning Changes: - Real `<label>`s on the form rows: "Argument:", "Match when:", "Value:". - Op dropdown shows friendly text ("is present (any value)", "equals", "is one of", "matches regex", etc.); wire values unchanged. - Hint line under the value row reflects the current op so users know whether a value is required and roughly what shape it should take. - Value is now OPTIONAL for ops where the backend accepts a missing field (exists, eq, neq, contains). Submitting an empty value omits the `value` key from the predicate entirely. - Bareword inputs auto-coerce: `lock` → `"lock"`, `lock,alarm` → list, `42` → number, `true` → bool. Falls back to a clearer error if even the smart-coercion can't make JSON. - Path dropdown options now carry the schema `description` as a `title` tooltip, so `data` reads as "Service data dict" on hover instead of being a mystery. - Schema-declared enums render as a value dropdown automatically (no registry entry needed) when the path's JSON-schema has `enum`. Also fix /tmp/extract_js.py — naive paren-counter broke once form strings started containing parens; switch to ast.parse so future edits don't silently break the harness. * feat(policy): wildcard path "args.*" + clearer empty-value semantics (#966) User asked for a catch-all "any argument equals X" condition and called out that the previous "Leave blank to gate on null" hint was nonsensical — a blank value should mean "any value" to a normal user, not "match the null literal". Backend: - Refactor evaluator: `extract_path` → `iter_path_values`, which yields every value the dotted path resolves to. A `*` segment fans out across the current node (dict values for dicts, items for lists). A path like `args.*` thus yields every top-level arg; `args.config.*` yields every leaf of the config sub-dict. - `match_predicate` rewrites to "ANY matching value satisfies the op", which collapses to the previous single-value semantics for non-wildcard paths. So `path=args.*, op=eq, value="lock"` gates whenever any arg of the tool call equals "lock". UI: - New "(any argument)" option at the top of the path dropdown, fills `args.*` and carries a tooltip explaining the semantic. - VALUE_OPTIONAL_OPS shrinks to just `exists` — blank value is no longer silently accepted for eq/neq/contains. Instead, the value-required error fires, and the hint text under the field tells the user to switch op to `is present` if they wanted "any value". - Hint copy revised across all ops so the "what happens with this op + blank value" question has a clear answer at each step. Tests: - New TestIterPathValues covering top-level, nested, missing, and the three wildcard shapes (dict values, list items, empty). - New TestWildcardPredicate covering eq/in/exists/regex matching via `args.*` plus an end-to-end evaluate() test. - Existing tests still pass with the refactored matcher; signatures of the public functions are unchanged. * fix(ui): default condition path to '(any argument)'; relabel error (#966) - Drop the '(pick an argument)' placeholder; default the path dropdown to '(any argument)' so the form is immediately submittable. - 'path is required' error reads 'argument is required' if it ever fires (it won't on the happy path now). * feat(policy): auto-save conditions + surface matched_rule in approval error (#966) UI: - Drop the manual "Save changes" button on each rule card. Conditions now PUT to disk the moment the user clicks "Save condition", clicks the × on a condition row, or edits the remember-minutes field (debounced 500ms). The only feedback is a small "Saving…" / "Saved." status line next to the card. - Removed the now-dead .policy-save-rule CSS and the markDirty helper. Server: - USER_APPROVAL_REQUIRED error context now carries `matched_rule` with the rule's tool_name + when[]. Lets the user (and the LLM) tell at a glance which rule fired, instead of guessing whether their condition saved correctly. * feat(policy): case-insensitive string comparison in all ops (#966) Security gates shouldn't fire differently based on whether the LLM capitalised its argument — 'Lock' and 'LOCK' and 'lock' are the same operationally. eq/neq/in/not_in/contains lower-case both sides before comparing when both are strings; regex uses re.IGNORECASE. Non-string types pass through unchanged so int(1) != str('1') still holds. * fix(policy): mypy bool cast + broaden e2e coverage (#966) mypy: bool(_ci(val) == _ci(pv)) — _ci returns Any (passes non-strings through unchanged), so eq/neq comparisons need an explicit bool wrap. Tests: previous e2e only covered the happy block→approve→re-call path. Add four more cases against the live testcontainer: - wildcard `args.*` gates when any arg matches the value - wildcard `args.*` passes through when no arg matches - case-insensitive matching (rule 'lock' gates caller 'LOCK') - deny → middleware raises USER_DENIED, tool never runs - remember_minutes>0: second call within the window skips the queue * refactor(policy): address review-cycle findings (#966) Gemini (6 unresolved threads): - Migrate POLICY_LOAD_FAILED / USER_DENIED / USER_APPROVAL_REQUIRED off manual `ToolError(json.dumps(...))` onto the canonical `raise_tool_error(create_error_response(...))` pattern. Added the three error codes to ErrorCode enum. - Hoist sync `load_policy()` off the event loop via `anyio.to_thread.run_sync` in the middleware's policy provider call. - Add justification comment on `local_provider._list_tools()` (same rationale that's already documented in `settings_ui.py`'s tool enumerator: public `list_tools()` filters disabled tools but operators may still want to author gating rules for them). Code-reviewer findings: - ApprovalQueue TOCTOU: two concurrent `on_call_tool` coroutines with identical (tool, args_hash) could both miss `find()` and create duplicate pending entries; approving one would leave the other waiter blocked. Introduce `find_or_create(...)` serialised behind an `anyio.Lock`; middleware now uses it. - ApprovalQueue had no pending-entries cap → memory exhaustion under an LLM retry-loop with mutated args. Add `PENDING_CAP = 1000` with FIFO eviction of oldest entries when the cap is hit. Silent-failure-hunter findings: - handlers.py: `get_tool_schema` and `get_value_source` now `logger.exception` before returning 500/502 so FastMCP version bumps or HA outages leave a traceable signal instead of opaque client errors. - value_sources fetchers `logger.warning` on unexpected HA response shapes (would otherwise silently return empty dropdowns). - value_sources cache no longer stores empty results — a transient HA glitch returning [] would otherwise pin the dropdown blank for 30s. PR-test-analyzer findings (the critical one): - test_persistence.py's `test_save_and_roundtrip` passed `Policy(enabled=True, ...)` for a field that no longer exists; `extra="ignore"` silently dropped it so the test was a no-op assertion. Replace with real round-tripped fields (wait_seconds / approval_ttl_minutes / remember_minutes) and add an explicit `test_load_drops_unknown_fields` exercising the extra="ignore" back-compat contract with a JSON file carrying `default_action` + `enabled`. - Add wildcard scalar/None tests (`args.x.*` against scalar yields nothing; doesn't crash). - Add ApprovalQueue tests: concurrent `find_or_create` shares one pending entry; `create` evicts oldest at PENDING_CAP. - Add handler tests: sidecar value-source returns 503, tool-schema 500 on `_list_tools` exception, value-source cache key separates per params, `_extract_arg_paths` skips malformed property entries. Comment-analyzer findings: - Grammar fix in handlers.py `_is_write_or_destructive` docstring. - model.py docstring "older version of this PR" → "older builds". - Strip the `(#966)` / `(issue #966)` parentheticals from module docstrings, settings_ui CSS/HTML/comments — git blame and the commit message carry the link. * fix(policy): UI surface fetch failures + middleware reissues swept pending (#966) - Middleware: after _wait_for_decision returns without a verdict, check whether the pending entry was swept (TTL elapsed during the wait). If so, create a fresh entry before raising USER_APPROVAL_REQUIRED so the LLM isn't told to re-call against a dead token. - UI: policyLoadConfig now surfaces fetch failures in a visible error banner instead of silently rendering blank — picks up the policy_file_corrupt:true repair hint from the server's 500 response. - UI: loadValueChoices records the failure (lastValueSourceError) so renderHint can show it under the value row. The dropdown still downgrades to free-text, but the user can now tell a transient HA outage from "no value source registered for this path". - UI: renderValueControl uses an autoincrement seq so rapid path/op edits don't let an earlier slow fetch's DOM mutation land after a newer one's (similar to the autoSave pattern). * fix(policy): logger.info on silent decide-False; debug log on gt/lt type-mismatch; strengthen event-wake test (#966) - ApprovalQueue.approve/deny: emit logger.info when the call returns False (unknown token or already decided) — was silent. Helps debug the case where the middleware's consume_and_maybe_remember races with an out-of-band decide. - Evaluator gt/lt TypeError fallback now logs at debug so a user whose 'battery_level < 20' rule never fires can see that the arg came in as a string and tighten the rule. - test_event_wakes_waiter now measures elapsed wait time and asserts < 200ms, ruling out a hidden poll-loop impl that would still pass the previous decision-only check. * style: ruff format evaluator.py for CI's 0.15.13 (#966) Local ruff 0.15.7 didn't wrap the multi-arg logger.debug call; CI's ruff 0.15.13 does. Upgrading local toolchain to match. * feat(policy): clear remember-cache on save, clearer disabled-state UX, mirror master toggle (#966) Three things: 1. Remember-cache invalidation on policy save (B2). ApprovalQueue.clear_remember_cache() drops every remembered approval; put_config calls it after a successful save. Without this, tightening a rule was silently bypassed by any in-flight remembered approvals until their window expired. 2. Better 503 / "unavailable" messaging (B8 + the broader issue). The stub handler's 503 message used to read "Live approvals unavailable in this mode (sidecar)" even when the real cause was the feature being turned off in addon config — the user had no way to tell from the UI. Updated to call out all three causes (feature off, sidecar, ImportError) and point at the addon log. The pending-list JS now checks policyState.enabled first and shows "Tool Security Policies is turned off" when that's the actual reason, falling back to the server's 503 message otherwise. 3. Mirror the master toggle onto the Tool Security Policies tab. Was only exposed in Server Settings before — users on the Policies tab had to navigate away to find the on/off switch. New checkbox at the top of the tab posts to the same /api/settings/features endpoint, so the two surfaces are live mirrors of the same addon-config flag. * test(policy): fix JS-harness drift guard + lock policy-tab behaviour (#966) The merged-in JSDOM behaviour test (#1425) failed collection because its hardcoded _TOP_LEVEL_ELEMENT_IDS list didn't yet know about the policy-tab handlers this PR adds (policy-master-toggle, policy-save-global-btn). Add them, plus matching DOM stubs in _build_min_dom so the init pass doesn't throw on the addEventListener calls. While the file is open, add three behavioural tests that pin the new condition-builder UX wiring: - Master toggle change POSTs to /api/settings/features with the enable_tool_security_policies flag (so the on-tab toggle stays a true mirror of the Server-Settings checkbox). - /api/policy/pending 503 renders "Tool Security Policies is turned off" when the addon flag is off (avoids the old misleading "sidecar / unavailable" copy). - /api/policy/pending 503 propagates the server's addon-log message verbatim when the flag IS on but the queue is unreachable (so users know where to look for ImportError details). The parse-coverage path catches syntax breaks already; these tests catch behavioural regressions on top of it. * refactor(policy): address 2nd-round review findings (#966) Verified all 23 findings from the 2nd pr-review-toolkit pass against the code; fixed 22 (skipping #13 — the JSDOM seq-cancel race test is high-effort to author reliably and the production guard is small enough that bench-level review catches regressions). ## Correctness / silent-failure - ApprovalQueue PENDING_CAP eviction now sorts by `(decision == "pending", created_at)` so resolved entries evict first. When a still-pending entry MUST be evicted (cap full, no resolved to drop), `.set()` its event so any waiter in `_wait_for_decision` wakes immediately instead of blocking the full wait_seconds against a row that no longer exists. - Middleware: log INFO with old + new token on the reissue-after-sweep branch so operators can correlate "approval row keeps reappearing" with the actual cause. - Middleware: scope `clear_remember_cache` to "rules actually changed" — editing only wait_seconds / approval_ttl_minutes no longer blows away in-flight remembered approvals. - Policy: model_validator requires `wait_seconds < approval_ttl_minutes * 60` so the middleware can't repeatedly issue fresh pending entries because the wait outlasted the TTL. - value_sources: cache key uses `urllib.parse.urlencode` so a future param value containing `=`/`&` can't collide with another key. - ApprovalQueue: `approve`/`deny` on unknown token now logs WARNING (security-gating endpoint, suggests UI bug or token probing). Already-decided stays INFO (legitimate race). ## UI - settings_ui.policyState gains an `enabledKnown` tri-state bit so downstream branches (`policyLoadPending`'s "feature off" copy, master-toggle revert) don't false-confidently route to the "disabled" message when the features fetch actually failed. - Master-toggle change handler reverts the checkbox on save failure AND syncs from `policyState.enabled` after a successful load — no more "UI says on, server says off" drift. - `policyLoadConfig` appends "(response body unparseable)" when the 500 body isn't JSON (e.g. HTML error page from a misrouted sidecar), so the operator sees more than just "HTTP 500". - `policyLoadPending` surfaces fetch errors inline ("Lost contact with server, retrying") instead of silently freezing the list. - `fetchToolSchema` records `lastValueSourceError` on failure so the hint banner explains why the value dropdown silently downgraded to free text. ## Tests - test_handlers: `test_put_config_clears_remember_cache_when_rules_change` + `test_put_config_preserves_remember_cache_when_only_timing_changes` lock in the scoped invalidation. - test_approval_queue: `test_create_evicts_resolved_entries_before_pending`, `test_evicting_pending_wakes_its_waiter`, `test_create_after_sweep_still_evicts_when_pending_fills_cap` cover the new eviction rules. The strengthened `test_find_or_create_lock_blocks_concurrent_create_under_real_race` inserts a yield point inside the lock body so the lock actually matters to the assertion (the previous test would pass even without the lock under anyio's cooperative scheduler). - test_middleware: `test_swept_pending_during_wait_is_reissued_with_fresh_token` exercises the previously-untested reissue branch. - test_schema_handlers: `test_value_source_empty_result_not_cached` proves the empty-result no-cache guard actually triggers a refetch. - test_settings_ui_js_behavior: master-toggle test now JSON-parses the POST body and structurally asserts `flags.enable_tool_security_policies is True` instead of loose substring matching. ## Comments - approval_queue.py PENDING_CAP docstring now matches implementation (was promising "resolved first" before the implementation actually did it). - evaluator.py: gt-branch comment example uses ">" not "<". - handlers.py: dropped cross-reference to settings_ui that would rot. - middleware.py: trimmed "fast on warm disk" speculation. - value_sources.py: trimmed "(WebSocket reconnect, auth lapse)" speculation in the empty-cache comment. - settings_ui.py: four comments still saying "predicates" updated to "conditions" to match user-facing terminology. * fix(ui): blank value on eq/in/etc coerces to op=exists (#966) User expects 'leave value blank to gate on the argument's mere presence regardless of value' to work across the equality-ish ops, not just op=exists. Earlier I had the form reject blank value for anything other than exists with a 'value is required' error. Now: for eq / neq / in / not_in / contains / exists, leaving the value blank silently coerces the predicate to op=exists on save. The condition row then reads as 'args.* exists' which is the right description of what's stored. Ops that genuinely need a value (regex / gt / lt) still raise 'value is required for op=...'. The hint text under each op updated to call out 'Leave blank to gate on any value' where it applies. * docs(addon): drop beta tag from Tool Security Policies (#966) The feature is stable enough to ship as a default-supported addon config option, not a beta. Also corrects two pieces of doc drift that landed here originally: - 'approval URL' wording → 'tell the user to open the Tool Security Policies tab' (the URL field was dropped earlier in this PR) - 'predicates' → 'conditions' (matches the user-facing terminology the UI now uses) Touches both prod and dev addon directories (config.yaml-driven UI text + the rendered DOCS.md). * docs(beta): describe 3-path enabling (dev addon, stable + web UI, env vars) (#1164) * feat(config): advanced settings registry + beta master toggle field (#1164) - Add ``ADVANCED_SETTINGS_FIELDS`` registry (21 fields across connection, search, operations, diagnostics, tools_surface, beta_codemode sections) - Add ``_ADVANCED_SETTINGS_BOUNDS`` and ``_ADVANCED_SETTINGS_CHOICES`` dicts for UI/POST validation - Add ``BETA_FEATURE_FIELDS`` tuple for master-gate enforcement - Add ``enable_beta_features`` Settings field (alias ENABLE_BETA_FEATURES, default False) as the master beta toggle - Update ``FEATURE_FLAG_FIELDS``: add ``enable_beta_features`` at front, ``enable_code_mode`` at end; reorder for UI grouping - Extend ``BACKUP_OVERRIDE_FIELDS`` from 3 to 5 entries (add ``auto_backup_dir`` and ``auto_backup_calendar_lookahead_days``) - Extend ``_apply_backup_overrides`` to handle ``str`` type; widen ``coerced`` annotation to ``bool | int | str``; add bounds check for ``auto_backup_calendar_lookahead_days`` (1..365) - Add coverage gate test asserting every Settings env alias is registered in one of the three panel registries (or in the explicit ALLOWLIST) - Add ``test_enable_beta_features_default_false`` * refactor(config): code-review fixups — docstring clarity, stricter bool reject, null-byte guard (#1164) * feat(settings-ui): per-tool env-pin for DISABLED_TOOLS / PINNED_TOOLS (#1164 addendum) Add env_pinned_tools() and effective_tool_config() helpers so tools listed in DISABLED_TOOLS / PINNED_TOOLS env vars stay read-only at runtime even after tool_config.json has been written. The _get_tools GET handler now includes env_pinned metadata per tool entry; _save_tools rejects incoming flips of env-pinned tools with HTTP 409. Server.py startup path updated to use effective_tool_config() so env pins apply at boot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(config): master beta gate + advanced overrides apply (#1164) - Rewrite _apply_feature_flag_overrides: lift addon-mode short-circuit for beta fields (enable_beta_features + BETA_FEATURE_FIELDS), add master gate that forces all 5 beta sub-flags to False when master is off regardless of env/file state - Update get_feature_flag_origin: beta fields never return "addon" — they follow standalone precedence in either mode - Add _apply_advanced_overrides: reads feature_flags.json for all editable ADVANCED_SETTINGS_FIELDS entries; skips display-only fields; validates types, bounds, and choices before setattr - Wire _apply_advanced_overrides into get_global_settings (runs after feature-flag + backup passes) - Add 18 unit tests covering master gate semantics, addon-mode carve-out, advanced-override int/str/float/display-only/out-of-bounds/invalid- choice cases, and backward-compat (pre-master override files) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(settings-ui): code-review fixups — drop dup env_pinned, settings param, msg + tests (#1164) Address code-quality review feedback on 32d67b4d: - Drop the redundant per-tool-entry `env_pinned` field from the GET /api/settings/tools response. The top-level `env_pinned` map is the single source of truth; UI does O(1) lookups against it. - `effective_tool_config()` now accepts an optional `settings` parameter (mirrors `load_tool_config()`); restores dependency injection at the server.py startup callsite (`self.settings`). - 409 rejection message uses comma-joined names instead of Python list repr for better human readability; structured `context.rejected` remains for programmatic access. - Add `test_get_tools_includes_env_pinned_map` test covering the new GET response field. - Symmetrize `get_data_dir.cache_clear()` calls at both ends of each tmp_path test so cross-test cache pollution can't leak in. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * refactor(config): code-review fixups — docstring accuracy, top-level import, null-byte test (#1164) - Correct _apply_advanced_overrides docstring: most advanced fields ARE in the addon config.yaml schema (backup_hint, verify_ssl, enabled_tool_modules, etc.) and are handled correctly via the env- var-wins check because start.py exports them. Only code_mode_* and mcp_server_version are file-only in either mode. - Move `from typing import Any` from function body to module-level imports (stdlib typing — no need to defer). - Add test_advanced_override_str_field_with_null_byte_rejected to cover the previously-unexercised null-byte reject branch. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * feat(settings-ui): render env-pinned tool rows as read-only + update addon translations (#1164) - Add `toolEnvPinned` module-level map; populate from `data.env_pinned` in `loadTools()` - Tool rows for env-pinned tools get `.env-pinned` class, all inputs disabled, and a `feature-locked-note` banner naming the env var (DISABLED_TOOLS or PINNED_TOOLS) - Group master toggle excludes env-pinned tools from bulk enable/disable - Update `pinNotice` copy to describe the env-pinned lock-until-unset semantics - Update `homeassistant-addon-dev/translations/en.yaml` descriptions for `disabled_tools` and `pinned_tools` to reflect that they are operator-level locks, not seed-only values - Add JSDOM behavioural tests for env-pinned disabled and pinned tool rows * feat(settings-ui): /api/settings/advanced GET+POST handlers (#1164) * feat(settings-ui): render advanced settings sections in Server Settings tab (#1164) * feat(settings-ui): beta master toggle + nested sub-row gating + 409 rejection (#1164) * feat(settings-ui): nest code-mode sub-numerics under enable_code_mode (#1164) * test(addon): assert start.py auto-enables master beta in dev addon mode + stable schema absence (#1164) * feat(addon): start.py auto-enables ENABLE_BETA_FEATURES=true when dev addon options carry beta keys (#1164) * fix(settings-ui): post-CI/post-review fixes — backup-config str+range, addon-mode gate skip, e2e env, JSDOM ids, stale comments (#1164) * fix(config): beta-sub-flag origin returns addon in dev mode (env var presence as signal) (#1164) * fix(tests): addon-mode save tests use non-beta flag matching new origin semantics (#1164) * refactor(settings-ui): loadAdvancedSettings error parity + atomic-write helper reuse + beta_sub_flags via API (#1164) * refactor(config): narrower setattr errors + hasattr precheck + Gemini coerced-decl nit (#1164) * refactor(settings-ui): hoist override-file read + display-only warning + missing test coverage (#1164) * test(settings-ui): section-uniqueness + registry-disjoint + tighter env-pin assertions + accurate docstring (#1164) * refactor(addon): extract maybe_auto_enable_beta_master helper + real unit tests (#1164) * test(settings-ui): JSDOM coverage for advanced sections + master live-render; verify file untouched on 400 (#1164) * refactor(config): NamedTuple registries + import-time validator; finish silent-failure + JSDOM nesting tests (#1164) * fix(tests): restore standalone-mode assertion + add adv section containers (#1164) `replace_all` from an earlier sweep had pivoted the standalone-mode save assertion onto a beta sub-flag, so the master-gate guard now rejected the request and the test failed. Restore the non-beta `enable_tool_search` flag here — the assertion is about the unified save-contract shape, not the beta path. JSDOM `TestAdvancedSectionRender` tests were failing because MIN_DOM lacked the five `adv*` section containers; `renderAdvancedSection` would silently no-op (getElementById returned null) and the assertions fired against an empty body. Add `advConnection`, `advSearch`, `advOperations`, `advToolsSurface`, `advDiagnostics` to `_TOP_LEVEL_ELEMENT_IDS` so `_build_min_dom` emits `<div>` containers for them. * fix(tests): adopt master beta gate + new advanced handler keys (#1164) Three failures surfaced after rebasing onto upstream/master: 1. ``test_returns_all_handler_keys`` expected the pre-#1164 handler set. Add ``get_advanced_settings`` / ``save_advanced_settings`` to the expected keys. 2. ``test_tools_filesystem.TestFeatureFlag::test_enabled_with_*`` broke because the master beta gate now forces every beta sub-flag False at runtime when ``ENABLE_BETA_FEATURES`` is unset. Set both env vars together in the enabling tests so they exercise the sub-flag bool parsing in isolation, and add an explicit test for the gated behavior so a future regression in ``_apply_feature_flag_overrides`` surfaces here too. 3. ``test_yaml_config_tool.enable_flag`` fixture sets ``ENABLE_YAML_CONFIG_EDITING`` but didn't set the master, so the cached settings landed with the sub-flag forced False — would have broken next-up after the filesystem tests. Set ``ENABLE_BETA_FEATURES`` alongside. * fix(policy): contains operator case-insensitive on list-membership branch Pre-fix: case "contains": if isinstance(val, str) and isinstance(pv, str): return pv.lower() in val.lower() # CI return isinstance(val, (list, tuple, set)) and pv in val # case-SENSITIVE The string-in-string branch was already case-insensitive (matching the ``_ci``-equivalent treatment that ``eq`` / ``in`` / ``not_in`` apply), but the list-membership branch fell through to Python's default ``in`` operator. A rule listing ``["light.kitchen"]`` would not fire on an LLM passing ``["Light.Kitchen"]`` — silent gate failure. Bring it in line with the other string-op branches via per-element ``_ci``, which passes non-string entries through unchanged so mixed-type collections still get natural equality semantics. Caught by Gemini Code Assist on #1431; addressing inline rather than opening a follow-up because the policy module is now in master (#1421) and any reviewer running the suite would see the case-sensitivity asymmetry in the existing TestCaseInsensitive class. * feat(settings-ui): addon-aware locked banner, beta-at-bottom, danger warning, dual save buttons, fork-dev stable copy (#1164) Five user-feedback fixes against the Server Settings UI: 1. Locked-banner copy adapts in addon mode. The standalone "Set via env var X — unset it to edit here." copy is misleading in HA addon mode where the operator has no env-var surface (start.py writes the env vars from /data/options.json; Supervisor writes the rest). Endpoints now return is_addon; the JS helper envLockedNoteHtml swaps in addon-aware copy that points users at the addon Configuration tab. Master beta gets an extra hint explaining the auto-enable rule. 2. Beta block rendered into a dedicated bottom-of-panel betaBody container instead of featuresBody. The dangerous block sits last so users see safer settings first; a "Beta features (dangerous)" header in warning color marks the boundary. 3. enable_beta_features help-text rewritten to lead with an explicit danger warning (permanent damage to HA, no warranty, take a backup, own risk). Mirrored as a blockquote at the top of the dev addon's beta-options section in DOCS.md so the addon UI also surfaces the risk. 4. Save button redesigned — primary-CTA styling (bigger, accent background, hover state), duplicated at the top of the panel so a user scrolling either end can hit save, and paired with a prominent two-step note explaining that Save → Restart are both required for changes to take effect. 5. New scripts/fork-dev/copy-stable.sh + restore-dev.sh let maintainers whose only HA test path is the fork-dev addon flip homeassistant-addon-dev/ between dev-flavor and a stable-mirroring "stable test" flavor. Round-trip clean — copy → restore restores the index exactly. JSDOM behavioural coverage added for each of (1)-(4); MIN_DOM updated with the new top-row + beta-section element ids. * fix(tests): JSDOM beta-block tests assert against production HTML / fixed regex (#1164) Three CI failures from the previous commit: - ``test_beta_section_header_present_with_danger_styling`` and ``test_two_step_save_note_present`` asserted on static ``panel-server`` markup that lives in the rendered HTML template, not in any JS-populated container. MIN_DOM doesn't replicate the full panel-server children (by design — it's a minimal handler stub), so the assertions never found their target strings. Switch both tests to assert directly against ``_SETTINGS_HTML``; the presence of the markup at the template level is the property we actually want to lock down. - ``test_beta_rows_render_into_betaBody`` used a regex ``<div id="betaBody">(.*?)</div>\s*<div`` whose ``</div>\s*<div`` boundary matched the very first nested ``</div><div`` *inside* the master row (after the ``.feature-name`` close tag), so ``bb_content`` only captured the header of the master row. Anchor the boundary on ``</div>\s*</body>`` instead — non-greedy capture between the ``betaBody`` open tag and the body close still gives the full container content. Added a fallback path that asserts on class markers + non-leakage to featuresBody if the regex still misses. * feat(settings): fix stuck-master bug, master in dev schema, cascade-clear, drop connection panel, sync backup_hint+verify_ssl (#1164) Six fixes against the Server Settings + addon Configuration surfaces: 1. ``maybe_auto_enable_beta_master`` now requires ``config.get(key) is True`` instead of ``key in config``. The bare presence check fired the moment HA Supervisor merged the dev addon's schema defaults into options.json, locking the master "on" with origin=env on every fresh dev install even when all 5 sub-flags were False. New unit tests cover the truthy / all-false / one-of-many / non-bool-truthy permutations so a future regression on the semantic fails loudly. 2. Master ``enable_beta_features`` moved into the dev addon Configuration tab (schema + options + translations + DOCS.md). Defaults ON in dev — beta tools are the channel's purpose, so a fresh install lights them up without the user round-tripping to the web UI. Stable's schema is unchanged; the standalone web UI master path remains the gate there. start.py writes ENABLE_BETA_FEATURES from options.json only when the key is present; ``get_feature_flag_origin`` now treats the master like the sub-flags (env-var presence in addon mode → origin=addon). ``maybe_auto_enable_beta_master`` is kept as a one-cycle legacy bridge for installs whose options.json pre-dates the master key. 3. Beta sub-flags also default ON in the dev addon options block. 4. Master-off cascade clear: flipping ``enable_beta_features=False`` in ``_save_feature_flags`` now also writes False for every truthy beta sub-flag in the same save. Without this, sub-flags stayed True in the override file and resumed the moment the master was flipped back on — UX bug the user reported as "having to turn off every toggle individually." JS mirrors the cascade so sub-rows visually de-toggle on master-off without a page reload. 5. "Connection (display only)" section removed from the Server Settings panel. The read-only HOMEASSISTANT_URL / TOKEN / SUPERVISOR_TOKEN fields just wasted space — operators already see them in addon logs and configuration. Registry entries kept in ADVANCED_SETTINGS_FIELDS so the API still returns them (env-pin debugging, future surfaces). ``verify_ssl`` moved from ``connection`` to ``operations`` so it still renders in the panel. 6. ``backup_hint`` and ``verify_ssl`` now sync between addon Configuration and the web UI like feature flags do. New ``ADDON_SYNCED_ADVANCED_FIELDS`` set drives the origin helper (returns ``'addon'`` in addon mode for these) and the save handler (routes their writes through Supervisor ``/addons/self/options`` instead of the override file). Cross-surface gate visibility note appended to every beta sub-flag description in ``translations/en.yaml`` (and a top-of-options note on the master) so addon Configuration users know the web UI master gates everything. Locked-banner addon-mode copy from the earlier commit was already in place; tests in this commit re-target fixtures from the removed connection section to ``search``. * fix(addon-dev): beta sub-flags default OFF — only the master defaults ON (#1164) Mis-read of the user's intent in the previous commit. The intended shape for the dev addon's fresh-install defaults is: enable_beta_features: true ← gate unlocked enable_yaml_config_editing: false ← user opts in enable_filesystem_tools: false ← user opts in enable_custom_component_integration: false ← user opts in enable_code_mode: false ← user opts in enable_lite_docstrings: false ← user opts in The previous commit defaulted every sub-flag to true alongside the master, which would have shipped every beta tool live on a fresh dev install — including filesystem writes and the YAML config editor. Each sub-flag mutates the user's HA system, so they remain opt-in even on the dev channel; the master being on just means the gate is open. * revert(scope): drop scripts/fork-dev/ — maintainer tooling, wrong repo (#1164) Pushed these in 69b7a235 as part of task #17. They're personal-fork test tooling for the fork-dev addon workflow — they have no business on master. Removing from the PR; they're still in this branch's git history if anyone needs to fish them back out. * fix(addon+ui): #1431 review pass — restore MCP_HOST, gate sub-flag env writes, sane save (#1164) Round-2 review pass addressed 14 verified findings: **Bugs**: - **MCP_HOST regression** restored. The PR's earlier merge of upstream/master dropped the `bind_host = os.getenv("MCP_HOST", "0.0.0.0")` block introduced by #1434/#1436. `mcp.run(host=...)` now goes through `bind_host` again. - **Beta sub-flag env vars** are now written only when the matching key is present in `/data/options.json`. start.py was writing ENABLE_YAML_CONFIG_EDITING=false (etc.) unconditionally on stable addon, marking those fields origin='addon' in the web UI; the user's save then POSTed to Supervisor which rejected because the keys are not in stable's schema. - **Env-pinned tool save 409**: `_save_tools` now accepts re-sends whose state matches the env-pinned value, rejecting only true mismatches. The JS `saveConfig` POSTs the entire `toolStates` map including env-pinned rows; without this fix every save with `DISABLED_TOOLS` / `PINNED_TOOLS` non-empty would 409. - **Cascade-clear** now reads the persisted override file directly via `_read_feature_flag_override_file()` instead of `get_global_settings()` (whose master gate had already forced sub-flags to False, hiding stale-true overrides). Also force-False sub-flags that are explicitly True in the same payload as master=false, so `{master:false, sub:true}` no longer lands an inconsistent persisted state. - **Master beta-gate check** is now applied uniformly (no more "skip in addon mode" carve-out). The skip existed because the legacy auto-enable wrote ENABLE_BETA_FEATURES from sub-flag presence; now start.py writes the master from its own options key, so the gate is sound to apply in both modes. - **Dev-upgrade silent-disable warning**: start.py logs when master=false but a sub-flag is true in options.json, so an operator who toggled the master off in Configuration after a pre-#1164 dev install sees why their previously-enabled beta tools went away. - **Mixed-batch advanced save** is now split client-side. The server-side guard that returned 500 stays as a defense, but the UI no longer triggers it — `saveAdvancedSettings` partitions `_advancedDirty` into addon-routed and file-routed batches. **Logging / defensive code**: - Supervisor failure in `_save_advanced_settings` now logs before returning, matching the sibling `_save_feature_flags` / `_save_backup_config` handlers. - The three `assert sup_err is not None` sites that would crash under `python -O` now explicitly return INTERNAL_ERROR (covers the addon-route paths in feature flags and advanced settings). - `_apply_advanced_overrides` narrows `except Exception` to `(ValueError, TypeError)`, matching the parallel `_apply_feature_flag_overrides` exception shape. - `maybe_auto_enable_beta_master` now logs which sub-flag(s) triggered the legacy bridge when it fires, with a removal- candidate note in the docstring. **Docs / UX**: - Docstring drift fixed in `get_feature_flag_origin`, `_get_advanced_settings`, `_save_advanced_settings`. - `envLockedNoteHtml` master copy rewritten — origin='env' on the master is now only the legacy-bridge path, not the default dev-addon path. - "Bottom" save button now actually at the bottom of `panel-server` (below the beta block and its code-mode sub-numerics). Second two-step save note duplicated near the bottom row so users editing dangerous beta toggles also see it. Findings deferred to follow-up (legit but bigger than this pass): - A.2 concurrent-save read-modify-write race (needs an `asyncio.Lock` around the override-file path; functionality is safe today because the runtime gate hides the persisted-state inconsistency). - A.4 master-flip via addon Configuration tab → no cascade (the runtime gate + new log warning cover the observable surface). - F.* missing tests for new behaviours — adding in a follow-up commit. * test(settings): cover #1431 review pass fixes (#1164) - ``test_env_pinned_noop_resend_does_not_409`` — JS saveConfig POSTs the entire toolStates map; env-pinned no-op resend must be accepted. - ``test_env_pinned_value_mismatch_still_409s`` — pin true flips still rejected. - ``test_save_features_cascade_clears_subflag_even_when_payload_says_true`` — in-payload {master:false, sub:true} → 409 instead of inconsistent persisted state. - ``test_save_features_cascade_reads_override_file_not_post_gate_settings`` — cascade reads the file directly so it catches stale-true sub-flag overrides hidden by the master gate on the resolved Settings. - ``test_stable_addon_does_not_declare_enable_beta_features`` and ``test_dev_addon_declares_enable_beta_features_master_in_schema`` — lock the schema asymmetry that makes the dev/stable channel distinction work. - ``test_dev_addon_defaults_every_beta_subflag_to_false`` — sub-flags remain opt-in even on dev. * fix(settings): serialise override-file RMW + cover addon-synced advanced save (#1164) A.2 (concurrent-save race): both ``_save_feature_flags`` and ``_save_advanced_settings`` touch the same ``feature_flags.json`` override file. Two near-simultaneous requests could interleave their read/merge/write and clobber each other's persisted state. The runtime master gate hid the inconsistency for beta sub-flags, but other field combinations (advanced + feature-flag in the same window) would have lost state silently. Wrap the RMW window in an ``asyncio.Lock`` (lazy-initialised under the live event loop so module import doesn't bind to no loop). Both handlers acquire the same lock, so saves serialise correctly. F.35 + F.40 (test coverage): - ``test_save_advanced_addon_synced_routes_through_supervisor`` — ``backup_hint`` / ``verify_ssl`` saves in addon mode call ``_supervisor_merge_and_post_options``, return ``mode='addon'``, do NOT write the override file. - ``test_save_advanced_addon_synced_supervisor_4xx_surfaces_validation_failed`` — Supervisor schema rejection surfaces as ``CONFIG_VALIDATION_FAILED`` with the Supervisor status code preserved, not a generic 502. - ``test_origin_for_addon_synced_field_is_addon_in_addon_mode`` — pins the origin matrix: ``backup_hint`` / ``verify_ssl`` come back ``origin='addon, editable'`` in addon mode; non-synced env-pinned fields stay ``origin='env, locked'``. * fix(settings): A.8 preserve sub-flag visual + cover remaining deferred test gaps (#1164) A.8 — JS master-off no longer flips sub-flag checkboxes visually. Previously the cascade-clear set ``_lastFeatureFlags[sub].value = false`` on master-off so the re-render painted sub-rows unchecked. That fought the user's mental model — they expressed intent on individual sub-flags, master-off shouldn't visually wipe that context. Now sub-rows stay checked + dimmed + disabled after the master flips off; the server-side cascade still clears the values on disk, so a refresh shows the cleared state, and a failed save leaves the visible checked state matching the actual on-disk state. F.37 — ``test_master_off_click_dims_subrow_live_without_clobbering_value`` dispatches a real change event on the master input and asserts the sub-row goes dimmed + disabled but keeps its checked attribute. F.38 — three cross-mode origin permutations for the master: - dev-addon env-set → 'addon' - standalone env-set → 'env' - addon mode + file override (no env) → 'file' F.42 — ``test_dual_save_buttons_mirror_disabled_and_status_state`` probes both ``advSaveStatus`` / ``advSaveStatusTop`` text and both buttons' disabled state via a hidden probe div, asserts the mirror holds at save completion. * fix(tests): probe JSDOM properties via probe div; assert mid-save mirror (#1164) Two test bugs in 25b45ab7's new assertions: 1. ``test_master_off_click_dims_subrow_live_without_clobbering_value`` asserted on the literal string ``"checked"`` in the serialised DOM. ``input.checked`` is a DOM property (not an HTML attribute), so JSDOM's serialiser doesn't emit it. The .checked state IS true at the property level — just invisible to the regex. Probe via a hidden ``__sub_state_probe`` div that reads the .checked / .disabled properties and writes them to data-* attributes. 2. ``test_dual_save_buttons_mirror_disabled_and_status_state`` probed AFTER the full save+reload chain, by which point loadAdvancedSettings() had blanked both status text els. Restructure the test to probe SYNCHRONOUSLY after ``click()``, while saveAdvancedSettings is mid-flight (status="Saving…", both buttons disabled). That's the actual mirror invariant we wanted to lock — the helpers ``_setAdvSaveDisabled(true)`` and ``_setAdvSaveStatus('Saving…')`` run synchronously before the first await. * feat(settings): drop sub-flag cascade-clear; restore advanced_debug_logging translation; clearer Save-button copy (#1164) Three user-reported fixes from stable-addon testing: 1. Stable add-on's ``translations/en.yaml`` was missing the ``advanced_debug_logging`` description — the schema declares the toggle in ``config.yaml:49`` but no translation ever shipped, so the addon Configuration UI showed an unlabelled checkbox. Add the missing entry (same wording as dev's translation). 2. Big "Save advanced settings" button used to say "Nothing to save." after the user toggled a feature flag (beta master, Tool Search, etc.). Feature-flag toggles auto-save on click via ``saveFeatureFlag`` — they never enter ``_advancedDirty``, so the advanced-save button sees nothing to do. When the restart banner is already showing (recent feature-flag save), surface that explicitly: "No advanced changes to save — your feature-flag toggles already saved on click. Click Restart above to apply them." Falls back to the original "Nothing to save." copy when there's no pending restart. 3. Drop the master-off cascade-clear behaviour entirely. The runtime master gate in ``_apply_feature_flag_overrides`` already forces every beta sub-flag to False whenever the master is off, so the tools stay disabled at runtime regardless of file state. Leaving the sub-flag values in the override file means toggling the master off → on restores the user's prior sub-flag selections automatically; the previous cascade-clear forced users to re-check each sub-flag after every master cycle, which was the wrong UX trade for an opt-in beta surface. The master-gate check is unchanged — it still rejects payloads that try to enable a sub-flag while the effective master is off, so the "sub true while master false in same payload" inconsistency still can't land. The cascade-clear was a separate (now-removed) defence. Tests updated: - ``test_save_features_master_off_preserves_subflag_values`` — was ``test_save_features_cascade_clears_subflags_when_master_off``; asserts the new "preserve" semantics. - ``test_save_features_master_on_restores_runtime_subflag_values`` — new round-trip test for master off → on restoring sub-flags. - ``test_save_features_payload_master_false_sub_true_rejected_by_gate`` — renamed; asserts gate rejection still covers the inconsistent payload. - ``test_save_features_cascade_reads_override_file_not_post_gate_settings`` — deleted (no cascade, no reason to test cascade's read path). - ``test_save_features_master_off_applied_dict_contains_only_master`` — new no-cascade pin; ``applied`` carries only the master flip. * fix(settings): #1431 review pass — address 13 verified findings (#1164) Round-3 review pass landed 13 verified findings; the cosmetic "persisted-value visible in UI after F5" item was explicitly skipped per user direction (UI shows post-gate value when master off; user accepts this since beta tools are actually disabled at runtime — the preserve-across-master-cycle UX is at the data layer, not the visual). Source fixes: - **Save button copy is now source-blind** — the previous "your feature-flag toggles already saved on click" claim only held when ``saveFeatureFlag`` raised ``restartNotice``; tool-config pin saves, backup-config saves, and cross-tab ``restart-required`` broadcasts also raise it. New copy: "a restart is pending. Click Restart above to apply your prior changes." (#2) - **Stale F.37 test docstring** describing the deleted server-side cascade rewritten. (#3) - **Beta-gate INFO log noise** — cascade-clear removal meant the gate could fire its "forcing %s=False" line every Settings rebuild, spamming addon logs once a user had truthy sub-flags persisted. Dedup via ``_BETA_GATE_LOGGED`` set per process, cleared on ``_reset_global_settings``. (#9) - **Lazy-lock docstring** updated to reflect Python 3.13 semantics (``asyncio.Lock()`` no longer takes a loop arg; the lazy pattern still serves test fixtures and single-loop deployment, with the invariant documented). (#10) - **Addon-mode carve-out comment** clarified to distinguish dev (master in schema) from stable (master web-UI-only). (#14) - **probe-div null branch** in F.37 now writes ``data-error`` so a failing test points at "selector missed" vs "value flipped" unambiguously. (#17) Tests added: - ``test_translations_cover_every_schema_key`` — parity check that every ``schema:`` key has a non-empty translation ``name`` and ``description``. Parameterised across stable + dev addons. Pins the class of silent gap that this PR's ``advanced_debug_logging`` fix addressed. (#4) - ``test_save_features_acquires_override_file_lock`` + ``test_save_advanced_acquires_override_file_lock`` — counting-lock wrapper asserts ``async with _get_override_file_lock()`` runs exactly once in each file-mode write path. Pin against a regression that silently bypasses concurrent-save serialisation. (#5) - ``test_dual_save_buttons_mirror_disabled_and_status_on_post_failure`` — exercises the 500-response branch of the dual-save mirror so a regression that broke ``_setAdvSaveStatus``/``_setAdvSaveDisabled`` for error paths only would still fail. (#6) - ``test_save_features_master_on_restores_subflag_values_in_addon_mode`` — addon-mode round-trip mirror of the existing standalone restore test; asserts the Supervisor merge-and-post call carries only the master flip-on and never zeroes out sub-flag values. (#7) - ``test_save_button_nothing_to_save_when_no_dirty_and_no_restart`` + ``test_save_button_restart_pending_hint_when_dirty_empty_but_restart_showing`` — both branches of the empty-dirty Save click are exercised; the restart-pending branch asserts the copy is source-blind. (#8) Deferred per user direction: - #1 (file-vs-Settings visual after F5): user accepts the current behavior (UI shows post-gate value; runtime tools actually disabled when master off; data-layer preserve still works end-to-end). - #11/#12/#13 (code-simplifier helper extractions): skipped as complicated to implement without behavior risk. - #15 (pre-#1164 users with already-cleared sub-flags): release-note concern, not a code change. - #16 (lock fragility under future thread-pool dispatch): speculative future-risk; not actionable today. * feat(settings): version footer + Patch76 review feedback Adds the running ha-mcp version to the settings UI footer (issue #1466). ``info.version`` flows from ``/api/settings/info`` → ``HA_MCP_BUILD_VERSION`` env var on addon builds (set by both stable and dev Dockerfiles) → package metadata fallback. Empty on older deployments without the field. Addresses Patch76's review (no blockers, all bundleable): - ``OverrideField`` folds the structurally-identical ``FeatureFlagField`` and ``BackupOverrideField`` into one NamedTuple; aliases preserve readable construction sites. ``AdvancedField`` keeps its own type since it c…
Summary
ENABLE_CODE_MODE=true)ENABLE_CODE_MODE_LIST_TOOLSsetting to also exposeListToolsfor full catalog discoveryfastmcpdependency to include[code-mode]extrafalse— no behavior change unless explicitly enabledChanges
pyproject.tomlfastmcp==3.1.0→fastmcp[code-mode]==3.1.0src/ha_mcp/config.pyENABLE_CODE_MODEandENABLE_CODE_MODE_LIST_TOOLSsettingssrc/ha_mcp/server.py_apply_code_mode()method, called after all tools are registered.env.exampleTest plan
ENABLE_CODE_MODE=trueand verify server starts with only meta-tools (search, get_schema, execute)falseENABLE_CODE_MODE_LIST_TOOLS=truealongside code mode and verify ListTools appearsfastmcp[code-mode]extra is not installed🤖 Generated with Claude Code