
Commit c51896e

fix: smart-strategy resilience for refresh-status 5xx (counters, cache, recovery)
When the Toyota gateway returns a persistent HTTP 500 on POST /refresh-status (some Lexus / Aygo / Yaris vehicles, per ha_toyota#291 and ha_toyota#293), pytoyoda's controller exhausts its 4-attempt retry sequence and raises ToyotaApiError. The previous implementation called vehicle.refresh_status() without a try/except, so the exception propagated out of _refresh_one_vehicle with three consequences:

1. on_post_layer1_failure() never ran (it sits inside the `return_code != "000000"` branch, which a raised POST never reaches), so consecutive_post_rejections never advanced and the _AUTO_DISABLE_REJECTION_THRESHOLD soft/hard-disable mechanism never fired. status_refresh_state stayed `active` forever.
2. _refresh_one_vehicle's post-decision bookkeeping was skipped: the trips manager refresh, movement detection, diag state persistence, and the caller's `last_good_per_vin[vin] = vehicle_data` line. Phase 1 had already fetched fresh telemetry, location, etc., but that data was never promoted to the cache layer, so entities kept serving the prior cycle's cached VehicleData. Users reported this as the parking location frozen at home, stale lock state, etc.
3. Once auto-disable did fire (via the existing returnCode-rejection branch, on cars where that path could trip), the only documented recovery was toggling enable_status_refresh OFF then ON. There was no way to retry without disabling the feature first.

This commit:

- Wraps the POST in contextlib.suppress for the same exception set the Layer 2 poll loop already catches, collapsing the exception path and the non-"000000" returnCode path into a single Layer 1 failure branch.
- Adds a bare GET fallback (`vehicle.update(only=["status"])`) on Layer 1 failure so /status entities still refresh this cycle, even before auto-disable kicks in.
- Lets explicit service calls bypass BOTH HARD_DISABLED_AUTO and HARD_DISABLED_USER. This matches the HA convention that polling toggles stop the cadence but explicit invocations still go through; users can disable the automatic strategy and drive POSTs from their own automations (geofence, garage door, time-of-day).
- Clears auto_disabled_status_refresh after a successful POST, so the strategy returns to ACTIVE on the next cycle without manual toggling. Users can now recover from auto-disable by pressing the refresh button instead of toggling the option OFF/ON.
- Adds three tests in test_refresh_strategy.py: the service-call bypass for both AUTO and USER disable, and user-disable still blocking when no service call is pending.
- Updates the const.py comments for CONF_ENABLE_STATUS_REFRESH and CONF_AUTO_DISABLED_STATUS_REFRESH, and the services.yaml description for refresh_vehicle_status, to reflect the cadence-vs-capability distinction.

All 31 existing tests still pass; ruff clean.

Closes ha_toyota#293.
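
Why consequence 1 matters can be shown with a toy model of the rejection counter. This is a sketch under stated assumptions, not the integration's code: StrategyState, record_outcome and the threshold value of 3 are hypothetical, the real on_post_layer1_failure(state, opts) also takes strategy options, and the soft/hard-disable split is collapsed into one threshold. The point it illustrates is that once a raised POST (no return code) and a non-"000000" returnCode share one branch, persistent 5xx responses do reach the threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration only; the real
# _AUTO_DISABLE_REJECTION_THRESHOLD is not shown in this commit.
AUTO_DISABLE_REJECTION_THRESHOLD = 3
ACCEPTED = "000000"


@dataclass
class StrategyState:
    """Hypothetical per-VIN state: just the Layer 1 rejection streak."""

    consecutive_post_rejections: int = 0


def on_post_layer1_failure(state: StrategyState) -> bool:
    """Count one Layer 1 failure; return True once the threshold is reached."""
    state.consecutive_post_rejections += 1
    return state.consecutive_post_rejections >= AUTO_DISABLE_REJECTION_THRESHOLD


def on_post_layer1_success(state: StrategyState) -> None:
    """An accepted POST ends the rejection streak."""
    state.consecutive_post_rejections = 0


def record_outcome(state: StrategyState, return_code: Optional[str]) -> bool:
    """Single Layer 1 branch for every POST outcome.

    return_code is None when the POST raised (no response at all); any value
    other than "000000" is a gateway-level rejection. Both count as failures.
    Returns True when auto-disable should fire this cycle.
    """
    if return_code != ACCEPTED:
        return on_post_layer1_failure(state)
    on_post_layer1_success(state)
    return False


state = StrategyState()
# Three cycles of persistent 5xx: the POST raised, so there is no return code.
fired = [record_outcome(state, None) for _ in range(3)]
print(fired)  # [False, False, True]
```

Under the old code, the None case never reached this branch, because the exception escaped before the returnCode check, so the counter stayed at 0.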
1 parent 601326b commit c51896e

5 files changed

Lines changed: 154 additions & 23 deletions


custom_components/toyota/__init__.py

Lines changed: 78 additions & 16 deletions
@@ -394,30 +394,56 @@ async def _execute_post_then_get(
     retries 429/5xx with exponential backoff, so this loop only iterates
     if the gateway returned 200 with a stale occurrence_date (legitimate
     "POST accepted but cache not yet warm").
+
+    On POST failure (exception OR non-"000000" returnCode): record a
+    Layer 1 rejection, possibly auto-disable, then fall back to a bare
+    GET so /status entities still refresh this cycle. See ha_toyota#293.
+    On POST success: clear any prior auto-disable flag so a service-call
+    retry (or a transient-5xx recovery) restores normal operation
+    without requiring the user to toggle the option manually.
     """
     opts = _strategy_options()
-    post_response = await _call_tagged(
-        "refresh_status", vin, vehicle.refresh_status()
-    )
+    # POST raised after pytoyoda's retries exhausted (persistent gateway
+    # 5xx) → post_response stays None and falls through to the Layer 1
+    # failure branch below, same family as a non-"000000" returnCode
+    # ("gateway will not process this POST"). _call_tagged has already
+    # logged the underlying error.
+    post_response = None
+    with contextlib.suppress(
+        ToyotaApiError,
+        httpx.ConnectTimeout,
+        httpcore.ConnectTimeout,
+        asyncioexceptions.TimeoutError,
+        httpx.ReadTimeout,
+    ):
+        post_response = await _call_tagged(
+            "refresh_status", vin, vehicle.refresh_status()
+        )
     state.last_post_attempt_at = dt_util.now()

     # Layer 1: gateway-level acceptance. payload.return_code "000000" =
-    # accepted; anything else = vehicle does not support refresh-status.
-    # (pytoyoda exposes the field as snake_case via Pydantic Field alias.)
-    payload = getattr(post_response, "payload", None)
+    # accepted; anything else (or no response = exception path) =
+    # vehicle does not support refresh-status this cycle.
+    payload = getattr(post_response, "payload", None) if post_response else None
     return_code = getattr(payload, "return_code", None) if payload else None

     if return_code != "000000":
         should_auto_disable = on_post_layer1_failure(state, opts)
-        _LOGGER.warning(
-            "Toyota refresh-status rejected for vin=...%s (returnCode=%s)",
-            vin[-6:],
-            return_code,
-        )
-        if should_auto_disable:
+        if post_response is not None:
+            # 200 OK with non-000000 returnCode (gateway-level rejection).
+            _LOGGER.warning(
+                "Toyota refresh-status rejected for vin=...%s (returnCode=%s)",
+                vin[-6:],
+                return_code,
+            )
+        if should_auto_disable and not entry.options.get(
+            CONF_AUTO_DISABLED_STATUS_REFRESH, False
+        ):
             # Persist auto-disable to config_entry.options. Triggers a
-            # listener-driven reload, which is fine - state survives via
-            # diag_bucket.
+            # listener-driven reload, which is fine - state survives
+            # via diag_bucket. Guarded against re-entrance: a service-
+            # call retry that still 500s would otherwise trip the
+            # threshold every cycle and trigger a redundant reload.
             hass.config_entries.async_update_entry(
                 entry,
                 options={
@@ -426,13 +452,49 @@ async def _execute_post_then_get(
                 },
             )
             _LOGGER.warning(
-                "Toyota auto-disabled smart refresh for vin=...%s after "
-                "%d consecutive Layer 1 rejections",
+                "Toyota auto-disabled smart refresh for vin=...%s "
+                "after %d consecutive Layer 1 rejections",
                 vin[-6:],
                 state.consecutive_post_rejections,
             )
+        # Fall back to a bare GET so /status entities still refresh
+        # this cycle (matches the HARD_DISABLED legacy path). Useful
+        # for cycles before auto-disable kicks in, and for any vehicle
+        # whose POST 500s but whose /status still serves stale-cache
+        # data we can read. Suppression list matches the POST's so
+        # transient connectivity issues during the fallback don't
+        # abort _refresh_one_vehicle's bookkeeping either.
+        with contextlib.suppress(
+            ToyotaApiError,
+            httpx.ConnectTimeout,
+            httpcore.ConnectTimeout,
+            asyncioexceptions.TimeoutError,
+            httpx.ReadTimeout,
+        ):
+            await _call_tagged(
+                "status_after_post_fail",
+                vin,
+                vehicle.update(only=["status"]),
+            )
         return
     on_post_layer1_success(state)
+    # Auto-recovery from HARD_DISABLED_AUTO: a successful POST proves
+    # the gateway can process this endpoint. Lift the flag so the
+    # strategy goes back to ACTIVE on the next cycle. Triggered by
+    # service-call bypass (the user explicitly retrying via the
+    # refresh button) or by a transient 5xx clearing on its own.
+    if entry.options.get(CONF_AUTO_DISABLED_STATUS_REFRESH, False):
+        hass.config_entries.async_update_entry(
+            entry,
+            options={
+                **entry.options,
+                CONF_AUTO_DISABLED_STATUS_REFRESH: False,
+            },
+        )
+        _LOGGER.info(
+            "Toyota auto-disable cleared for vin=...%s after successful POST",
+            vin[-6:],
+        )

     # Layer 2: poll for occurrence_date advancement.
     deadline = dt_util.now() + timedelta(seconds=timeout_s)
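
A stripped-down, runnable model of the new control flow in _execute_post_then_get follows. FakeVehicle, GatewayError and the two-entry suppression tuple are hypothetical stand-ins: the diff suppresses ToyotaApiError plus several httpx/httpcore/asyncio timeouts, and _call_tagged, logging and the options write are omitted. The model shows that nothing escapes the function, so the caller's bookkeeping, shown here as a cache update, still runs when the POST 500s.

```python
import asyncio
import contextlib


class GatewayError(Exception):
    """Stand-in for ToyotaApiError raised after the client's retries run out."""


# Simplified suppression tuple; the diff lists ToyotaApiError plus
# httpx/httpcore connect timeouts, httpx.ReadTimeout and asyncio TimeoutError.
SUPPRESSED = (GatewayError, asyncio.TimeoutError)
ACCEPTED = "000000"


class FakeVehicle:
    """Hypothetical vehicle whose refresh-status POST always fails with a 5xx."""

    def __init__(self) -> None:
        self.status_reads = 0

    async def refresh_status(self):
        raise GatewayError("HTTP 500 on POST /refresh-status")

    async def update(self, only):
        # The bare GET still succeeds and returns whatever /status has cached.
        self.status_reads += 1


async def execute_post_then_get(vehicle, on_failure) -> str:
    post_response = None
    with contextlib.suppress(*SUPPRESSED):
        post_response = await vehicle.refresh_status()

    # A raised POST leaves post_response as None, so return_code is None too.
    payload = getattr(post_response, "payload", None) if post_response else None
    return_code = getattr(payload, "return_code", None) if payload else None

    if return_code != ACCEPTED:
        on_failure()  # Layer 1 failure bookkeeping (counter, maybe auto-disable)
        # Fallback GET so /status entities still refresh this cycle.
        with contextlib.suppress(*SUPPRESSED):
            await vehicle.update(only=["status"])
        return "fallback_get"
    return "post_accepted"


async def refresh_one_vehicle(vehicle, cache: dict, failures: list) -> str:
    outcome = await execute_post_then_get(
        vehicle, lambda: failures.append("layer1")
    )
    # Reached even when the POST failed, so fresh Phase 1 data is promoted.
    cache["last_good"] = "fresh"
    return outcome


vehicle = FakeVehicle()
cache = {"last_good": "stale"}
failures: list = []
outcome = asyncio.run(refresh_one_vehicle(vehicle, cache, failures))
print(outcome, vehicle.status_reads, failures, cache["last_good"])
# fallback_get 1 ['layer1'] fresh
```

Suppressing at the call site, rather than catching in the caller, keeps the exception path and the rejected-returnCode path on the same branch, which is what lets a single fallback GET serve both.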

custom_components/toyota/const.py

Lines changed: 5 additions & 2 deletions
@@ -29,11 +29,14 @@
 # Smart status refresh strategy. POSTs /v1/global/remote/refresh-status to
 # wake the car's modem before reading /status, mimicking the Toyota mobile
 # app's two-stage protocol. Reduces stuck-stale lock/door state and 429s.
-# See rate-limit-remediation-plan.md Addendum 4.
+# Off = stop the automatic cadence; explicit refresh_vehicle_status service
+# calls still go through (per HA polling-toggle convention). See
+# rate-limit-remediation-plan.md Addendum 4.
 CONF_ENABLE_STATUS_REFRESH = "enable_status_refresh"
 DEFAULT_ENABLE_STATUS_REFRESH = True
 # Set automatically when the gateway repeatedly rejects the POST (vehicle
-# does not support refresh-status). Cleared when user toggles
+# does not support refresh-status). Cleared by either: (a) a successful
+# service-call POST proving the gateway works, or (b) the user toggling
 # CONF_ENABLE_STATUS_REFRESH OFF then ON. Hidden in the UI.
 CONF_AUTO_DISABLED_STATUS_REFRESH = "auto_disabled_status_refresh"
 DEFAULT_AUTO_DISABLED_STATUS_REFRESH = False
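
The lifecycle of CONF_AUTO_DISABLED_STATUS_REFRESH can be modelled as a pure function that returns the options dict to persist, or None when nothing should change. It covers setting the flag (with the re-entrance guard from __init__.py) and clear path (a). options_after_post is a hypothetical helper, not part of the integration, and path (b), the user's OFF/ON toggle, is not modelled. Returning None mirrors the guard: every options write triggers a listener-driven reload, so the flag is written only when its value flips.

```python
from typing import Any, Mapping, Optional

CONF_AUTO_DISABLED_STATUS_REFRESH = "auto_disabled_status_refresh"


def options_after_post(
    options: Mapping[str, Any], *, post_accepted: bool, threshold_reached: bool
) -> Optional[dict]:
    """Return the options dict to persist, or None if nothing should change.

    Hypothetical helper. Returning None matters because every options write
    triggers a config-entry reload, so the flag is only written when it flips.
    """
    flagged = options.get(CONF_AUTO_DISABLED_STATUS_REFRESH, False)
    if post_accepted:
        # Path (a): a successful POST proves the gateway works; lift the flag.
        if not flagged:
            return None
        return {**options, CONF_AUTO_DISABLED_STATUS_REFRESH: False}
    if threshold_reached and not flagged:
        # First time over the threshold: persist the auto-disable.
        return {**options, CONF_AUTO_DISABLED_STATUS_REFRESH: True}
    # Still failing but already flagged (e.g. a service-call retry that 500s):
    # skip the redundant write and reload.
    return None


opts = {"enable_status_refresh": True}
opts = options_after_post(opts, post_accepted=False, threshold_reached=True)
print(opts)  # {'enable_status_refresh': True, 'auto_disabled_status_refresh': True}
print(options_after_post(opts, post_accepted=False, threshold_reached=True))  # None
print(options_after_post(opts, post_accepted=True, threshold_reached=False))
# {'enable_status_refresh': True, 'auto_disabled_status_refresh': False}
```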

custom_components/toyota/refresh_strategy.py

Lines changed: 22 additions & 5 deletions
@@ -191,15 +191,30 @@ class RefreshDecision:
 # ----------------------------------------------------------------------------


-def _hard_disable_decision(opts: StrategyOptions) -> RefreshDecision | None:
-    """Return a HARD_DISABLED decision if either disable flag is set, else None."""
-    if not opts.enable_status_refresh:
+def _hard_disable_decision(
+    opts: StrategyOptions,
+    *,
+    user_service_call_pending: bool = False,
+) -> RefreshDecision | None:
+    """Return a HARD_DISABLED decision if either disable flag is set, else None.
+
+    Service calls bypass BOTH disable forms. The convention everywhere in
+    HA is "polling toggle stops automatic polling, manual service calls
+    still work" - so enable_status_refresh:False means "stop the strategy's
+    cadence" rather than "lock out POSTs entirely". Users who want a
+    bespoke schedule (geofence arrival, garage-door close, etc.) disable
+    the cadence and drive POSTs from their own automations against the
+    refresh_vehicle_status service. After a successful service-call POST,
+    auto_disabled_status_refresh is cleared by the integration so a future
+    cadence re-enable doesn't land in the auto-disabled state.
+    """
+    if not opts.enable_status_refresh and not user_service_call_pending:
         return RefreshDecision(
             action=RefreshAction.HARD_DISABLED,
             trigger=RefreshTrigger.NONE,
             refresh_state=RefreshState.HARD_DISABLED_USER,
         )
-    if opts.auto_disabled_status_refresh:
+    if opts.auto_disabled_status_refresh and not user_service_call_pending:
         return RefreshDecision(
             action=RefreshAction.HARD_DISABLED,
             trigger=RefreshTrigger.NONE,
@@ -250,7 +265,9 @@ def decide(snapshot: CycleSnapshot) -> RefreshDecision:
     state = snapshot.state
     now = snapshot.now

-    hard = _hard_disable_decision(opts)
+    hard = _hard_disable_decision(
+        opts, user_service_call_pending=snapshot.user_service_call_pending
+    )
     if hard is not None:
         return hard
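
The bypass rule in _hard_disable_decision reduces to a small truth table. The sketch below is a simplified stand-in, not the module's code: hard_disable_state returns only the blocking RefreshState instead of a full RefreshDecision, and the enum values are assumptions (the diff shows only the member names). Checking the pending service call first is equivalent to adding `and not user_service_call_pending` to each branch, as the diff does.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Optional


class RefreshState(Enum):
    # Assumed values; only the member names appear in the diff.
    HARD_DISABLED_USER = "hard_disabled_user"
    HARD_DISABLED_AUTO = "hard_disabled_auto"


@dataclass(frozen=True)
class StrategyOptions:
    enable_status_refresh: bool = True
    auto_disabled_status_refresh: bool = False


def hard_disable_state(
    opts: StrategyOptions, *, user_service_call_pending: bool = False
) -> Optional[RefreshState]:
    """Simplified _hard_disable_decision: the blocking state, or None to proceed."""
    if user_service_call_pending:
        # Explicit service calls bypass BOTH disable forms.
        return None
    if not opts.enable_status_refresh:
        return RefreshState.HARD_DISABLED_USER
    if opts.auto_disabled_status_refresh:
        return RefreshState.HARD_DISABLED_AUTO
    return None


# Print the full truth table: (enabled, auto_disabled, service_call) -> state.
for enabled, auto, pending in product([True, False], repeat=3):
    state = hard_disable_state(
        StrategyOptions(enabled, auto), user_service_call_pending=pending
    )
    print(enabled, auto, pending, state.name if state else None)
```

When both flags are set and no service call is pending, user-disable wins because it is checked first, which matches the order of the checks in the diff.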

custom_components/toyota/services.yaml

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ refresh_vehicle_status:
     cellular airtime and 12V battery. Recommended for use after returning
     home from a drive when you want the lock state to be visible quickly,
     or wired to an automation that triggers on a specific event.
+    Works regardless of the integration's automatic refresh setting:
+    if you have disabled "Refresh vehicle status remotely" or the gateway
+    has auto-disabled it, this service still fires the POST so you can
+    drive a fully manual refresh schedule from your own automations.
   fields:
     device_id:
       name: Vehicle

tests/test_refresh_strategy.py

Lines changed: 45 additions & 0 deletions
@@ -67,6 +67,51 @@ def test_auto_disabled_returns_hard_disabled_auto():
     assert d.refresh_state is RefreshState.HARD_DISABLED_AUTO


+def test_service_call_bypasses_hard_disabled_auto():
+    """Service call overrides auto-disable so the user can retry manually."""
+    s = _snap(
+        options=StrategyOptions(
+            enable_status_refresh=True, auto_disabled_status_refresh=True
+        ),
+        user_service_call_pending=True,
+    )
+    d = decide(s)
+    assert d.action is RefreshAction.POST_THEN_GET
+    assert d.trigger is RefreshTrigger.SERVICE_CALL
+
+
+def test_service_call_bypasses_hard_disabled_user():
+    """Service call also bypasses user-disable, matching HA convention.
+
+    `enable_status_refresh: False` stops the automatic cadence; explicit
+    service-call invocations (e.g. a garage-door automation calling
+    `refresh_vehicle_status`) still go through. Users who want full lockout
+    simply do not invoke the service.
+    """
+    s = _snap(
+        options=StrategyOptions(
+            enable_status_refresh=False, auto_disabled_status_refresh=False
+        ),
+        user_service_call_pending=True,
+    )
+    d = decide(s)
+    assert d.action is RefreshAction.POST_THEN_GET
+    assert d.trigger is RefreshTrigger.SERVICE_CALL
+
+
+def test_user_disable_blocks_non_service_triggers():
+    """Without a service call, user-disable still blocks the strategy."""
+    s = _snap(
+        options=StrategyOptions(
+            enable_status_refresh=False, auto_disabled_status_refresh=False
+        ),
+        user_service_call_pending=False,
+    )
+    d = decide(s)
+    assert d.action is RefreshAction.HARD_DISABLED
+    assert d.refresh_state is RefreshState.HARD_DISABLED_USER
+
+
 # ---------------------------------------------------------------------------
 # Step 4: service-call wins over everything except hard-disable
 # ---------------------------------------------------------------------------
