1p/3p switch can deadlock in state 5: the state-5 recovery token is consumed without any currtime being sent (stale enableUser/currentUser race) #591

Description

@SmartNightly

Related to #587 and #588 (same state-based 1p/3p path).

Environment

Summary

After a clean 1p/3p phase switch, the station can stay stuck in state 5 (interrupted) indefinitely even though the PV surplus is more than sufficient to charge. Observed once for 91 minutes until a manual re-plug. During the whole time the adapter never sends currtime > 0 again. The cause is a race that silently consumes the existing state-5 recovery token (startWithState5Attempted) before any restart command actually reaches the station.

Root cause

The state-5 recovery in isNoChargingDueToInteruptedStateOfWallbox() allows exactly one restart attempt out of state 5 and blocks every later one. That single attempt is wasted by a stale-state race:

  1. Surplus drops below the 3p minimum while charging 3p → adapter stops for the phase switch with currtime 0 1 → KEBA enters state 5 and emits Enable sys: 0 / Max curr: 0 as individual field updates.
  2. The KEBA does not push Enable user / Curr user at this point — those only arrive with the next full report 2. So the states behind enableUser (stays true) and currentUser (stays 6000) are stale for a while.
  3. The phase switch completes cleanly and the adapter calls regulateWallbox(6000):
    • oldValue is computed from the stale enableUser == true → oldValue = currentUser = 6000.
    • isNoChargingDueToInteruptedStateOfWallbox(6000) sees state 5, sets startWithState5Attempted = true (the one allowed attempt) and returns false.
    • But the send guard if (milliAmpere != oldValue) is now 6000 != 6000 → false → no UDP datagram, no log line. The recovery token is spent, yet the station never received anything.
  4. ~44 s later the next full report 2 arrives (Enable user: 0) and corrects enableUser to false — too late.
  5. From the next tick on, isNoChargingDueToInteruptedStateOfWallbox returns true (token already consumed) → logs No charging due to interupted charging station, forces milliAmpere = 0. State never leaves 5, so the flag is never reset → permanent deadlock until re-plug.
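
The five steps above can be replayed in a minimal model. This is a sketch only, not the adapter's code: createModel, simulateDeadlock and the simplified isNoChargingDueToInterruptedState are hypothetical stand-ins, and the oldValue rule is reduced to what the "Code path" notes describe (enableUser or state == 3, then currentUser).

```javascript
// Simplified model of the race. NOT the adapter's code; all helpers are hypothetical.
function createModel() {
    return {
        state: 3,                        // wallbox state from the last datagram
        enableUser: true,                // only refreshed by a full "report 2"
        currentUser: 6000,               // only refreshed by a full "report 2"
        startWithState5Attempted: false, // the state-5 recovery token
        sent: [],                        // commands that actually went out via UDP
    };
}

// Stand-in for the state-5 recovery: one restart attempt is allowed, later ones are blocked.
function isNoChargingDueToInterruptedState(m, milliAmpere) {
    if (m.state !== 5) {
        m.startWithState5Attempted = false;
        return false;
    }
    if (milliAmpere <= 0) {
        return false;
    }
    if (!m.startWithState5Attempted) {
        m.startWithState5Attempted = true; // token is consumed here, before any send
        return false;
    }
    return true;
}

function regulateWallbox(m, milliAmpere) {
    // oldValue is derived from states that may be stale
    const oldValue = (m.enableUser || m.state === 3) ? m.currentUser : 0;
    let target = milliAmpere;
    if (isNoChargingDueToInterruptedState(m, target)) {
        target = 0;
    }
    if (target !== oldValue) {           // the send guard
        m.sent.push(`currtime ${target} 1`);
    }
}

function simulateDeadlock() {
    const m = createModel();
    regulateWallbox(m, 0);     // step 1: stop for the phase switch
    m.state = 5;               // step 2: only individual field updates arrive
    regulateWallbox(m, 6000);  // step 3: token consumed, guard is 6000 !== 6000, nothing sent
    m.enableUser = false;      // step 4: full report 2 corrects enableUser
    regulateWallbox(m, 6000);  // step 5: blocked, target 0 equals oldValue 0, nothing sent
    return m;
}

console.log(simulateDeadlock().sent);
```

In this model the only command that ever goes out is the initial stop, while startWithState5Attempted ends up true, which matches the behaviour described above.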

Log evidence

Real log excerpts from a single deadlock event (IP and serial redacted; debug level). The full report 2 payloads are abbreviated to the fields that matter for this race.

# (1) report 2 BEFORE stop — wallbox is charging, "Enable user: 1, Curr user: 6000"
2026-06-14 16:00:46.003  debug: kecontact.0 UDP datagram from <wallbox>:7090: {
  "ID": "2", "State": 3, "Enable sys": 1, "Enable user": 1,
  "Max curr": 6000, "Curr user": 6000, "Curr timer": 6000, ... }

# (2) Adapter stops for the phase switch
2026-06-14 16:00:46.178  info:  kecontact.0 stop charging for switch of phases ...
2026-06-14 16:00:46.178  info:  kecontact.0 stop charging
2026-06-14 16:00:46.241  debug: kecontact.0 Sent "currtime 0 1" to <wallbox>:7090

# KEBA reacts with individual-field updates only (no full report 2 yet)
2026-06-14 16:00:48.074  debug: kecontact.0 UDP datagram: {"Enable sys": 0}
2026-06-14 16:00:48.074  debug: kecontact.0 UDP datagram: {"State": 5}
2026-06-14 16:00:48.267  debug: kecontact.0 UDP datagram: {"Max curr": 0}

# (3) Phase switch completes cleanly, surplus is back, adapter computes 6000 mA
#     -- but NO "Sent currtime" follows; the recovery token is consumed silently.
2026-06-14 16:01:16.333  info:  kecontact.0 switch 1p/3p successfully completed.
2026-06-14 16:01:16.333  debug: kecontact.0 new current due to 1p charging is 5000
2026-06-14 16:01:16.333  debug: kecontact.0 wallbox set to charging maximum of 6000 mA
# <-- no "Sent currtime ...", no "(re)start charging ...", no "No charging due to interupted" here

# (4) First full report 2 AFTER the stop — ~44 s later — corrects enableUser to false
2026-06-14 16:01:30.775  debug: kecontact.0 UDP datagram from <wallbox>:7090: {
  "ID": "2", "State": 5, "Enable sys": 0, "Enable user": 0,
  "Max curr": 0, "Curr user": 6000, "Curr timer": 0, ... }

# (5) From the next tick onward: token already consumed → permanent block, no UDP send
2026-06-14 16:02:01.041  debug: kecontact.0 wallbox set to charging maximum of 6000 mA
2026-06-14 16:02:01.041  debug: kecontact.0 No charging due to interupted charging station
# ... this repeats every ~30 s for the next 91 minutes, no "Sent currtime" in between ...

# (6) Recovery only after manual re-plug at 17:32
2026-06-14 17:33:01.351  debug: kecontact.0 Sent "currtime 7300 1" to <wallbox>:7090

kecontact.0.state is constant 5 across the whole window; statistics.surplus is 6000–7078 W throughout, so the adapter wanted to charge the entire time. No currtime > 0 is sent between the stop and the post-re-plug restart.
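
To check other logs for the same pattern, a small script can measure the time between a stop (currtime 0) and the next currtime > 0. This is a sketch under assumptions: findRestartGaps is a hypothetical helper, and the regular expression only assumes the line layout visible in the excerpts above (timestamp, level, instance, Sent "currtime ..."). Timestamps are parsed as if they were UTC, which does not affect the difference between two of them.

```javascript
// Hypothetical helper: finds stop -> restart gaps in debug log lines.
function findRestartGaps(logLines) {
    const sentRe = /^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}\.\d{3})\s+\w+:\s+\S+ Sent "currtime (\d+) \d+"/;
    const gaps = [];
    let stoppedAt = null;
    for (const line of logLines) {
        const match = sentRe.exec(line);
        if (!match) {
            continue; // not a "Sent currtime" line
        }
        const time = Date.parse(`${match[1]}T${match[2]}Z`);
        const milliAmpere = Number(match[3]);
        if (milliAmpere === 0) {
            if (stoppedAt === null) {
                stoppedAt = time; // remember the first stop of a sequence
            }
        } else if (stoppedAt !== null) {
            gaps.push({ gapMs: time - stoppedAt, restartCurrent: milliAmpere });
            stoppedAt = null;
        }
    }
    return gaps;
}

// The two "Sent" lines from the excerpts above
const sampleLines = [
    '2026-06-14 16:00:46.241  debug: kecontact.0 Sent "currtime 0 1" to <wallbox>:7090',
    '2026-06-14 16:02:01.041  debug: kecontact.0 No charging due to interupted charging station',
    '2026-06-14 17:33:01.351  debug: kecontact.0 Sent "currtime 7300 1" to <wallbox>:7090',
];
for (const gap of findRestartGaps(sampleLines)) {
    console.log(`${(gap.gapMs / 60000).toFixed(1)} min until currtime ${gap.restartCurrent}`);
}
```

A gap far above the usual phase-switch pause, with statistics.surplus at a chargeable level in between, would point to the same deadlock.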

Code path (main.js, line numbers approximate)

  • regulateWallbox() ~1404: oldValue from enableUser/state==3 + currentUser; send only if milliAmpere != oldValue; UDP send ~1434.
  • isNoChargingDueToInteruptedStateOfWallbox() ~2109: state-5 recovery, token set ~2118.

Suggested fix direction

Two independent angles; the first addresses the actual cause, the second hardens the token:

  • oldValue against stale states (root cause): in the state-5 recovery path, don't derive oldValue from possibly-stale enableUser/currentUser. E.g. force oldValue = 0 when state == 5, or bypass the milliAmpere != oldValue guard for the recovery send, so the one allowed currtime actually goes out and resolves state 5 instead of being suppressed by 6000 == 6000.
  • Token accounting (hardening): mark startWithState5Attempted as consumed only after a currtime > 0 was actually sent (set it after sendUdpDatagram in regulateWallbox, not inside isNoChargingDueToInteruptedStateOfWallbox).

(Flagging the direction rather than proposing a patch — the oldValue calculation also drives normal regulation, so the side effects are best judged by the maintainer.)
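
Purely to illustrate the two directions, and not as a patch against main.js, here is how they could look in a reduced model. All names besides startWithState5Attempted, enableUser, currentUser and oldValue are hypothetical, and the oldValue rule is an assumption based on the "Code path" notes above.

```javascript
// Illustration only. A reduced model, NOT the adapter's regulateWallbox().
function regulateWallboxFixed(m, milliAmpere) {
    // Direction 1: in state 5, do not trust possibly stale enableUser/currentUser
    const oldValue = m.state === 5
        ? 0
        : ((m.enableUser || m.state === 3) ? m.currentUser : 0);

    let target = milliAmpere;
    if (m.state !== 5) {
        m.startWithState5Attempted = false;      // leaving state 5 resets the token
    } else if (target > 0 && m.startWithState5Attempted) {
        target = 0;                              // the one attempt was already sent
    }

    if (target !== oldValue) {
        m.sent.push(`currtime ${target} 1`);
        // Direction 2: consume the token only after currtime > 0 really went out
        if (m.state === 5 && target > 0) {
            m.startWithState5Attempted = true;
        }
    }
}

// State right after the phase switch: state 5, enableUser/currentUser still stale
const stale = {
    state: 5,
    enableUser: true,
    currentUser: 6000,
    startWithState5Attempted: false,
    sent: [],
};
regulateWallboxFixed(stale, 6000); // the recovery send goes out
regulateWallboxFixed(stale, 6000); // a second attempt while still in state 5 is blocked
console.log(stale.sent);
```

In this model the single recovery command is sent despite the stale 6000, and the token is only marked as consumed once that send has happened. Whether forcing oldValue to 0 in state 5 is safe for normal regulation is exactly the side effect that remains for the maintainer to judge.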

Reproduction

  • State-based contactor phase switch (not X2), charging 3p.
  • Surplus drops below the 3p minimum → adapter stops with currtime 0 for the switch.
  • The switch completes cleanly (no rapid Shelly bounce), and surplus returns to a chargeable level within the ~44 s before the next full report 2 corrects the stale states.
  • Result: adapter never sends currtime > 0 again → permanent state 5 until re-plug. With fast bounces the problem does not appear (state 5 never stays long enough).
