Replace pymodbus, add block reads, improve startup #229
Refactor setup entry process and improve error handling. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>
Refactor number entity setup and improve comments for clarity. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>
Refactor Marstek sensor classes to ensure scaling is handled correctly by the coordinator. Adjusted methods to prevent double scaling and improved handling of sensor attributes and calculations. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>
Refactor Modbus client to use tmodbus, adding async connection and read/write methods. Maintain backward compatibility with existing read helpers. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>
Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>
Thanks for your work. :)
You are right, I forgot to update that. I think I forgot because I got hold of an fw.bin and have been trying to decompile it for a week :)
Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>
I did some more quick testing regarding this.
Have you double-checked that they were enabled?
Overview
This PR is a substantial rework of the integration's Modbus communication layer
and coordinator polling logic, motivated by real-world stability problems
observed on a Marstek Venus D connected via TCP.
It covers four areas:
- pymodbus → tmodbus with batch block reads – replacing the original one-register-at-a-time pymodbus loop with tmodbus and grouped FC03 requests.
- Self-healing gap avoidance – learning which register ranges the device does not support and avoiding them in future requests without removing any YAML-defined register from polling.
- Double-scaling fix – correcting a bug in `sensor.py` and `number.py` that caused every non-unity-scale sensor to show values that were too small by exactly one scale factor.
- `ConfigEntryNotReady` propagation so HA retries automatically when the device is unreachable at startup, and a new setup sequence that guarantees user-enabled non-default entities receive values immediately.
1. `helpers/modbus_client.py` – pymodbus → tmodbus with batch reads

Original behaviour

The original file used `pymodbus.client.tcp.AsyncModbusTcpClient` with a `MarstekModbusClient` wrapper. Every register was read individually via `async_read_register()` inside a `_request_lock`, with up to 3 retries each.

For a typical poll with 22 default sensors this meant 22+ sequential TCP round-trips.
What changed

The file is rewritten to use `tmodbus` (`create_async_tcp_client`). The core addition is a `batch_read()` function that groups the requested addresses into the fewest possible FC03 requests using `_build_blocks()`:

- `MAX_GAP = 15` – consecutive addresses with a gap of ≤ 15 registers are combined into a single block. Registers between the requested addresses that happen to fall inside the block are read for free.
- `MAX_BLOCK_SIZE = 64` – blocks are split if they would exceed 64 registers.

Measured improvement on the Venus D (per polling group: `high`, `medium`, `low`, `very_low`; the per-group table values were not recovered):

High-tick wall time dropped from ~0.6–1.4 s to 0.45–0.55 s (~50 % faster).
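The grouping rule above can be illustrated with a minimal sketch. It is not the PR's `_build_blocks()` implementation: the function name `build_blocks`, the `(start, count)` return shape, the assumption that every register is one word wide, and the reading of "gap" as the difference between two consecutive requested addresses are all choices made here for illustration.

```python
MAX_GAP = 15         # largest address difference a block may bridge (assumed meaning)
MAX_BLOCK_SIZE = 64  # largest number of registers in one FC03 request


def build_blocks(addresses):
    """Group register addresses into (start, count) blocks.

    Simplified sketch: each requested register is treated as one word wide.
    """
    blocks = []
    start = prev = None
    for addr in sorted(set(addresses)):
        if start is None:
            start = prev = addr
            continue
        gap = addr - prev          # distance to the previous requested address
        size = addr - start + 1    # block size if addr were appended
        if gap <= MAX_GAP and size <= MAX_BLOCK_SIZE:
            prev = addr            # extend the current block
        else:
            blocks.append((start, prev - start + 1))
            start = prev = addr    # open a new block
    if start is not None:
        blocks.append((start, prev - start + 1))
    return blocks


# Three nearby addresses share one request; the distant one gets its own.
print(build_blocks([100, 101, 110, 200]))
```

With these inputs, four single-register reads collapse into two FC03 requests, which is where the reduction in TCP round-trips would come from.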
Self-healing gap avoidance (`bad_gaps` / `good_gaps`)

When a block read fails (timeout or device error), the gaps between the consecutive requested addresses inside that failed block are recorded in a `bad_gaps` set. `_build_blocks` then avoids bridging those gaps in all future calls, so unsupported register ranges are never queried again.

A `good_gaps` set records every gap that has been bridged successfully at least once. Good gaps are permanently protected: a temporary TCP outage that causes every block to fail simultaneously cannot misclassify a working gap as bad, so the optimised block layout is preserved across transient outages.
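The bookkeeping described above might look roughly like the following sketch. The helper names `record_block_result` and `can_bridge`, and the representation of a gap as a `(previous_address, next_address)` tuple, are hypothetical; the PR may store gaps differently.

```python
def record_block_result(requested, ok, bad_gaps, good_gaps):
    """Update the gap sets after one block read.

    requested: sorted addresses that were requested inside the block
    ok:        True if the block read succeeded
    """
    gaps = set(zip(requested, requested[1:]))
    if ok:
        good_gaps.update(gaps)             # bridged successfully at least once
    else:
        bad_gaps.update(gaps - good_gaps)  # good gaps are never reclassified


def can_bridge(prev, addr, bad_gaps):
    """Return True if a block may span from prev to addr."""
    return (prev, addr) not in bad_gaps


bad, good = set(), set()
record_block_result([100, 101, 110], True, bad, good)   # normal poll
record_block_result([100, 101, 110], False, bad, good)  # transient outage
record_block_result([35002, 35010], False, bad, good)   # never worked
print(sorted(bad))
```

Only the gap that never succeeded ends up in `bad`; the outage in the second call leaves the already-proven gaps untouched.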
A `MarstekModbusClient` compatibility wrapper is retained so `config_flow.py` and other callers that expect the original interface continue to work without modification.
2. `coordinator.py` – group-based tick polling replaces per-sensor loop

Original behaviour

The original coordinator looped over `self._all_definitions` in `_async_update_data()` and called `self.client.async_read_register()` for every sensor individually, with per-sensor interval tracking using `_last_attempt_times` and per-register exponential backoff using `_register_failures`. Connection suspension logic (`_consecutive_failures`, `_connection_suspended`) attempted to pause polling after repeated failures.

What changed
Group-based polling. Registers are sorted into four groups at load time by their `scan_interval` YAML field (`high`, `medium`, `low`, `very_low`). Each group is polled as a single `batch_read` call. A tick counter determines which groups are due on each coordinator refresh.
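A tick counter of this kind can be sketched as a simple modulo check. The multipliers in `GROUP_EVERY_N_TICKS` and the function name `groups_due` are hypothetical placeholders; the PR does not state the actual tick ratios in this description.

```python
# Hypothetical multipliers: poll a group every N coordinator ticks.
GROUP_EVERY_N_TICKS = {"high": 1, "medium": 3, "low": 12, "very_low": 36}


def groups_due(tick):
    """Return the groups whose interval divides the current tick number."""
    return [name for name, every in GROUP_EVERY_N_TICKS.items() if tick % every == 0]


for tick in (2, 3, 36):
    print(tick, groups_due(tick))
```

Each group returned for a tick would then be passed to one `batch_read` call, so a tick on which only `high` is due costs a single batch.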
Selective polling with dynamic opt-in. Registers with `enabled_by_default: false` are excluded from the groups at startup. After all entity platforms are set up, `async_register_enabled_entities()` reads the HA entity registry and adds every register whose entity has been manually enabled by the user to the `very_low` group (~3-minute interval). This replaces the original approach of skipping disabled entities inside the poll loop.
Group-by-group initial poll on tick 1. All four groups are polled sequentially on the very first coordinator tick so every entity has a value immediately. Crucially, each group uses its own separate `batch_read` call so that failures in `very_low` (which may contain unsupported user-enabled registers) cannot poison the `good_gaps` of the vetted default registers in `high`/`medium`/`low`.

One-shot `very_low` catch-up flag. HA's `select` platform uses `update_before_add=True`, which can trigger tick 1 before `async_register_enabled_entities()` has run. A `_needs_very_low_poll` flag is set when new entries are registered late and is consumed on the next tick to fire an additional one-shot `very_low` poll.

Extended coordinator timeout (30 s → 120 s). The `very_low` group can contain up to 82 user-enabled non-default registers. The per-register fallback in `batch_read` retries each address individually at 5 s each; 16 cell-voltage registers × 5 s = 80 s exceeds the old 30 s timeout, causing spurious `UpdateFailed` errors that silenced even the fast default sensors.

`bad_gaps` / `good_gaps` are wired into every `batch_read` call via `self._bad_gaps` and `self._good_gaps` on the coordinator.

3. `__init__.py` – setup order and `ConfigEntryNotReady` propagation

Original behaviour
`async_init()` logged an error and returned `False` when the TCP connection failed. The outer `except Exception` in `async_setup_entry` caught `ConfigEntryNotReady` along with everything else, returning `False`. HA treats `False` as a permanent failure and does not schedule a retry.

What changed
`ConfigEntryNotReady` is now re-raised so it propagates to HA's built-in exponential-backoff retry mechanism (5 s → 10 s → 30 s → 60 s → …). A disconnected LAN cable or a device rebooting no longer permanently disables the integration.
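The re-raise pattern can be sketched without Home Assistant installed. The `ConfigEntryNotReady` class below is a local stand-in for `homeassistant.exceptions.ConfigEntryNotReady`, the `connect` callable is a hypothetical replacement for the real client setup, and returning `False` for all other exceptions is an assumption about how the remaining error path behaves.

```python
import asyncio


class ConfigEntryNotReady(Exception):
    """Local stand-in for homeassistant.exceptions.ConfigEntryNotReady."""


async def async_setup_entry(connect):
    """Simplified setup: connect is an async callable returning True/False."""
    try:
        if not await connect():
            raise ConfigEntryNotReady("device unreachable")
        return True
    except ConfigEntryNotReady:
        raise          # propagate so the framework can schedule a retry
    except Exception:
        return False   # assumed: other errors stay a permanent failure


async def device_down():
    return False


try:
    asyncio.run(async_setup_entry(device_down))
except ConfigEntryNotReady as err:
    print("retry scheduled:", err)
```

The explicit `except ConfigEntryNotReady: raise` clause has to come before the broad `except Exception`, otherwise the broad handler would swallow it again.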
Setup order. `async_register_enabled_entities()` is now called between `async_forward_entry_setups()` and `async_config_entry_first_refresh()` so user-enabled non-default entities are already in the polling groups when tick 1 fires, wherever the platform setup timing allows.
4. `sensor.py` – fix double-scaling

Original behaviour

`coordinator.data` stores already-scaled values because `extract_typed_value()` (called inside `batch_read` / coordinator) applies the `scale` factor before storing. However, `MarstekSensor.native_value` applied `scale` a second time, and `MarstekCalculatedSensor._calculate` set `dep_values[alias] = float(val) * scale` on already-scaled data.

This caused every non-unity-scale sensor to show values that were too small by exactly one scale factor.
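A small worked example makes the effect concrete. The raw value 523 and the scale 0.1 are made-up numbers, and the three helper functions are simplified stand-ins for the coordinator and the sensor property, not the integration's actual code.

```python
def coordinator_store(raw, scale):
    """Coordinator side: scaling is applied once before storing."""
    return raw * scale


def native_value_old(stored, scale, precision):
    """Old sensor behaviour: scales the already-scaled value again (the bug)."""
    return round(stored * scale, precision)


def native_value_new(stored, precision):
    """New sensor behaviour: only precision rounding remains."""
    return round(stored, precision)


stored = coordinator_store(523, 0.1)
print("old:", native_value_old(stored, 0.1, 2))
print("new:", native_value_new(stored, 1))
```

With a scale of 0.1 the old path is off by a factor of ten, which is the "one scale factor" error described above.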
What changed
- `MarstekSensor.native_value` – removed `scale * value`; only precision rounding and `states` mapping remain. The coordinator is the single source of truth for scaled data.
- `MarstekCalculatedSensor._calculate` – `dep_values[alias]` is now `float(val)` (no additional scale multiplication).

5. `number.py` – fix double-scaling and optimistic update

Original behaviour
`MarstekNumber.native_value` returned `raw_value * self._scale` on already-scaled coordinator data (same root cause as `sensor.py`). `async_set_native_value` stored the raw register integer (`int(value / self._scale)`) in `coordinator.data` for the optimistic update.

What changed
- `native_value` now returns `data.get(self._key)` directly.
- `async_set_native_value` stores the engineering-unit value (the float passed in by HA) in `coordinator.data` for the optimistic update, so the entity shows the correct number immediately after a write.
Files changed
- `custom_components/marstek_modbus/helpers/modbus_client.py`
- `custom_components/marstek_modbus/coordinator.py`
- `custom_components/marstek_modbus/__init__.py`
- `custom_components/marstek_modbus/sensor.py`
- `custom_components/marstek_modbus/number.py`

Testing
Hardware: Marstek Venus D, 2 battery packs
Connection: TCP
HA version: 2026.4.4
Coworker: Claude Code
- … schedule slots, WiFi/BT/cloud status, MOS temperatures, …) show values within ~10 s of startup.
- … `very_low` tick, the bad gaps (35002→35010, 42000→42010, 42011→42020) are learned and logged at INFO level; no further `ModbusConnectionError` / reconnect events occur for those ranges.
- … approach of 5+ sequential single-register reads.
- … `UpdateFailed` or entity freezes.
- … within one tick after reconnection; good gaps are preserved; no manual reload needed.
- … `ConfigEntryNotReady` and retries automatically, becoming available as soon as the device responds.
- … `very_low` poll: good gaps protect previously-working block layouts from being marked bad; block layout returns to optimised form on the next successful poll.