Replace pymodbus, add block reads, improve startup by sphings79 · Pull Request #229 · ViperRNMC/marstek_venus_modbus

sphings79 · 2026-05-04T19:28:03Z

Overview

This PR is a substantial rework of the integration's Modbus communication layer
and coordinator polling logic, motivated by real-world stability problems
observed on a Marstek Venus D connected via TCP.
It covers four areas:

pymodbus → tmodbus with batch block reads – replacing the original
one-register-at-a-time pymodbus loop with tmodbus and grouped FC03 requests.
Self-healing gap avoidance – automatically learning which register ranges
the device does not support and avoiding them in future requests without
removing any YAML-defined register from polling.
Double-scaling – fixing a scaling bug in sensor.py and number.py
that caused every non-unity-scale sensor to show values that were too small
by exactly one scale factor.
Startup reliability – correct ConfigEntryNotReady propagation so HA
retries automatically when the device is unreachable at startup, and a new
setup sequence that guarantees user-enabled non-default entities receive
values immediately.

1. `helpers/modbus_client.py` – pymodbus → tmodbus with batch reads

Original behaviour

The original file used pymodbus.client.tcp.AsyncModbusTcpClient with a
MarstekModbusClient wrapper. Every register was read individually via
async_read_register() inside a _request_lock, with up to 3 retries each.
For a typical poll with 22 default sensors this meant 22+ sequential TCP
round-trips.

What changed

The file is rewritten to use tmodbus (create_async_tcp_client). The core
addition is a batch_read() function that groups the requested addresses into
the fewest possible FC03 requests using _build_blocks():

MAX_GAP = 15 – consecutive addresses with a gap of ≤ 15 registers are
combined into a single block. Registers between the requested addresses that
happen to fall inside the block are read for free.
MAX_BLOCK_SIZE = 64 – blocks are split if they would exceed 64 registers.
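
The grouping rule above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual `_build_blocks`: the function name `build_blocks`, the `(start, count)` return shape, the reading of "gap" as the number of unrequested registers between two requested addresses, and the assumption that every value occupies a single register are all mine.

```python
MAX_GAP = 15          # value quoted in the PR description
MAX_BLOCK_SIZE = 64   # value quoted in the PR description


def build_blocks(addresses, max_gap=MAX_GAP, max_block_size=MAX_BLOCK_SIZE):
    """Group register addresses into (start, count) read blocks.

    Hypothetical sketch: assumes one register per address.
    """
    blocks = []
    start = prev = None
    for addr in sorted(set(addresses)):
        if start is None:
            start = prev = addr
            continue
        gap = addr - prev - 1               # unrequested registers in between
        size_if_merged = addr - start + 1   # block length if addr is appended
        if gap <= max_gap and size_if_merged <= max_block_size:
            prev = addr                     # extend the current block
        else:
            blocks.append((start, prev - start + 1))
            start = prev = addr             # open a new block
    if start is not None:
        blocks.append((start, prev - start + 1))
    return blocks


print(build_blocks([100, 101, 110, 200, 201]))
```

Under these assumptions the five addresses collapse into two requests: one block of 11 registers starting at 100 and one block of 2 registers starting at 200.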

Measured improvement on the Venus D:

Tick	Unique addresses	Requests before	Requests after
`high`	5	5 (one per register)	4
`medium`	7	7	5
`low`	6	6	3
`very_low`	111	111	14

High-tick wall time dropped from ~0.6–1.4 s to 0.45–0.55 s (~50 % faster).

Self-healing gap avoidance (`bad_gaps` / `good_gaps`)

When a block read fails (timeout or device error) the gaps between the
consecutive requested addresses inside that failed block are recorded in a
bad_gaps set. _build_blocks then avoids bridging those gaps in all future
calls, so unsupported register ranges are never queried again.

A good_gaps set records every gap that has been bridged successfully at
least once. Good gaps are permanently protected: a temporary TCP outage that
causes every block to fail simultaneously cannot misclassify a working gap as
bad, so the optimised block layout is preserved across transient outages.
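
The bookkeeping described above might look roughly like this. It is a sketch under assumptions, not the code in the PR: the helper names `gaps_in_block` and `record_block_result` are hypothetical, gaps are represented as `(lower, upper)` address pairs (matching the `35002→35010` style of the log lines quoted in the Testing section), and directly adjacent addresses are assumed to form no gap.

```python
def gaps_in_block(requested, start, count):
    """Return the (lower, upper) address pairs bridged inside one block."""
    inside = sorted(a for a in set(requested) if start <= a < start + count)
    # Adjacent addresses (difference of 1) leave no gap to bridge.
    return {(lo, hi) for lo, hi in zip(inside, inside[1:]) if hi - lo > 1}


def record_block_result(requested, block, ok, bad_gaps, good_gaps):
    """Update the two gap sets in place after one block read."""
    start, count = block
    gaps = gaps_in_block(requested, start, count)
    if ok:
        good_gaps.update(gaps)             # bridged successfully at least once
    else:
        bad_gaps.update(gaps - good_gaps)  # good gaps are never reclassified


bad, good = set(), {(35000, 35002)}
record_block_result([35000, 35002, 35010], (35000, 11), False, bad, good)
print(bad)
```

In this example only `(35002, 35010)` ends up in `bad`, because `(35000, 35002)` was already known to be good and is therefore protected.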

A MarstekModbusClient compatibility wrapper is retained so config_flow.py
and other callers that expect the original interface continue to work without
modification.

2. `coordinator.py` – group-based tick polling replaces per-sensor loop

Original behaviour

The original coordinator looped over self._all_definitions in
_async_update_data() and called self.client.async_read_register() for every
sensor individually, with per-sensor interval tracking using
_last_attempt_times and per-register exponential backoff using
_register_failures. Connection suspension logic (_consecutive_failures,
_connection_suspended) attempted to pause polling after repeated failures.

What changed

Group-based polling. Registers are sorted into four groups at load time by
their scan_interval YAML field (high, medium, low, very_low). Each
group is polled as a single batch_read call. A tick counter determines which
groups are due on each coordinator refresh.

Selective polling with dynamic opt-in. Registers with
enabled_by_default: false are excluded from the groups at startup. After all
entity platforms are set up, async_register_enabled_entities() reads the HA
entity registry and adds every register whose entity has been manually enabled
by the user to the very_low group (~3-minute interval). This replaces the
original approach of skipping disabled entities inside the poll loop.

Group-by-group initial poll on tick 1. All four groups are polled
sequentially on the very first coordinator tick so every entity has a value
immediately. Crucially each group uses its own separate batch_read call so
that failures in very_low (which may contain unsupported user-enabled
registers) cannot poison the good_gaps of the vetted default registers in
high / medium / low.

One-shot very_low catch-up flag. HA's select platform uses
update_before_add=True, which can trigger tick 1 before
async_register_enabled_entities() has run. A _needs_very_low_poll flag is
set when new entries are registered late and consumed on the next tick to fire
an additional one-shot very_low poll.
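
The tick logic from the paragraphs above could be sketched like this. The per-group divisors are invented for illustration (the description does not state them), and `groups_due` is a hypothetical helper, not a function from the PR.

```python
# Hypothetical divisors: a group is polled every N coordinator ticks.
GROUP_EVERY_N_TICKS = {"high": 1, "medium": 3, "low": 12, "very_low": 36}


def groups_due(tick, needs_very_low_poll=False):
    """Return the group names to poll on this tick, in polling order."""
    if tick == 1:
        # Initial poll: every group, each via its own batch read.
        return list(GROUP_EVERY_N_TICKS)
    due = [name for name, every in GROUP_EVERY_N_TICKS.items()
           if tick % every == 0]
    if needs_very_low_poll and "very_low" not in due:
        due.append("very_low")  # one-shot catch-up for late registrations
    return due


print(groups_due(1))
print(groups_due(2, needs_very_low_poll=True))
```

The caller would clear its catch-up flag after a tick on which `very_low` was polled, so the extra poll fires only once.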

Extended coordinator timeout (30 s → 120 s). The very_low group can
contain up to 82 user-enabled non-default registers. The per-register fallback
in batch_read retries each address individually at 5 s each; 16 cell-voltage
registers × 5 s = 80 s exceeds the old 30 s timeout, causing spurious
UpdateFailed that silenced even the fast default sensors.

bad_gaps / good_gaps wired into every batch_read call via
self._bad_gaps and self._good_gaps on the coordinator.

3. `__init__.py` – setup order and `ConfigEntryNotReady` propagation

Original behaviour

async_init() logged an error and returned False when the TCP connection
failed. The outer except Exception in async_setup_entry caught
ConfigEntryNotReady along with everything else, returning False. HA treats
False as a permanent failure and does not schedule a retry.

What changed

ConfigEntryNotReady is now re-raised so it propagates to HA's built-in
exponential-backoff retry mechanism (5 s → 10 s → 30 s → 60 s → …). A
disconnected LAN cable or a device rebooting no longer permanently disables the
integration.
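
The exception-ordering fix can be illustrated in isolation. In the sketch below, `ConfigEntryNotReady` is a local stand-in for `homeassistant.exceptions.ConfigEntryNotReady` so the snippet runs without Home Assistant, and `connect` and `setup_entry_sketch` are hypothetical names rather than the integration's real functions.

```python
import asyncio


class ConfigEntryNotReady(Exception):
    """Local stand-in for homeassistant.exceptions.ConfigEntryNotReady."""


async def connect(reachable):
    """Hypothetical connection step that fails when the device is offline."""
    if not reachable:
        raise ConfigEntryNotReady("device unreachable")


async def setup_entry_sketch(reachable):
    try:
        await connect(reachable)
    except ConfigEntryNotReady:
        raise            # propagate so the caller (HA) can schedule a retry
    except Exception:
        return False     # any other error is still treated as a failed setup
    return True


print(asyncio.run(setup_entry_sketch(True)))
```

Placing the specific `except ConfigEntryNotReady` clause before the broad `except Exception` is what keeps the retry signal from being swallowed.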

Setup order. async_register_enabled_entities() is now called between
async_forward_entry_setups() and async_config_entry_first_refresh() so
user-enabled non-default entities are already in the polling groups when tick 1
fires wherever the platform setup timing allows.

4. `sensor.py` – fix double-scaling

Original behaviour

coordinator.data stores already-scaled values because
extract_typed_value() (called inside batch_read / coordinator) applies the
scale factor before storing. However, MarstekSensor.native_value applied
scale a second time, and MarstekCalculatedSensor._calculate set
dep_values[alias] = float(val) * scale on already-scaled data.

This caused every directly read non-unity-scale sensor to show values that were
too small by exactly one scale factor, and skewed the calculated sensors that
depend on them:

Sensor	Raw register	Correct	Was showing
AC voltage	2390	239.0 V	23.9 V
AC frequency	500	50.0 Hz	5.0 Hz
Battery capacity	5120	5.12 kWh	0.005 kWh
Internal temperature	191	19.1 °C	1.91 °C
Stored energy	—	0.61 kWh	0.0 kWh
Battery cycles (calc)	—	11.5	115

What changed

MarstekSensor.native_value – removed scale * value; only precision
rounding and states mapping remain. The coordinator is the single source of
truth for scaled data.
MarstekCalculatedSensor._calculate – dep_values[alias] is now
float(val) (no additional scale multiplication).
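
The effect is easy to reproduce with the AC voltage row from the table. The helper names and signatures below are hypothetical and only mirror the behaviour described here; they are not the integration's actual functions.

```python
def scale_raw(raw, scale, precision):
    """Coordinator side: scale the raw register once before storing it."""
    return round(raw * scale, precision)


def native_value_old(stored, scale, precision):
    """Old entity behaviour: applies the scale a second time (the bug)."""
    return round(stored * scale, precision)


def native_value_new(stored, precision):
    """Fixed entity behaviour: only precision rounding remains."""
    return round(stored, precision)


stored = scale_raw(2390, 0.1, 1)         # value kept in coordinator.data
print(native_value_old(stored, 0.1, 1))  # double-scaled reading
print(native_value_new(stored, 1))       # correct reading
```

With a raw register of 2390 and a scale of 0.1, the old path yields 23.9 and the new path yields 239.0, matching the table.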

5. `number.py` – fix double-scaling and optimistic update

Original behaviour

MarstekNumber.native_value returned raw_value * self._scale on
already-scaled coordinator data (same root cause as sensor.py).
async_set_native_value stored the raw register integer
(int(value / self._scale)) in coordinator.data for the optimistic update.

What changed

native_value now returns data.get(self._key) directly.
async_set_native_value stores the engineering-unit value (the float passed
in by HA) in coordinator.data for the optimistic update, so the entity
shows the correct number immediately after a write.
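
A minimal sketch of the corrected write path, assuming a plain dict stands in for `coordinator.data`. The function name, the `max_charge_power` key, and the use of `round()` before `int()` (to guard against float division artefacts) are my assumptions, not details confirmed by the PR.

```python
def set_native_value_sketch(data, key, value, scale):
    """Return the raw register integer and update `data` optimistically."""
    raw = int(round(value / scale))  # integer that would be written to the device
    data[key] = value                # store engineering units, not the raw integer
    return raw


data = {}
raw = set_native_value_sketch(data, "max_charge_power", 50.0, 0.1)
print(raw, data["max_charge_power"])
```

Because the stored value is already in engineering units, an entity that returns `data.get(key)` directly shows the number the user entered without any further scaling.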

Files changed

File	Change type
`custom_components/marstek_modbus/helpers/modbus_client.py`	Full rewrite: pymodbus → tmodbus, batch reads, bad_gaps/good_gaps
`custom_components/marstek_modbus/coordinator.py`	Full rewrite: group-based tick polling, batch reads, startup fixes
`custom_components/marstek_modbus/__init__.py`	Setup order + ConfigEntryNotReady re-raise
`custom_components/marstek_modbus/sensor.py`	Double-scaling fix in native_value and _calculate
`custom_components/marstek_modbus/number.py`	Double-scaling fix in native_value and optimistic update

Testing

Hardware: Marstek Venus D, 2 battery packs
Connection: TCP
HA version: 2026.4.4
Coworker: Claude Code

All 22 default sensors show correct scaled values within ~5 s of HA restart.
User-enabled non-default entities (16 cell voltages, 4 MPPT channels, 6
schedule slots, WiFi/BT/cloud status, MOS temperatures, …) show values within
~10 s of startup.
After the first very_low tick, the bad gaps (35002→35010,
42000→42010, 42011→42020) are learned and logged at INFO level; no further
ModbusConnectionError / reconnect events occur for those ranges.
High-tick poll completes in 0.45–0.55 s (4 requests), down from the original
approach of 5+ sequential single-register reads.
Coordinator runs stably for 2+ hours (tick 277+ observed) without
UpdateFailed or entity freezes.
Disconnecting the LAN cable during normal operation: coordinator recovers
within one tick after reconnection; good gaps are preserved; no manual reload
needed.
Starting HA with the device offline: integration logs
ConfigEntryNotReady and retries automatically, becoming available as soon
as the device responds.
TCP outage during a very_low poll: good gaps protect previously-working
block layouts from being marked bad; block layout returns to optimised form
on the next successful poll.

Refactor setup entry process and improve error handling. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Refactor number entity setup and improve comments for clarity. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Refactor Marstek sensor classes to ensure scaling is handled correctly by the coordinator. Adjusted methods to prevent double scaling and improved handling of sensor attributes and calculations. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Refactor Modbus client to use tmodbus, adding async connection and read/write methods. Maintain backward compatibility with existing read helpers. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Nerdiyde · 2026-05-13T20:44:54Z

Thanks for your work. :)
Did you test this on your system? Just asking because I got a requirements issue regarding tmodbus. I think you did not add tmodbus to the requirements list?

"requirements": ["tmodbus", "pymodbus>=3.9.2"],

sphings79 · 2026-05-13T20:54:18Z

You are right, I forgot to update that.
I tried that locally :) but you only need
"requirements": ["tmodbus"],

I think I forgot that because I got a fw.bin and have been trying to decompile it for a week :)

Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Nerdiyde · 2026-05-13T21:11:54Z

I did some more quick testing regarding this.
Unfortunately on my setup (Venus 3.0 connected via LAN-->AP-->Wifi-->Router) the integration is not more stable than the current beta of the original implementation.
Also, some values were missing for me. It was not possible to set the charge, max-charge and max-discharge power.
Sorry for the bad news but I hope this helps. :)

sphings79 · 2026-05-13T21:35:31Z

> I did some more quick testing regarding this. Unfortunately on my setup (Venus 3.0 connected via LAN-->AP-->Wifi-->Router) the integration is not more stable than the current beta of the original implementation. Also, some values were missing for me. It was not possible to set the charge, max-charge and max-discharge power. Sorry for the bad news but I hope this helps. :)

Have you double-checked that they were enabled?

sphings79 added 5 commits May 4, 2026 21:18

Enhance async_setup_entry with detailed comments

910e349

Refactor setup entry process and improve error handling. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Refactor MarstekNumber entity initialization

82d8eb6

Refactor number entity setup and improve comments for clarity. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Refactor Modbus client to use tmodbus library

cb5dc39

Refactor Modbus client to use tmodbus, adding async connection and read/write methods. Maintain backward compatibility with existing read helpers. Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Update coordinator.py

3f022dc

Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

Update requirements in manifest.json

48099bf

Signed-off-by: sphings79 <43515272+sphings79@users.noreply.github.com>

sphings79 wants to merge 6 commits into ViperRNMC:main from sphings79:dev_tmodbus_blockread
