
Performance: optimize DynamicEss schedule window construction #20

Open
tmeinlschmidt wants to merge 1 commit into victronenergy:master from tmeinlschmidt:fixes/performance_windows

Conversation

@tmeinlschmidt
Contributor

Replace repeated string formatting with precomputed key lookups in the 5-second timer hot path.

  • Add _SlotKeys namedtuple and _DESS_KEYS tuple for schedule settings keys, eliminating 384 str.format() calls per tick
  • Rewrite windows() to skip 7 remaining lookups for empty slots (start == 0) using early-exit
  • Merge max-schedule and current-window loops into single pass

@izak
Collaborator

izak commented Feb 5, 2026

@realdognose, I need you to look at this one for me. Specifically check the early-exit thing. I recall that I had to change that a long time ago, because the VRM guys wanted to know the timestamp of the last available window, and it is designed so that the windows aren't necessarily in order. In any case, you are better suited to look at this now.

@realdognose
Collaborator

realdognose commented Feb 5, 2026

@izak I had a look at the suggested changes:

Yes, currently windows() is a generator (which would make sense) but it is bypassed by list() for the legacy way of determining the last window's timestamp. I already have a local branch that improves this, by using the generator as a generator and determining the last window slot differently, but it is pending because of exactly the "order question" (for Node-RED mostly; VRM will push "in order").
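
Roughly, consuming the generator lazily while still finding the last scheduled window (since windows aren't guaranteed to be in order) could look like this. This is a hypothetical sketch: Window and last_window are illustrative stand-ins, not the actual dess types.

```python
from datetime import datetime, timedelta
from typing import Iterator, NamedTuple, Optional

class Window(NamedTuple):
    # Minimal stand-in for a schedule window; the real type differs.
    start: datetime
    duration: int  # seconds

    @property
    def stop(self) -> datetime:
        return self.start + timedelta(seconds=self.duration)

def last_window(windows: Iterator[Window]) -> Optional[Window]:
    # Windows are not guaranteed to arrive in order, so the last
    # scheduled one must be found by comparing start times, not by
    # taking the final item the generator yields.
    last = None
    for w in windows:
        if last is None or w.start > last.start:
            last = w
    return last
```

This keeps the single pass over the generator (no list() materialization) while still answering the "last available window" question that VRM needs.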

The optimization of the 384 format calls is no optimization (as long as the generator issue exists). The change is less readable but would currently still do 384 format calls :-)
(A full join of 48 slots and 8 keys will always produce 384 different strings to "format", no matter how you do it.)

So, until we have clarified the "in-order" question for Node-Red there's nothing to pull from this PR currently.

@tmeinlschmidt
Contributor Author

tmeinlschmidt commented Feb 5, 2026

I tried to mock this up and do some measurements with the following code:

import timeit
from collections import namedtuple
from datetime import datetime

NUM_SCHEDULES = 48
FIELDS = ('start', 'duration', 'soc', 'targetsoc', 'discharge', 'restrictions', 'strategy', 'flags')

# Simulate settings dict (mimics MockSettingsDevice __getitem__)
settings = {}
for i in range(NUM_SCHEDULES):
   for field in FIELDS:
       settings['dess_{}_{}'.format(field, i)] = 0
# Make 3 slots active
for i in range(3):
   settings['dess_start_{}'.format(i)] = 1700000000 + i * 3600

# --- OLD approach ---
def windows_old():
   starttimes = (settings['dess_start_{}'.format(i)] for i in range(NUM_SCHEDULES))
   durations = (settings['dess_duration_{}'.format(i)] for i in range(NUM_SCHEDULES))
   socs = (settings['dess_soc_{}'.format(i)] for i in range(NUM_SCHEDULES))
   targetsocs = (settings['dess_targetsoc_{}'.format(i)] for i in range(NUM_SCHEDULES))
   discharges = (settings['dess_discharge_{}'.format(i)] for i in range(NUM_SCHEDULES))
   restrictions = (settings['dess_restrictions_{}'.format(i)] for i in range(NUM_SCHEDULES))
   strategies = (settings['dess_strategy_{}'.format(i)] for i in range(NUM_SCHEDULES))
   wflags = (settings['dess_flags_{}'.format(i)] for i in range(NUM_SCHEDULES))
   result = []
   for start, duration, soc, targetsoc, discharge, restrict, strategy, flags in zip(
           starttimes, durations, socs, targetsocs, discharges, restrictions, strategies, wflags):
       if start > 0:
           result.append((datetime.fromtimestamp(start), duration, soc, targetsoc, discharge, restrict, strategy, flags))
   return result

# --- NEW approach ---
_SlotKeys = namedtuple('_SlotKeys', 'start duration soc targetsoc discharge restrictions strategy flags')
_DESS_KEYS = tuple(
   _SlotKeys(*('dess_{}_{}'.format(field, i) for field in _SlotKeys._fields))
   for i in range(NUM_SCHEDULES)
)

def windows_new():
   result = []
   for slot, keys in enumerate(_DESS_KEYS):
       start = settings[keys.start]
       if start > 0:
           result.append((
               datetime.fromtimestamp(start),
               settings[keys.duration], settings[keys.soc], settings[keys.targetsoc],
               settings[keys.discharge], settings[keys.restrictions],
               settings[keys.strategy], settings[keys.flags], slot))
   return result

# Verify correctness
old_result = windows_old()
new_result = windows_new()
assert len(old_result) == len(new_result) == 3, f"Expected 3 windows, got old={len(old_result)}, new={len(new_result)}"
print("Correctness check passed: both produce 3 windows\n")

# Benchmark
N = 50_000
print(f"Running {N} iterations each...\n")

t_old = timeit.timeit(windows_old, number=N)
t_new = timeit.timeit(windows_new, number=N)
print(f"Old (format keys each call): {t_old/N*1e6:.1f} us/call")
print(f"New (pre-computed keys):      {t_new/N*1e6:.1f} us/call")
print(f"Speedup: {t_old/t_new:.2f}x")
print(f"Saved per call: {(t_old-t_new)/N*1e6:.1f} us")
print(f"Saved per minute (12 ticks): {(t_old-t_new)/N*1e6*12:.0f} us")

# Also test worst case: 0 active windows (most common scenario)
print("\n--- Worst case: 0 active windows ---")
for i in range(3):
   settings['dess_start_{}'.format(i)] = 0

t_old_0 = timeit.timeit(windows_old, number=N)
t_new_0 = timeit.timeit(windows_new, number=N)
print(f"Old: {t_old_0/N*1e6:.1f} us/call")
print(f"New: {t_new_0/N*1e6:.1f} us/call")
print(f"Speedup: {t_old_0/t_new_0:.2f}x")

# Test with all 48 active
print("\n--- Best case for old code: all 48 active ---")
for i in range(NUM_SCHEDULES):
   settings['dess_start_{}'.format(i)] = 1700000000 + i * 3600

t_old_48 = timeit.timeit(windows_old, number=N)
t_new_48 = timeit.timeit(windows_new, number=N)
print(f"Old: {t_old_48/N*1e6:.1f} us/call")
print(f"New: {t_new_48/N*1e6:.1f} us/call")
print(f"Speedup: {t_old_48/t_new_48:.2f}x")

Results (running on my M1):

Correctness check passed: both produce 3 windows

Running 50000 iterations each...

Old (format keys each call): 76.2 us/call
New (pre-computed keys):      4.1 us/call
Speedup: 18.76x
Saved per call: 72.1 us
Saved per minute (12 ticks): 865 us

--- Worst case: 0 active windows ---
Old: 74.6 us/call
New: 2.4 us/call
Speedup: 31.49x

--- Best case for old code: all 48 active ---
Old: 93.6 us/call
New: 31.2 us/call
Speedup: 3.00x

I also tried on an RPi 4 (8 GB of memory); the improvement in absolute savings is more significant:

Correctness check passed: both produce 3 windows

Running 50000 iterations each...

Old (format keys each call): 805.3 us/call
New (pre-computed keys):      55.5 us/call
Speedup: 14.51x
Saved per call: 749.8 us
Saved per minute (12 ticks): 8998 us

--- Worst case: 0 active windows ---
Old: 902.3 us/call
New: 29.6 us/call
Speedup: 30.51x

--- Best case for old code: all 48 active ---
Old: 1236.4 us/call
New: 377.9 us/call
Speedup: 3.27x

Re: the 384 strings: yes, but in my changes these are calculated just once, not every 5 seconds. That said, I completely understand and agree that until the window-order question is resolved, this makes no sense to merge or cherry-pick (as the order may change things a lot).

I was also thinking a bit about that ordering: given that windows() already filters out empty slots (start == 0), we could probably do something like

  windows = sorted(self.windows(), key=lambda w: w.start)

  # Last scheduled — was a full loop, now just:
  if windows:
      start = windows[-1].start
      stop = windows[-1].stop

  # Current + next window — was 2 separate loops, now one:
  for i, w in enumerate(windows):
      if self.acquire_control() and now in w:
          current_window = w
          # Next window is just the next item (if adjacent in time)
          if i + 1 < len(windows):
              next_window_save_start = w.stop + timedelta(seconds=1)
              if next_window_save_start in windows[i + 1]:
                  next_window = windows[i + 1]
          break

then windows[-1] is the last one scheduled.
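
A self-contained version of that sketch, with a hypothetical Window type whose `in` check tests time containment (half-open here; the real containment semantics may differ):

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Tuple

class Window(NamedTuple):
    # Illustrative stand-in for the real window type.
    start: datetime
    stop: datetime

    def __contains__(self, ts) -> bool:
        # Half-open membership: start <= ts < stop
        return self.start <= ts < self.stop

def current_and_next(windows, now) -> Tuple[Optional[Window], Optional[Window]]:
    # Sort once, then find the current window and its adjacent
    # successor in a single pass.
    windows = sorted(windows, key=lambda w: w.start)
    current = nxt = None
    for i, w in enumerate(windows):
        if now in w:
            current = w
            # The next window is simply the next item, if it is
            # adjacent in time to the current one.
            if i + 1 < len(windows) and w.stop + timedelta(seconds=1) in windows[i + 1]:
                nxt = windows[i + 1]
            break
    return current, nxt
```

With the windows sorted, windows[-1] is indeed the last one scheduled, and the current/next search no longer needs two separate loops.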

@realdognose
Collaborator

realdognose commented Feb 6, 2026

Having a 14.5x improvement sounds awesome, but when you look at performance gains, you need to look at the overall picture as well:

As you noted, we are talking microseconds here. You calculated, for example, 749 us saved as a 14.51x speedup, but that's a very isolated view of only this method. To really identify this change's impact, consider the following:

  • Control Loop is running every 5 seconds.
  • On my gx MK 1, that loop takes between 20 and 30 ms in total for regular operation.
INFO:dynamicess:Control loop took 0.021248 seconds.
INFO:dynamicess:Control loop took 0.025785 seconds.

So, out of the 5000 ms of computation time the GX has available per tick, the dess control loop is consuming 30 ms (0.6%, effectively 0.3% Cerbo load on a dual-core device). Improving that by 1000 us will bring it down to 29 ms (0.58% / 0.29%). So that's a gain of 0.02% (0.01%) of additional computing time being available for other things.
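
As a quick sanity check of that arithmetic (figures taken from the comment above):

```python
# Tick budget vs. loop cost, and the effect of shaving ~1000 us off.
tick_ms = 5000.0   # control loop runs every 5 s
loop_ms = 30.0     # measured loop duration on a GX MK1
saved_ms = 1.0     # ~1000 us saved by the optimization

before = loop_ms / tick_ms * 100            # share of the tick budget
after = (loop_ms - saved_ms) / tick_ms * 100
print(f"{before:.2f}% -> {after:.2f}% ({before - after:.2f} pp gain)")
```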

Appreciate your contribution and will keep the change in mind for whenever that part needs to be touched logically anyway - but it's just not significant enough to consider it an urgent action point.

And a thought from another view:
I'm even considering not changing the generator issue. Why? What we have now is a constant runtime: it takes the same time no matter whether slot 0 or slot 47 is active. With a real generator, we would have a linearly increasing runtime based on the active slot. So it would obfuscate the "real" weight of the code for regular operation, and in case of issues (connectivity loss) the load would increase over time. Not significant at all, as we noted, but growing behaviour like that is quite a common source of issues: 99% of systems run "ideal" and fine, and on 1% you could see load/timing issues. That's hard to trace down then, and hard to identify as being caused by a piece of code running less efficiently under environment conditions that are not obviously related.

P.S.: Overall, the best option would be to put the active (and next) window in memory and re-evaluate with a change listener or when the active window expires. That would mean a decreasing load when connectivity is lost, because no change listener would fire, just the 15-minute window-exhaust lookup. But after all, still too insignificant to be prioritized over other development.
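
That cache-and-invalidate idea could be sketched like this. This is hypothetical: the callback name and the (start, stop) cache shape are assumptions, not the actual dess API.

```python
class WindowCache:
    """Cache the computed (start, stop) window; recompute only when a
    setting changes or the cached window expires."""

    def __init__(self, compute):
        self._compute = compute   # stand-in for the windows() scan
        self._cached = None       # (start, stop) or None
        self._dirty = True

    def on_setting_changed(self, *args):
        # Wire this to the settings change listener: any schedule
        # edit invalidates the cache.
        self._dirty = True

    def get(self, now):
        cached = self._cached
        # Recompute on change, on first use, or once the window expired.
        if self._dirty or cached is None or now >= cached[1]:
            self._cached = cached = self._compute(now)
            self._dirty = False
        return cached
```

When connectivity is lost, no change listener fires, so get() degenerates to a cheap comparison until the cached window exhausts.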
