
Performance: optimize DynamicEss schedule window construction #20

Open
tmeinlschmidt wants to merge 1 commit into victronenergy:master from tmeinlschmidt:fixes/performance_windows

Conversation

@tmeinlschmidt
Contributor

Replace repeated string formatting with precomputed key lookups in the 5-second timer hot path.

  • Add _SlotKeys namedtuple and _DESS_KEYS tuple for schedule settings keys, eliminating 384 str.format() calls per tick
  • Rewrite windows() to skip 7 remaining lookups for empty slots (start == 0) using early-exit
  • Merge max-schedule and current-window loops into single pass

@izak
Collaborator

izak commented Feb 5, 2026

@realdognose, I need you to look at this one for me. Specifically check the early-exit thing. I recall that I had to change that a long time ago, because the VRM guys wanted to know the timestamp of the last available window, and it is designed so that the windows aren't necessarily in order. In any case, you are better suited to look at this now.

@realdognose
Collaborator

realdognose commented Feb 5, 2026

@izak I had a look at the suggested changes:

Yes, currently windows() is a generator (which would make sense) but it is bypassed by list() for the legacy way of determining the last window's timestamp. I already have a local branch that improves this, by using the generator as a generator and determining the last window slot differently, but it is pending because of exactly the "order question" (for Node-RED mostly; VRM will push "in order").
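
Roughly, consuming the generator lazily while still finding the last scheduled window (since windows aren't guaranteed to be in order) could look like this. This is a hypothetical sketch: Window and last_window are illustrative stand-ins, not the actual dess types.

```python
from datetime import datetime, timedelta
from typing import Iterator, NamedTuple, Optional

class Window(NamedTuple):
    # Minimal stand-in for a schedule window; the real type differs.
    start: datetime
    duration: int  # seconds

    @property
    def stop(self) -> datetime:
        return self.start + timedelta(seconds=self.duration)

def last_window(windows: Iterator[Window]) -> Optional[Window]:
    # Windows are not guaranteed to arrive in order, so the last
    # scheduled one must be found by comparing start times, not by
    # taking the final item the generator yields.
    last = None
    for w in windows:
        if last is None or w.start > last.start:
            last = w
    return last
```

This keeps the single pass over the generator (no list() materialization) while still answering the "last available window" question that VRM needs.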

The optimization of the 384 format calls is no optimization (as long as the generator issue exists). The change is less readable but would currently still do 384 format calls :-)
(A full join of 48 slots and 8 keys will always produce 384 different strings to "format", no matter how you do it.)

So, until we have clarified the "in-order" question for Node-Red there's nothing to pull from this PR currently.

@tmeinlschmidt
Contributor Author

tmeinlschmidt commented Feb 5, 2026

I tried to mock this up and do some measurements with the following code:

import timeit
from collections import namedtuple
from datetime import datetime

NUM_SCHEDULES = 48
FIELDS = ('start', 'duration', 'soc', 'targetsoc', 'discharge', 'restrictions', 'strategy', 'flags')

# Simulate settings dict (mimics MockSettingsDevice __getitem__)
settings = {}
for i in range(NUM_SCHEDULES):
   for field in FIELDS:
       settings['dess_{}_{}'.format(field, i)] = 0
# Make 3 slots active
for i in range(3):
   settings['dess_start_{}'.format(i)] = 1700000000 + i * 3600

# --- OLD approach ---
def windows_old():
   starttimes = (settings['dess_start_{}'.format(i)] for i in range(NUM_SCHEDULES))
   durations = (settings['dess_duration_{}'.format(i)] for i in range(NUM_SCHEDULES))
   socs = (settings['dess_soc_{}'.format(i)] for i in range(NUM_SCHEDULES))
   targetsocs = (settings['dess_targetsoc_{}'.format(i)] for i in range(NUM_SCHEDULES))
   discharges = (settings['dess_discharge_{}'.format(i)] for i in range(NUM_SCHEDULES))
   restrictions = (settings['dess_restrictions_{}'.format(i)] for i in range(NUM_SCHEDULES))
   strategies = (settings['dess_strategy_{}'.format(i)] for i in range(NUM_SCHEDULES))
   wflags = (settings['dess_flags_{}'.format(i)] for i in range(NUM_SCHEDULES))
   result = []
   for start, duration, soc, targetsoc, discharge, restrict, strategy, flags in zip(
           starttimes, durations, socs, targetsocs, discharges, restrictions, strategies, wflags):
       if start > 0:
           result.append((datetime.fromtimestamp(start), duration, soc, targetsoc, discharge, restrict, strategy, flags))
   return result

# --- NEW approach ---
_SlotKeys = namedtuple('_SlotKeys', 'start duration soc targetsoc discharge restrictions strategy flags')
_DESS_KEYS = tuple(
   _SlotKeys(*('dess_{}_{}'.format(field, i) for field in _SlotKeys._fields))
   for i in range(NUM_SCHEDULES)
)

def windows_new():
   result = []
   for slot, keys in enumerate(_DESS_KEYS):
       start = settings[keys.start]
       if start > 0:
           result.append((
               datetime.fromtimestamp(start),
               settings[keys.duration], settings[keys.soc], settings[keys.targetsoc],
               settings[keys.discharge], settings[keys.restrictions],
               settings[keys.strategy], settings[keys.flags], slot))
   return result

# Verify correctness
old_result = windows_old()
new_result = windows_new()
assert len(old_result) == len(new_result) == 3, f"Expected 3 windows, got old={len(old_result)}, new={len(new_result)}"
print("Correctness check passed: both produce 3 windows\n")

# Benchmark
N = 50_000
print(f"Running {N} iterations each...\n")

t_old = timeit.timeit(windows_old, number=N)
t_new = timeit.timeit(windows_new, number=N)
print(f"Old (format keys each call): {t_old/N*1e6:.1f} us/call")
print(f"New (pre-computed keys):      {t_new/N*1e6:.1f} us/call")
print(f"Speedup: {t_old/t_new:.2f}x")
print(f"Saved per call: {(t_old-t_new)/N*1e6:.1f} us")
print(f"Saved per minute (12 ticks): {(t_old-t_new)/N*1e6*12:.0f} us")

# Also test worst case: 0 active windows (most common scenario)
print("\n--- Worst case: 0 active windows ---")
for i in range(3):
   settings['dess_start_{}'.format(i)] = 0

t_old_0 = timeit.timeit(windows_old, number=N)
t_new_0 = timeit.timeit(windows_new, number=N)
print(f"Old: {t_old_0/N*1e6:.1f} us/call")
print(f"New: {t_new_0/N*1e6:.1f} us/call")
print(f"Speedup: {t_old_0/t_new_0:.2f}x")

# Test with all 48 active
print("\n--- Best case for old code: all 48 active ---")
for i in range(NUM_SCHEDULES):
   settings['dess_start_{}'.format(i)] = 1700000000 + i * 3600

t_old_48 = timeit.timeit(windows_old, number=N)
t_new_48 = timeit.timeit(windows_new, number=N)
print(f"Old: {t_old_48/N*1e6:.1f} us/call")
print(f"New: {t_new_48/N*1e6:.1f} us/call")
print(f"Speedup: {t_old_48/t_new_48:.2f}x")

Results (running on my M1):

Correctness check passed: both produce 3 windows

Running 50000 iterations each...

Old (format keys each call): 76.2 us/call
New (pre-computed keys):      4.1 us/call
Speedup: 18.76x
Saved per call: 72.1 us
Saved per minute (12 ticks): 865 us

--- Worst case: 0 active windows ---
Old: 74.6 us/call
New: 2.4 us/call
Speedup: 31.49x

--- Best case for old code: all 48 active ---
Old: 93.6 us/call
New: 31.2 us/call
Speedup: 3.00x

I also tried on an RPi 4 (8 GB of memory); the improvement in absolute savings is more significant:

Correctness check passed: both produce 3 windows

Running 50000 iterations each...

Old (format keys each call): 805.3 us/call
New (pre-computed keys):      55.5 us/call
Speedup: 14.51x
Saved per call: 749.8 us
Saved per minute (12 ticks): 8998 us

--- Worst case: 0 active windows ---
Old: 902.3 us/call
New: 29.6 us/call
Speedup: 30.51x

--- Best case for old code: all 48 active ---
Old: 1236.4 us/call
New: 377.9 us/call
Speedup: 3.27x

Re: the 384 strings: yes, but in my changes these are calculated just once, not every 5 seconds. That said, I completely understand and agree that until the window-order question is resolved, this makes no sense to merge or cherry-pick (as the order may change things a lot).

I was also thinking a bit about that ordering: given that windows() already filters out empty slots (start == 0), we could probably do something like

  windows = sorted(self.windows(), key=lambda w: w.start)

  # Last scheduled — was a full loop, now just:
  if windows:
      start = windows[-1].start
      stop = windows[-1].stop

  # Current + next window — was 2 separate loops, now one:
  for i, w in enumerate(windows):
      if self.acquire_control() and now in w:
          current_window = w
          # Next window is just the next item (if adjacent in time)
          if i + 1 < len(windows):
              next_window_save_start = w.stop + timedelta(seconds=1)
              if next_window_save_start in windows[i + 1]:
                  next_window = windows[i + 1]
          break

then windows[-1] is the last one scheduled.
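
A self-contained version of that sketch, with a hypothetical Window type whose `in` check tests time containment (half-open here; the real containment semantics may differ):

```python
from datetime import datetime, timedelta
from typing import NamedTuple, Optional, Tuple

class Window(NamedTuple):
    # Illustrative stand-in for the real window type.
    start: datetime
    stop: datetime

    def __contains__(self, ts) -> bool:
        # Half-open membership: start <= ts < stop
        return self.start <= ts < self.stop

def current_and_next(windows, now) -> Tuple[Optional[Window], Optional[Window]]:
    # Sort once, then find the current window and its adjacent
    # successor in a single pass.
    windows = sorted(windows, key=lambda w: w.start)
    current = nxt = None
    for i, w in enumerate(windows):
        if now in w:
            current = w
            # The next window is simply the next item, if it is
            # adjacent in time to the current one.
            if i + 1 < len(windows) and w.stop + timedelta(seconds=1) in windows[i + 1]:
                nxt = windows[i + 1]
            break
    return current, nxt
```

With the windows sorted, windows[-1] is indeed the last one scheduled, and the current/next search no longer needs two separate loops.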

@realdognose
Collaborator

realdognose commented Feb 6, 2026

Having a 14.5x improvement sounds awesome, but when you look at performance gains, you need to look at the overall picture as well:

As you noted, we are talking microseconds here. You calculated, for example, 749 us saved as a 14.51x speedup, but that's a very isolated view of only this method. To really identify this change's impact, consider the following:

  • Control Loop is running every 5 seconds.
  • On my gx MK 1, that loop takes between 20 and 30 ms in total for regular operation.
INFO:dynamicess:Control loop took 0.021248 seconds.
INFO:dynamicess:Control loop took 0.025785 seconds.

So, out of the 5000 ms of computation time the GX has available per tick, the dess control loop is consuming 30 ms (0.6%, effectively 0.3% Cerbo load on a dual-core device). Improving that by 1000 us will bring it down to 29 ms (0.58% / 0.29%). So that's a gain of 0.02% (0.01%) of additional computing time being available for other things.
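
As a quick sanity check of that arithmetic (figures taken from the comment above):

```python
# Tick budget vs. loop cost, and the effect of shaving ~1000 us off.
tick_ms = 5000.0   # control loop runs every 5 s
loop_ms = 30.0     # measured loop duration on a GX MK1
saved_ms = 1.0     # ~1000 us saved by the optimization

before = loop_ms / tick_ms * 100            # share of the tick budget
after = (loop_ms - saved_ms) / tick_ms * 100
print(f"{before:.2f}% -> {after:.2f}% ({before - after:.2f} pp gain)")
```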

Appreciate your contribution and will keep the change in mind for whenever that part needs to be touched logically anyway - but it's just not significant enough to consider it an urgent action point.

And a thought from another view:
I'm even considering not changing the generator issue. Why? What we have now is a constant runtime: it takes the same time no matter whether slot 0 or slot 47 is active. With a real generator, we would have a linearly increasing runtime based on the active slot. So it would obfuscate the "real" weight of the code for regular operation, and in case of issues (connectivity loss) the load would increase over time. Not significant at all, as we noted, but growing behaviour like that is quite a common source of issues: 99% of systems run "ideal" and fine, and on 1% you could see load/timing issues. That's hard to trace down then, and hard to identify as being caused by a piece of code running less efficiently under environment conditions that are not obviously related.

P.S.: Overall, the best option would be to put the active (and next) window in memory and re-evaluate with a change listener or when the active window expires. That would mean a decreasing load when connectivity is lost, because no change listener would fire, just the 15-minute window-exhaust lookup. But after all, still too insignificant to be prioritized over other development.
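
That cache-and-invalidate idea could be sketched like this. This is hypothetical: the callback name and the (start, stop) cache shape are assumptions, not the actual dess API.

```python
class WindowCache:
    """Cache the computed (start, stop) window; recompute only when a
    setting changes or the cached window expires."""

    def __init__(self, compute):
        self._compute = compute   # stand-in for the windows() scan
        self._cached = None       # (start, stop) or None
        self._dirty = True

    def on_setting_changed(self, *args):
        # Wire this to the settings change listener: any schedule
        # edit invalidates the cache.
        self._dirty = True

    def get(self, now):
        cached = self._cached
        # Recompute on change, on first use, or once the window expired.
        if self._dirty or cached is None or now >= cached[1]:
            self._cached = cached = self._compute(now)
            self._dirty = False
        return cached
```

When connectivity is lost, no change listener fires, so get() degenerates to a cheap comparison until the cached window exhausts.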
