Performance: optimize DynamicEss schedule window construction #20
tmeinlschmidt wants to merge 1 commit into victronenergy:master
Replace repeated string formatting with precomputed key lookups in the 5-second timer hot path.

- Add `_SlotKeys` namedtuple and `_DESS_KEYS` tuple for schedule settings keys, eliminating 384 `str.format()` calls per tick
- Rewrite `windows()` to skip the 7 remaining lookups for empty slots (`start == 0`) using early-exit
- Merge the max-schedule and current-window loops into a single pass
|
@realdognose, I need you to look at this one for me. Specifically, check the early-exit thing. I recall that I had to change that, a long time ago, because the VRM guys want to know the timestamp of the last available window, and it is designed so that the windows aren't necessarily in order. In any case, you are better suited to look at this now.
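To make the ordering concern concrete, here is a small sketch with made-up slot data. It does not claim that the PR's early exit behaves like either function; it only shows why stopping the whole scan at the first empty slot would under-report the last window when slots are not stored in order. The `(start, duration)` tuples are a hypothetical simplification of a slot.

```python
def last_end_stopping_early(slots):
    """Stop at the first empty slot. Only correct if slots are packed and ordered."""
    last = 0
    for start, duration in slots:
        if start == 0:
            break
        last = max(last, start + duration)
    return last


def last_end_full_scan(slots):
    """Skip empty slots but look at every slot, so order does not matter."""
    last = 0
    for start, duration in slots:
        if start == 0:
            continue
        last = max(last, start + duration)
    return last


# Hypothetical data: an empty slot in the middle, and windows out of order.
slots = [(7200, 3600), (0, 0), (10800, 3600), (3600, 3600)]

print(last_end_stopping_early(slots))  # 10800, misses the window ending at 14400
print(last_end_full_scan(slots))       # 14400
```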
|
@izak I had a look at the suggested changes: Yes, currently The optimization of the 384 format calls is no optimization (as long as the generator issue exists). The change is less readable but would currently still do 384 format calls :-) So, until we have clarified the "in-order" question for Node-RED, there's nothing to pull from this PR.
|
I tried to mock this up and do some measurements, with code and results (running on my M1), and also tried it on an RPi 4 (8 GB of memory), where the improvement in terms of savings is more significant. As for the 384 strings: yes, but in my changes these are calculated just once, not every 5 seconds. But I completely understand and agree that until the window-order question is resolved, it makes no sense to merge or cherry-pick this (as the order may change things a lot). I was also thinking a bit about that order, given that we have the windows already filtered out by then.
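A measurement of this kind could be mocked up with `timeit` along the following lines. This is only a sketch, not the benchmark used above: the key pattern and field names are hypothetical, and it isolates key construction only, so it says nothing about the cost of the settings lookups themselves.

```python
import timeit

NUM_SLOTS = 48
# Hypothetical field names; 48 x 8 = 384 keys as quoted in the thread.
FIELDS = ("start", "duration", "soc", "strategy",
          "restrictions", "feedin", "flags", "discharge")
PATTERN = "dess_{}_{}"  # hypothetical key pattern


def keys_formatted_each_tick():
    """Build all 384 key strings, as a per-tick formatting loop would."""
    return [PATTERN.format(f, i) for i in range(NUM_SLOTS) for f in FIELDS]


# Built once; every later "tick" just reuses the tuple.
_PRECOMPUTED = tuple(keys_formatted_each_tick())


def keys_precomputed():
    return _PRECOMPUTED


if __name__ == "__main__":
    n = 2000  # number of simulated ticks
    t_fmt = timeit.timeit(keys_formatted_each_tick, number=n)
    t_pre = timeit.timeit(keys_precomputed, number=n)
    print("formatted per tick: {:.1f} us".format(t_fmt / n * 1e6))
    print("precomputed:        {:.1f} us".format(t_pre / n * 1e6))
```

The absolute numbers depend heavily on the machine, so they are best read against the 5-second tick budget rather than as a ratio on their own.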
|
Having a 14.5x improvement sounds awesome - but when you look at performance gains, you would need to look at the overall picture as well: As you noted, we are talking
So, out of the 5000 ms of computation time the GX has available, the dess-control_loop is consuming 30 ms (0.6%, effectively 0.3% Cerbo load on a dual-core device). Improving that by 1000 µs will bring it down to 29 ms (0.58% / 0.29%). So that's a gain of 0.02% (0.01%) additional computing time being available for other things. I appreciate your contribution and will keep the change in mind for whenever that part needs to be touched logically anyway, but it's just not significant enough to consider it an urgent action point. And a thought from another view: PS: Overall, the best approach would be to keep the active (and next) window in memory and re-evaluate with a change listener or when the active window expires. That would be a decreasing load when connectivity is lost, because no change listener would fire, just the lookup when the 15-minute window is exhausted. But after all, it is still too insignificant to be prioritized over other development.
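The caching idea from the PS could look roughly like this. It is a minimal sketch under stated assumptions: windows are plain `(start, duration)` tuples, `get_windows` is a hypothetical callable that returns them, and `on_settings_changed` stands in for whatever change notification the settings layer actually offers. None of these names come from the codebase.

```python
def scan_windows(windows, now):
    """Return (active_window_or_None, valid_until).

    valid_until is the next moment the answer can change without a settings
    change: the end of the active window or the start of a future one.
    """
    active, valid_until = None, float("inf")
    for start, duration in windows:
        if start == 0:
            continue  # empty slot
        end = start + duration
        if start <= now < end:
            active = (start, end)
            valid_until = min(valid_until, end)
        elif now < start:
            valid_until = min(valid_until, start)
    return active, valid_until


class ActiveWindowCache:
    """Keep the active window in memory; rescan only on change or expiry."""

    def __init__(self, get_windows):
        self._get_windows = get_windows  # hypothetical: returns (start, duration) pairs
        self._dirty = True               # force a scan on first use
        self._active = None
        self._valid_until = 0.0
        self.scans = 0                   # counter, only here to show the effect

    def on_settings_changed(self, *_):
        # Hypothetical hook: call this from the settings change notification.
        self._dirty = True

    def current(self, now):
        if self._dirty or now >= self._valid_until:
            self._active, self._valid_until = scan_windows(self._get_windows(), now)
            self._dirty = False
            self.scans += 1
        return self._active
```

With this shape, the 5-second tick calls `current(now)` and normally gets the cached tuple back. A full scan happens only when a setting changes or when the cached boundary is reached, which matches the "decreasing load when connectivity is lost" behaviour described above.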