Replies: 2 comments 1 reply
-
That's interesting. I have tried using constexpr in the past and expected it to make the fixed commands in setWindow faster, but it made no difference. I used a rather crude constexpr array initialisation approach, writing a simple sketch to generate the code below for me. The compiler reported 1024 bytes less dynamic RAM allocation and performance was identical, which surprised me. I have not investigated why there was no performance change.
-
There's no performance change most likely because the lookup table gets cached in RAM once retrieved from flash, I think in the zone reserved for code. Also, at high processor speeds it is difficult to notice differences, since the bottleneck is the GPIO. Even when the OR and shift operations are done at run-time, the difference is not spectacular: you found a 33% improvement when using a LUT in some cases, and in my current test I found a 20% improvement. I've edited the OP to include it.
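To make the comparison concrete, here is a minimal sketch of the two approaches being discussed: building the GPIO mask with shifts and ORs for every byte, versus reading it from a 256-entry table. The pin numbers and the names `kDataPins`, `mask_by_shift`, `g_mask_lut`, `init_mask_lut` and `mask_by_lut` are assumptions made up for illustration, not the library's actual code.

```cpp
#include <cstdint>

// Hypothetical GPIO numbers for data lines D0..D7 (illustration only).
constexpr uint8_t kDataPins[8] = {4, 5, 12, 13, 14, 15, 16, 17};

// Run-time variant: shift and OR each data bit into its GPIO position.
inline uint32_t mask_by_shift(uint8_t c) {
    uint32_t mask = 0;
    for (int bit = 0; bit < 8; ++bit) {
        if (c & (1u << bit)) {
            mask |= (1u << kDataPins[bit]);
        }
    }
    return mask;
}

// LUT variant: 256 entries x 4 bytes = 1024 bytes of RAM, filled once at start-up.
static uint32_t g_mask_lut[256];

inline void init_mask_lut() {
    for (int c = 0; c < 256; ++c) {
        g_mask_lut[c] = mask_by_shift(static_cast<uint8_t>(c));
    }
}

// After init_mask_lut(), each byte costs a single indexed load.
inline uint32_t mask_by_lut(uint8_t c) {
    return g_mask_lut[c];
}
```

A table of this shape occupies 256 x 4 = 1024 bytes, which would be consistent with the RAM figure mentioned earlier in the thread if that table was moved out of RAM.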
-
For RAM constrained setups this alternative generates the lookup table in program memory.
Tested on a T-Display-S3 with 170x320 = 106.25kB to transfer, there is a small performance decrease of 0.05 - 0.1fps, probably due to flash access being slower than RAM access. With `inline` instead of `always_inline` there is a 2-3fps penalty; it seems the compiler is generating calls.
EDIT: it seems the `constexpr cset_mask` function still generates more code than a simple lookup; I could see that its performance is lower than that of xset_mask by decreasing the processor speed from 240MHz to 80MHz. To truly generate a compact lookup table at compile-time, the array needs to be explicitly instantiated. With C++14 and later this is pretty straightforward, but with C++11 it takes a hard-to-follow template construction, like this one inspired by Stack Overflow:
EDIT2: removed unused parts from the template instantiation, added benchmark for cset_mask() evaluated at run-time.
With this, when hovering the cursor over the variable in VS Code you can see the start of the table, showing that it's generated at compile-time.
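For readers who want to try the idea, the following is a rough sketch of the general C++11 technique (a hand-rolled index sequence expanded into an aggregate initialiser). It is not the code from this post. The pin mapping and the names `bit_mask`, `example_mask`, `Indices`, `MakeIndices`, `MaskTable`, `make_table` and `kMaskTable` are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Returns the GPIO bit for one data bit; the pin numbers used below are made up.
constexpr uint32_t bit_mask(uint8_t c, int bit, int pin) {
    return (c & (1u << bit)) ? (1u << pin) : 0u;
}

// C++11 constexpr functions are limited to a single return expression.
constexpr uint32_t example_mask(uint8_t c) {
    return bit_mask(c, 0, 4)  | bit_mask(c, 1, 5)  |
           bit_mask(c, 2, 12) | bit_mask(c, 3, 13) |
           bit_mask(c, 4, 14) | bit_mask(c, 5, 15) |
           bit_mask(c, 6, 16) | bit_mask(c, 7, 17);
}

// Minimal index sequence, since std::index_sequence only arrived in C++14.
template <std::size_t... Is>
struct Indices {};

template <std::size_t N, std::size_t... Is>
struct MakeIndices : MakeIndices<N - 1, N - 1, Is...> {};

template <std::size_t... Is>
struct MakeIndices<0, Is...> {
    typedef Indices<Is...> type;
};

// Wrapper struct so the table can be returned from a constexpr function.
struct MaskTable {
    uint32_t data[256];
};

// Expands to { example_mask(0), example_mask(1), ..., example_mask(255) }.
template <std::size_t... Is>
constexpr MaskTable make_table(Indices<Is...>) {
    return MaskTable{{example_mask(static_cast<uint8_t>(Is))...}};
}

// Explicitly instantiated table; the initialiser must be a constant expression.
constexpr MaskTable kMaskTable = make_table(MakeIndices<256>::type());

// These checks only compile if the entries are known at compile time.
static_assert(kMaskTable.data[0x00] == 0u, "entry 0 is empty");
static_assert(kMaskTable.data[0x01] == (1u << 4), "D0 maps to pin 4");
```

Because `kMaskTable` is declared `constexpr`, the compiler has to evaluate every entry during compilation, and the `static_assert` lines confirm that independently of any editor tooling. Where the resulting constant ends up (flash or RAM) depends on the toolchain and linker script.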
With the example I used, on ESP32S3 slowed down to 80MHz, the benchmarks are