Skip to content

feat(lvgl_port): Used PPA for rotation on ESP32-P4 #352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 17, 2025

Conversation

espzav
Copy link
Collaborator

@espzav espzav commented Jul 22, 2024

ESP-BSP Pull Request checklist

  • Version of modified component bumped
  • CI passing

Change description

  • Enabled PPA rotation in LVGL port (LVGL9 + P4)

Closes #540

@espzav espzav force-pushed the feat/lvgl_port_ppa branch 6 times, most recently from 5186eb6 to 645f085 Compare July 22, 2024 11:49
@espzav espzav marked this pull request as draft July 26, 2024 07:17
@mpolitowicz
Copy link

@espzav Would it be possible to merge PPA based software rotation to main branch and release it ?

We have added manually the code to our project and it worked like a charm. CPU usage dropped from 80-90% to 10-20% and render/flush times from fewhundreds ms to 10-20 ms!!

We added 'lcd_ppa.c', 'lcd_ppa.h' and changes to 'esp_lvgl_port_disp.c' under LVGL_PORT_PPA defines with one adjustment (changed .buffer_size = disp_cfg->buffer_size * sizeof(lv_color_t), to .buffer_size = disp_cfg->buffer_size * color_bytes,) along with changes to CMakeLists.txt to make it work.

@mpolitowicz
Copy link

Also we had to change buffer alignment to 128 as we are using external memory and then buffer has to be aligned to L2 size
ppa_ctx->buffer_size = ALIGN_UP(cfg->buffer_size, 128); ppa_ctx->buffer = heap_caps_aligned_calloc(128, ppa_ctx->buffer_size, sizeof(uint8_t), buffer_caps);

@mpolitowicz
Copy link

The last adjustment that we had to make was to change the way how rotation area is calculated when there is full screen rotation. For unknown reason, in our case, the lv_area_t *area for whole screen refresh is one pixel smaller then should be. We are not sure if this is special case for our display, but any way we fixed it by pulling area h/w calculation from esp_lcd_ppa_rotate to lvgl_port_flush_callback and adding check if we are handling full screen refresh in following way (lcd_ppa_disp_area_t was extended with h and w ):

             uint32_t ww = lv_area_get_width(area);
             uint32_t hh = lv_area_get_height(area);
         
             uint32_t hres = lv_display_get_horizontal_resolution(drv);
             uint32_t vres = lv_display_get_vertical_resolution(drv);

             if ((hh == vres) && (ww == hres) && (rotate_cfg.area.x1 == 0) && (rotate_cfg.area.y1 == 0)) {
                 rotate_cfg.area.y2 = hh; 
                 rotate_cfg.area.x2 = ww;
                 rotate_cfg.area.h = hh;
                 rotate_cfg.area.w = ww;
             } else {
                 rotate_cfg.area.w  = rotate_cfg.area.x2 - rotate_cfg.area.x1 + 1;
                 rotate_cfg.area.h  = rotate_cfg.area.y2 - rotate_cfg.area.y1 + 1;
             }        

and then inside esp_lcd_ppa_rotate:

     const uint32_t w = rotate_cfg->area.w;
     const uint32_t h = rotate_cfg->area.h;
 
     /* Set dimension by screen size and rotation */
     uint32_t out_w = w;
     uint32_t out_h = h;

This part worked for our case but I am not sure if it is most elegant solution...

On the occasion position related variables (like x,y,h,w etc ) were set to uint32_t type to standardize it with PPA.

@espzav
Copy link
Collaborator Author

espzav commented Mar 14, 2025

Hi @mpolitowicz, thank you for you rresults and changes. We didn't merge it because there wasn't performance change between SW rotation and PPA rotation. Mainly, when there was small draw buffer, the SW rotation was faster.
I am not sure, what was changed (maybe something in LVGL), but I can try it again. Thank you

@mpolitowicz
Copy link

@espzav it seems there is no difference on smaller buffers as standard SW rotate seems to have similar speed. The huge difference (like 10 fold) can be observed with bigger buffers and we have 800x1280 to rotate...

Here is comparison of a scenario from our application: load screen no3 (full rotate 800x1280), load screen no1 (full rotate 800x1280), zoom part of screen (partial rotate 680x680) four times, load screen no4 (full rotate 800x1280), zoom part of screen (partial rotate 680x680) four times, load screen no1 (full rotate 800x1280)

Stats with PPA:
image

Stats without PPA:
image

As you see there is almost no difference when 680x680 is rotated, but there is huge boost on flush time when we rotate whole screen.

@mpolitowicz
Copy link

Oh, and regards buffer alignment change from 64 to 128, that I mentioned before, I think it would be best to take this value from CONFIG_CACHE_L2_CACHE_LINE_SIZE from sdkconfig.h

@mpolitowicz
Copy link

@espzav Any chance to move forward with this one ?
It seems that majority of the changes can be removed from this PR as it was intended to do three things (fix for FreeRTOS, SW rotation and PPA suport for rotation) out of which 2 first were included in 2.3.0. So the one that stays is PPA rotation and as I mentioned above it goes down to 3 files (2 to be changed, one added as it is).

@espzav
Copy link
Collaborator Author

espzav commented Mar 21, 2025

@espzav Any chance to move forward with this one ? It seems that majority of the changes can be removed from this PR as it was intended to do three things (fix for FreeRTOS, SW rotation and PPA suport for rotation) out of which 2 first were included in 2.3.0. So the one that stays is PPA rotation and as I mentioned above it goes down to 3 files (2 to be changed, one added as it is).

@mpolitowicz I am working on it on local now. There are some changes in lvgl_port and I must merge it.

@espzav espzav force-pushed the feat/lvgl_port_ppa branch from 645f085 to 5bd7404 Compare March 21, 2025 09:55
Copy link

github-actions bot commented Mar 21, 2025

Test Results

 19 files  19 suites   6m 17s ⏱️
 76 tests 26 ✅  50 💤 0 ❌
494 runs  26 ✅ 468 💤 0 ❌

Results for commit 6c97b21.

♻️ This comment has been updated with latest results.

@espzav espzav force-pushed the feat/lvgl_port_ppa branch 2 times, most recently from b2fce4d to e0c1d00 Compare March 21, 2025 12:54
@espzav espzav changed the title feat(lvgl_port): Used PPA for rotation on ESP32-P4 and added SW rotation feat(lvgl_port): Used PPA for rotation on ESP32-P4 Mar 21, 2025
@espzav espzav marked this pull request as ready for review March 21, 2025 13:41
@espzav espzav requested a review from tore-espressif March 21, 2025 14:12
@espzav espzav force-pushed the feat/lvgl_port_ppa branch from e0c1d00 to 63ca516 Compare March 24, 2025 13:35
@espzav
Copy link
Collaborator Author

espzav commented Mar 24, 2025

Hi @mpolitowicz and @tore-espressif, I made some LVGL benchmarks of screen rotation with PPA and without PPA. I still see no difference there.
@mpolitowicz Is there something bad? I am using bsp_display_rotate(display, LV_DISPLAY_ROTATION_90); for rotation and lv_demo_benchmark() for benchmarks. I tried internal RAM and SPIRAM too. I tried full buffer.

image

@mpolitowicz
Copy link

Hi, I did some tests and I got confused myself. The logs above are from two consecutive runs of our application with only difference in code which was commented out line #define LVGL_PORT_PPA in esp_lvgl_port_disp.c. I checked it few times and it looked like the solution to our challenges (too slow refresh on 680x680 part of UI). And that was the reason I asked for the merge.
And now once I try to reproduce it I do not see much (if any) gain...

I thought that the reason is I included in the code other change related to PPA (blending described in #409 ) but when I removed that part (in fact I did reset managed_components to be sure), the speed of UI was more less the same.

My weak guess is that enabling PPA might have helped in reducing memory consumption so that drawer task started to work faster. But I am not sure here... Any ways in mean time we optimized size of displayed items and played with LV_DRAW_THREAD_STACK_SIZE (setting it at 128k) and this did the job. Also we reduced LVGL buffer size to 680x680 (from 1280x800) and resigned from double buffer.

So anyways maybe the best is to include PPA rotation under option in Kconfig (and consider adding PPA blend/fill in the same way ) ? But I guess it is up to you as it seems we found setting that works for us without PPA support.

@espzav
Copy link
Collaborator Author

espzav commented Mar 28, 2025

@mpolitowicz Thank you for your help and great clarification. I am thinking about two posibilities. We can keep as it is and release it or add runtime API for select PPA. I don't like KConfig options in lvgl_port component. There are not any other options in KConfig for now.
@tore-espressif and @igrr What do you think? I would like to merge this PR, but still it has same results as SW rotation. I tried make more tests, I got same results.

@igrr
Copy link
Member

igrr commented Mar 28, 2025

A Kconfig option to disable PPA could be meaningful in this case, since that would allow reducing the code size. A run-time switch is more flexible, but it doesn't help us remove unused PPA-related code if the application doesn't need PPA. You can try to add both — a runtime switch and a Kconfig option to get the flexibility while making it possible to trim the code size.

@espzav espzav added this to the esp_lvgl_port 2.6.0 milestone Mar 31, 2025
@espzav espzav force-pushed the feat/lvgl_port_ppa branch from 63ca516 to 74035dc Compare March 31, 2025 13:09
@espzav
Copy link
Collaborator Author

espzav commented Mar 31, 2025

A Kconfig option to disable PPA could be meaningful in this case, since that would allow reducing the code size. A run-time switch is more flexible, but it doesn't help us remove unused PPA-related code if the application doesn't need PPA. You can try to add both — a runtime switch and a Kconfig option to get the flexibility while making it possible to trim the code size.

Thank you for your suggestion. When I had to add Kconfig, I didn't add to runtime option. I think, it is enough as a Kconfig.

@espzav
Copy link
Collaborator Author

espzav commented Apr 8, 2025

@tore-espressif PTAL

@espzav espzav force-pushed the feat/lvgl_port_ppa branch 2 times, most recently from 5820ab4 to d80e8c8 Compare April 8, 2025 09:05
Copy link
Collaborator

@tore-espressif tore-espressif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only Nitpicks, LGTM!

@espzav espzav force-pushed the feat/lvgl_port_ppa branch from d80e8c8 to 42ac13f Compare April 17, 2025 08:08
@espzav espzav force-pushed the feat/lvgl_port_ppa branch from 42ac13f to 6c97b21 Compare April 17, 2025 08:10
@espzav espzav merged commit 76cc903 into master Apr 17, 2025
39 checks passed
@espzav espzav deleted the feat/lvgl_port_ppa branch April 17, 2025 10:54
@mpolitowicz
Copy link

@espzav (and all) thank you! We tested it once again and it seems it does visibly speed up our solution. Especially when larger chunks of screen or whole screen is redrawn. Maybe the case is that we have 1280x800 screen that is large enough to benefit from PPA.

One observation that we have (at least in our case), is that in order to have LVGL_PORT_ENABLE_PPA work without issues, it is necessary to have ESP_MM_CACHE_MSYNC_C2M_CHUNKED_OPS enabled. Otherwise the screen blinks in blue from time to time when larger parts are refreshed fast (this is not very obvious and we discovered it on scrolling lv_table that is like half screen size). I am not sure what is the mechanism behind it but I suspect that since PPA uses esp_cache_msync it can lock flush cb on larger buffers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add PPA rotation for ESP32-P4 (BSP-657)
4 participants