Add Rgb565PixelBE for big-endian SPI displays #10882

Open
okhsunrog wants to merge 2 commits into slint-ui:master from okhsunrog:rgb565-be

@okhsunrog

Summary

  • Adds Rgb565PixelBE, a new TargetPixel implementation that stores RGB565 pixels in big-endian byte order at render time
  • Eliminates the need for post-render byte-swapping when targeting SPI displays (ST7789, ILI9341, etc.) that expect MSB-first data
  • Ports the three ST7789 Pico examples to use the new type, removing their .to_be() swap loops

Motivation

Most SPI LCD controllers expect pixel data in big-endian byte order. The existing Rgb565Pixel stores data in native (little-endian) order, forcing every application to byte-swap the entire framebuffer after rendering. On memory-constrained MCUs with external RAM (e.g., ESP32 with PSRAM), this extra pass over the buffer is expensive — in my case it accounted for ~50ms per frame due to slow PSRAM bus bandwidth.

Three of the existing examples in examples/mcu-board-support (pico_st7789.rs, pico2_st7789.rs, pico2_touch_lcd_2_8.rs) already work around this with a .to_be() loop in process_line, confirming this is a common need.
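For context, the workaround described above amounts to roughly the following sketch. The helper name `swap_line_to_be` is hypothetical, and raw `u16` values stand in for `Rgb565Pixel`; the actual examples perform this step inside `process_line`.

```rust
/// Hypothetical helper: convert one rendered line of native-endian RGB565
/// values to big-endian byte order in place, before sending it to the display.
/// On a little-endian MCU, `u16::to_be` exchanges the two bytes of each pixel.
fn swap_line_to_be(line: &mut [u16]) {
    for px in line.iter_mut() {
        *px = px.to_be();
    }
}
```

Every pixel of every rendered line is read and written once more after rendering, which is the extra pass this PR aims to avoid.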

Approach

Rgb565PixelBE packs R/G/B bits directly into big-endian-on-little-endian layout in from_rgb(), so no byte swap is ever needed. The blend() implementation expands the BE pixel into the same u32 intermediate representation used by Rgb565Pixel::blend(), reusing the identical single-multiply alpha blending math, then extracts directly back into BE positions. This means there is no per-pixel overhead compared to Rgb565Pixel — the cost is the same, just with different bit shuffling at the edges.
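The PR's actual code is not reproduced in this conversation, so the following is only a minimal sketch of the idea under assumed bit positions. All names (`Premul`, `pack_native`, `pack_swapped`, `blend_native`, `blend_swapped`) are hypothetical, raw `u16` values stand in for the pixel types, and the source color is assumed to be validly premultiplied (each channel no larger than alpha).

```rust
/// Hypothetical stand-in for a premultiplied-alpha source color.
#[derive(Clone, Copy)]
struct Premul { red: u8, green: u8, blue: u8, alpha: u8 }

/// Conventional RGB565 value: RRRRRGGG_GGGBBBBB.
fn pack_native(r: u8, g: u8, b: u8) -> u16 {
    let (r, g, b) = (r as u16, g as u16, b as u16);
    ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)
}

/// The same pixel with its two bytes exchanged: GGGBBBBB_RRRRRGGG.
/// Stored on a little-endian machine, this reads as big-endian RGB565.
fn pack_swapped(r: u8, g: u8, b: u8) -> u16 {
    let (r, g, b) = (r as u16, g as u16, b as u16);
    (r & 0xF8) | (g >> 5) | ((g & 0x1C) << 11) | ((b & 0xF8) << 5)
}

/// Inverse alpha reduced to 0..=32 so one u32 multiply scales all channels.
fn inv_alpha5(alpha: u8) -> u32 {
    ((255 - alpha as u32) + 4) >> 3
}

/// Source channels placed where they land after the multiply:
/// green in bits 26..=31, red in bits 16..=20, blue in bits 5..=9.
fn spread_src(c: Premul) -> u32 {
    let (r, g, b) = (c.red as u32, c.green as u32, c.blue as u32);
    ((g >> 2) << 26) | ((r >> 3) << 16) | ((b >> 3) << 5)
}

/// Blend onto a native-layout pixel using a single multiply.
fn blend_native(dst: u16, c: Premul) -> u16 {
    let d = dst as u32;
    // Expanded form: green in bits 21..=26, red in 11..=15, blue in 0..=4.
    let expanded = (d & 0xF81F) | ((d & 0x07E0) << 16);
    let res = expanded * inv_alpha5(c.alpha) + spread_src(c);
    (((res >> 21) & 0x07E0) | ((res >> 5) & 0xF81F)) as u16
}

/// Blend onto a byte-swapped pixel: same expanded form and same math,
/// only the bit shuffling on the way in and out differs.
fn blend_swapped(dst: u16, c: Premul) -> u16 {
    let d = dst as u32;
    let expanded = ((d & 0x00F8) << 8)   // red
        | ((d & 0x1F00) >> 8)            // blue
        | ((d & 0x0007) << 24)           // upper three green bits
        | ((d & 0xE000) << 8);           // lower three green bits
    let res = expanded * inv_alpha5(c.alpha) + spread_src(c);
    (((res >> 13) & 0xE0F8) | ((res << 3) & 0x1F00) | ((res >> 29) & 0x0007)) as u16
}
```

In this sketch the multiply and add are identical in both variants. The swapped variant needs a few more shift-and-mask steps to expand and extract, because the green field is split across the two bytes.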

Results

On an ESP32 + PSRAM + ILI9342C (320×240) setup, switching from Rgb565Pixel + post-render swap to Rgb565PixelBE reduced frame time from 118ms to 67ms (8 → 15 FPS), with render time unchanged at ~4ms. The entire gain comes from eliminating the framebuffer copy/swap pass.

Real hardware demo project using this type: https://github.com/okhsunrog/m5core2v1-1_demo_rust

Test plan

  • Unit tests verify Rgb565PixelBE produces byte-swapped equivalents of Rgb565Pixel for all from_rgb inputs
  • Unit tests verify blend() matches Rgb565Pixel::blend() results (after byte swap) across multiple color/alpha combinations
  • Tested on real hardware (M5Stack Core2 v1.1, ESP32 + ILI9342C over SPI)
  • Verify the three ported Pico examples still build and render correctly

New TargetPixel type that stores RGB565 in big-endian byte order,
matching the format expected by SPI LCD controllers (ILI9341,
ILI9342C, ST7789, etc.) without any post-render byte swapping.

The bits are packed directly in swapped order — from_rgb() encodes
into the BE layout and blend() expands to the same u32 representation
used by Rgb565Pixel, reuses identical multiply+color math, then
extracts directly into BE positions. No swap_bytes() anywhere.

Replace Rgb565Pixel + post-render byte-swap loops with Rgb565PixelBE
in the three Pico ST7789 board support examples. The renderer now
produces pixels directly in big-endian byte order, eliminating the
per-line .to_be() conversion before DMA transfer.

Ported examples:
- pico_st7789
- pico2_st7789
- pico2_touch_lcd_2_8
@CLAassistant

CLAassistant commented Feb 26, 2026

CLA assistant check
All committers have signed the CLA.

@tronical
Member

That's an awesome idea. Thanks! I'd love to get @ogoffart's input on this when he gets back. This can take 1-2 weeks. Thanks for your patience :)

@okhsunrog
Author

It would be very nice if anyone with a Raspberry Pi Pico and a suitable display could test how this PR affects the FPS of the official demos.

Member

@ogoffart ogoffart left a comment

Looks good to me.

@okhsunrog
Author

@tronical can this be merged now?

@tronical
Member

tronical commented Mar 2, 2026

I've tried this on a Pico 2 with a 2.8 inch Waveshare screen and I consistently get lower frame rates (about 2-3 FPS less). Measured in a release build with defmt and SLINT_DEBUG_PERFORMANCE, with full screen refresh and also with lazy refresh, during the ink animation and (especially) when panning on the settings screen.

I don't really have a great explanation here. My best guess is that the individual pixel blending operations are slightly more complex, and since they run many more times than the single LE-to-BE pass at the "end", I suspect that they add up to more, at least for line-by-line rendering.

I suggest that we offer this data type, but I'm unsure about using it by default in the three Pico platform implementations.

What do you think?

@okhsunrog
Author

@tronical I think I'll dig into this a bit and try to find out what exactly behaves differently. Overall I don't mind removing the second commit, so that no changes are made to the demos. I just want to test it a bit myself.

@tronical
Member

tronical commented Mar 2, 2026

Sounds good :)
