[WIP] [perf] Add fast desktop entry parser, ThorVG icons, and sync loading #2267

Draft
markg85 wants to merge 1 commit into davatorium:next from markg85:next

markg85 commented Feb 15, 2026

Performance improvements:

  • AVX2 SIMD-optimized desktop entry parser (~32 bytes/cycle)
  • ThorVG for icon rendering (~50μs per icon vs ms with gdk-pixbuf)
  • Pre-built icon path cache for O(1) lookups
  • Removed async icon loading (threading overhead exceeds work at ~50μs)

Note: Optimized for SSD/NVMe storage


Hi,

I've been using rofi for a while and I like it. However, I noticed that icons sometimes took a moment to load (an empty spot showed first, then the icon popped in).

I would never have debugged this, as I don't like C and don't like GTK either; I would have just used it as is. These days, however, we have AI, and I happen to be interested in performance optimizations! That makes it bearable to dive into a repository and see how it can be improved. For those wondering about the AI model: I used GLM 5 with Claude Code.

Turns out it could be improved massively, which is what this single-commit fork does!
And yes, the measurements below were taken in release mode with debug symbols (-g flag).

  ┌────────┬────────────┬────────┬──────────────┐
  │ Format │ gdk-pixbuf │ thorvg │   Speedup    │
  ├────────┼────────────┼────────┼──────────────┤
  │ SVG    │ 4310 µs    │ 212 µs │ 20.3x faster │
  ├────────┼────────────┼────────┼──────────────┤
  │ PNG    │ 4249 µs    │ 142 µs │ 29.9x faster │
  └────────┴────────────┴────────┴──────────────┘

  ┌───────────────┬────────────┬───────────────────────┐
  │    Method     │    Time    │       Per Icon        │
  ├───────────────┼────────────┼───────────────────────┤
  │ nk_xdg lookup │ 550.038 ms │ 1.95 ms               │
  ├───────────────┼────────────┼───────────────────────┤
  │ cached lookup │ 0.036 ms   │ 0.000128 ms (0.13 µs) │
  ├───────────────┼────────────┼───────────────────────┤
  │ Speedup       │            │ 15,412x faster!       │
  └───────────────┴────────────┴───────────────────────┘

Moving from gdk-pixbuf to ThorVG gives a direct, massive speedup for loading images. I was honestly surprised by how big the difference is. That change alone brings an individual icon load to a mean time of ~0.5ms on my PC. It does vary per PNG icon and its complexity. What you see in the benchmark is just the first icon it found, which was Alacritty's. ThorVG supports loading a few formats, including PNG, so with that in mind I removed gdk-pixbuf entirely.

This single optimization wasn't enough, though. I wanted the top 10 images to load within 1ms (all of them, so 0.1ms per image). It turned out that the icon lookup path was quite heavy too, so I optimized that as well, essentially by building a hashmap for the lookup.
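To make the hashmap idea concrete, here is a minimal sketch in C of a pre-built icon path cache. It is not the code from this patch: the names (IconCache, icon_cache_add, icon_cache_lookup), the FNV-1a hash, and the fixed-size open-addressing table are all assumptions chosen for illustration. The idea is to walk the icon theme directories once at startup, register each icon name with its path, and then resolve names with an expected O(1) probe instead of a directory search per icon.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical slot: icon name -> file path. name == NULL means empty. */
typedef struct {
    char *name;
    char *path;
} IconSlot;

/* Hypothetical cache: open addressing with linear probing. */
typedef struct {
    IconSlot *slots;
    size_t capacity; /* power of two, so we can mask instead of modulo */
    size_t count;
} IconCache;

/* FNV-1a hash over a NUL-terminated string. */
static uint64_t icon_hash(const char *s) {
    uint64_t h = 1469598103934665603ULL;
    for (; *s; ++s) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

static char *dup_string(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = malloc(n);
    if (copy) memcpy(copy, s, n);
    return copy;
}

IconCache *icon_cache_new(size_t capacity) {
    /* Capacity must be a power of two and at least 2. */
    assert(capacity >= 2 && (capacity & (capacity - 1)) == 0);
    IconCache *c = malloc(sizeof *c);
    if (!c) return NULL;
    c->slots = calloc(capacity, sizeof *c->slots);
    if (!c->slots) { free(c); return NULL; }
    c->capacity = capacity;
    c->count = 0;
    return c;
}

/* Register an icon. The first path registered for a name wins, which
 * mimics "highest-priority theme directory is scanned first".
 * Returns 0 on success, -1 if the table is too full or allocation fails.
 * This sketch never resizes; it keeps the table at most half full. */
int icon_cache_add(IconCache *c, const char *name, const char *path) {
    size_t mask = c->capacity - 1;
    size_t i = (size_t)icon_hash(name) & mask;
    while (c->slots[i].name) {
        if (strcmp(c->slots[i].name, name) == 0) return 0; /* keep first */
        i = (i + 1) & mask;
    }
    if (c->count * 2 >= c->capacity) return -1;
    char *n = dup_string(name);
    char *p = dup_string(path);
    if (!n || !p) { free(n); free(p); return -1; }
    c->slots[i].name = n;
    c->slots[i].path = p;
    c->count++;
    return 0;
}

/* Expected O(1): hash once, then probe until a match or an empty slot. */
const char *icon_cache_lookup(const IconCache *c, const char *name) {
    size_t mask = c->capacity - 1;
    size_t i = (size_t)icon_hash(name) & mask;
    while (c->slots[i].name) {
        if (strcmp(c->slots[i].name, name) == 0) return c->slots[i].path;
        i = (i + 1) & mask;
    }
    return NULL;
}

void icon_cache_free(IconCache *c) {
    if (!c) return;
    for (size_t i = 0; i < c->capacity; ++i) {
        free(c->slots[i].name);
        free(c->slots[i].path);
    }
    free(c->slots);
    free(c);
}
```

A caller would create the cache with icon_cache_new(), fill it during a single directory scan, and call icon_cache_lookup() with the Icon= value from each desktop entry. The one-time scan still costs disk I/O, so the per-lookup figure in the table above only reflects the probe itself.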

With these two changes (and a lot of micro-optimizations like reducing allocations, custom desktop file parsing, AVX2, etc.) the loading became so fast that the async path itself turned into a UI glitch. Put differently, you could still see a frame where the icon wasn't loaded yet and then popped in on the next frame. I solved this by simply removing the async path and loading synchronously. I do realize that this change can be bad too, as icon loading might now block the UI. So HDD users should probably not use this patch. But if you use an SSD or NVMe drive, you're fast enough not to notice the difference. To me, at least, the UI feels snappier than it ever did (and it was already quite good!).
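As an illustration of what "AVX2" can mean for desktop file parsing, here is a minimal sketch in C that finds the next newline in a buffer by comparing 32 bytes per loop iteration. It is not taken from this patch: the function names are hypothetical, it assumes GCC or Clang (for the target attribute and the __builtin_* helpers), and it falls back to memchr on CPUs or architectures without AVX2. Processing 32 bytes per iteration does not by itself guarantee any particular bytes-per-cycle figure.

```c
#include <stddef.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define SKETCH_HAVE_X86 1
#endif

/* Scalar fallback: index of the first '\n', or len if there is none. */
static size_t find_newline_scalar(const char *buf, size_t len) {
    const char *p = memchr(buf, '\n', len);
    return p ? (size_t)(p - buf) : len;
}

#ifdef SKETCH_HAVE_X86
/* AVX2 path: compare 32 bytes at once against '\n', turn the result
 * into a 32-bit mask, and use the lowest set bit as the match offset. */
__attribute__((target("avx2")))
static size_t find_newline_avx2(const char *buf, size_t len) {
    const __m256i needle = _mm256_set1_epi8('\n');
    size_t i = 0;
    for (; i + 32 <= len; i += 32) {
        __m256i chunk = _mm256_loadu_si256((const __m256i *)(buf + i));
        unsigned mask =
            (unsigned)_mm256_movemask_epi8(_mm256_cmpeq_epi8(chunk, needle));
        if (mask) return i + (size_t)__builtin_ctz(mask);
    }
    /* Fewer than 32 bytes left: finish with the scalar search. */
    return i + find_newline_scalar(buf + i, len - i);
}
#endif

/* Hypothetical entry point: picks the AVX2 path at runtime if available. */
size_t find_newline(const char *buf, size_t len) {
#ifdef SKETCH_HAVE_X86
    if (__builtin_cpu_supports("avx2")) return find_newline_avx2(buf, len);
#endif
    return find_newline_scalar(buf, len);
}
```

A line-oriented parser would call find_newline() repeatedly, advancing past each match, and then look for '=' within each line to split keys such as Name=, Icon=, and Exec= from their values.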

Lastly, I also removed the caching mechanism completely. Having a fast cache is great! Having no cache and being even faster is just better :) In my case, my rofi top 10 went from taking ~300ms to load (hence the short visible hiccup) to just ~1.3ms on my PC (that's the list and the UI for the list).

For the Rofi project: I doubt these changes are all going to be merged as-is. I'd recommend taking this as inspiration and cherry-picking parts of it for a future version. Switching to ThorVG, for example, is essentially a free speed boost.

@DaveDavenport
Collaborator

Interesting that ThorVG is so much faster at loading images. That could be interesting even with the async loading.

If I test it on my machine here (i7-13700H), it pops up at the same speed (~0.2s). But fresh loads do indeed feel slightly faster, as the icons pop in slightly quicker (compared to rofi on the redraw-fixes branch).
However, in filebrowser on a directory with larger images it is significantly slower (as expected, because it is 'sync' now).

The drun parser optimization here saves around 1ms (4->3ms); the cache (0.12ms) still outperforms it by a significant margin. Therefore I am not sure I completely understand this remark:

Lastly, i also removed the caching mechanism completely. Having a fast cache is great! Having no cache that is even faster is just better :) In my case my rofi top 10 was takes ~300ms to load (hence a short hiccup was visible) to now on my pc just ~1.3ms (that's the list and ui for the list).

Too bad it is one massive patch, with hardcoded assumptions in there. I am not sure this will work well for a lot of people.
Speed improvements are always welcome, but I am afraid (as you say) this is not directly usable.

(On a side note, your fork does not build: benchmark-drun.c is missing.)
