[WIP] [perf] Add fast desktop entry parser, ThorVG icons, and sync loading #2267
markg85 wants to merge 1 commit into davatorium:next
Performance improvements:
- AVX2 SIMD-optimized desktop entry parser (~32 bytes/cycle)
- ThorVG for icon rendering (~50μs per icon vs ms with gdk-pixbuf)
- Pre-built icon path cache for O(1) lookups
- Removed async icon loading (threading overhead exceeds work at ~50μs)

Note: Optimized for SSD/NVMe storage
Interesting that ThorVG is so much faster at loading images. That could be interesting even with the async loading. If I test it on my machine here (i7-13700H), it pops up at the same speed (~0.2s). But fresh loads do indeed feel slightly faster, as the icons pop in slightly quicker (compared to rofi on the redraw-fixes branch). The drun parser optimization here saves around 1ms (4 -> 3ms); the cache (0.12ms) still outperforms it by a significant margin. Therefore I am not sure I completely understand this remark:
Too bad it is one massive patch, with hardcoded assumptions in there. Not sure this will work well for a lot of people. (On a side note, your fork does not build; benchmark-drun.c is missing.)
Hi,
I've been using rofi for a while and I like it. However, I noticed that icons sometimes took a fraction of a second to load (showing an empty spot, then the icon popping in).
I would never have debugged this, as I don't like C and don't like GTK either, and would have just used it as is. However, these days we have AI, and I happen to be interested in performance optimizations! That makes it bearable to dive into a repository and see how it can be improved. For those wondering about the AI model: I used GLM 5 with Claude Code.
Turns out it could be improved truly massively, which is what this single-commit fork does!
And yes, this was in release mode with debug symbols (-g flag).
Moving away from gdk-pixbuf to ThorVG is a direct, massive speedup for loading images. I was honestly surprised by how large the difference is. That alone makes an individual icon load in a mean time of ~0.5ms on my PC. It does differ per PNG icon and its complexity. What you see in the benchmark is just the first icon it found, which was the one from alacritty. ThorVG supports loading a few formats, including PNG, so with that in mind I just removed gdk-pixbuf entirely.
This single optimization wasn't enough though. I wanted the top 10 images to load within 1ms (all of them, so 0.1ms per image). It turns out that the icon path lookup was quite heavy too, so I optimized that as well, essentially by building a hashmap for the lookup.
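To illustrate the hashmap lookup idea, here is a minimal sketch in C. It is not the code from this fork: the names `IconCache`, `icon_cache_add` and `icon_cache_lookup` are hypothetical, the hash is plain FNV-1a, and the sketch assumes the list of icon file paths has already been collected (for example by walking the theme directories once at startup). Paths added first win, so they would need to be fed in theme priority order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; this is not rofi's or the fork's code. */
#define ICON_CACHE_BUCKETS 1024u /* power of two, so a mask replaces modulo */

typedef struct IconEntry {
    char *name;             /* icon name without directory or extension */
    char *path;             /* full path to the icon file */
    struct IconEntry *next; /* chain for entries sharing a bucket */
} IconEntry;

typedef struct {
    IconEntry *buckets[ICON_CACHE_BUCKETS];
} IconCache;

/* FNV-1a, a simple and widely used string hash. */
static uint32_t icon_hash(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) {
        h ^= (unsigned char)*s++;
        h *= 16777619u;
    }
    return h;
}

/* Copy len bytes starting at begin into a new NUL-terminated string. */
static char *dup_range(const char *begin, size_t len) {
    char *out = malloc(len + 1);
    assert(out != NULL);
    memcpy(out, begin, len);
    out[len] = '\0';
    return out;
}

/* Register one icon file under its base name (text after the last '/',
 * up to the last '.'). If the name is already present, the earlier
 * path is kept. */
void icon_cache_add(IconCache *cache, const char *path) {
    const char *slash = strrchr(path, '/');
    const char *base = slash ? slash + 1 : path;
    const char *dot = strrchr(base, '.');
    size_t name_len = dot ? (size_t)(dot - base) : strlen(base);
    char *name = dup_range(base, name_len);
    uint32_t b = icon_hash(name) & (ICON_CACHE_BUCKETS - 1);

    for (IconEntry *e = cache->buckets[b]; e; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            free(name); /* already known: first path wins */
            return;
        }
    }
    IconEntry *e = malloc(sizeof *e);
    assert(e != NULL);
    e->name = name;
    e->path = dup_range(path, strlen(path));
    e->next = cache->buckets[b];
    cache->buckets[b] = e;
}

/* Average O(1): hash the name, then walk one short bucket chain.
 * Returns NULL when the icon name is unknown. */
const char *icon_cache_lookup(const IconCache *cache, const char *name) {
    uint32_t b = icon_hash(name) & (ICON_CACHE_BUCKETS - 1);
    for (const IconEntry *e = cache->buckets[b]; e; e = e->next) {
        if (strcmp(e->name, name) == 0)
            return e->path;
    }
    return NULL;
}
```

A zero-initialized `IconCache` is an empty cache. Each lookup then costs one hash plus a short chain walk instead of probing several directories on disk. A real implementation would also need to key on icon size and theme, which this sketch leaves out.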
With these two changes (and a lot of micro-optimizations like reducing allocations, custom desktop file parsing, AVX2, etc.) the loading was now so incredibly fast that the use of async was turning into a UI glitch. Or put differently, you could still see a frame where the icon wasn't loaded yet and then popped in on the next frame. I solved this by simply removing the async path and going for sync. I do realize that this change can potentially be bad too, as it might block the UI on icon loading. So for HDD users: probably don't use this patch. But if you use an SSD or NVMe, then you're fast enough to not notice the difference. To me at least, the UI feels snappier than it ever did (and it was already quite good!).
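For readers wondering what "custom desktop file parsing" with fewer allocations can look like, here is a minimal scalar sketch in C. It is not the AVX2 parser from this commit. It is a single pass over an in-memory buffer that relies on `memchr` (which libc implementations often vectorize internally) and returns non-owning slices instead of copying strings. The names `Slice`, `DesktopEntry` and `parse_desktop_entry` are hypothetical, and the sketch ignores details such as whitespace around `=`, `\r\n` line endings and escape sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-owning view into the file buffer: no allocation per value. */
typedef struct {
    const char *ptr;
    size_t len;
} Slice;

typedef struct {
    Slice name, exec, icon;
} DesktopEntry;

/* True when the key_len bytes at line are exactly the string key. */
static int key_is(const char *line, size_t key_len, const char *key) {
    return strlen(key) == key_len && memcmp(line, key, key_len) == 0;
}

/* Parse buf[0..len) in one pass. Only the [Desktop Entry] group is read;
 * other groups (for example desktop actions) are skipped. Localized keys
 * such as Name[de] do not match because their key text differs.
 * Returns 1 if a [Desktop Entry] group header was seen, else 0. */
int parse_desktop_entry(const char *buf, size_t len, DesktopEntry *out) {
    assert(buf != NULL && out != NULL);
    const char *p = buf;
    const char *end = buf + len;
    int in_group = 0;
    int seen = 0;

    memset(out, 0, sizeof *out);
    while (p < end) {
        /* Find the end of the current line. */
        const char *nl = memchr(p, '\n', (size_t)(end - p));
        const char *eol = nl ? nl : end;
        size_t n = (size_t)(eol - p);

        if (n > 0 && p[0] == '[') {
            /* Group header: track whether we are inside [Desktop Entry]. */
            in_group = (n == 15 && memcmp(p, "[Desktop Entry]", 15) == 0);
            seen |= in_group;
        } else if (in_group && n > 0 && p[0] != '#') {
            const char *eq = memchr(p, '=', n);
            if (eq) {
                size_t klen = (size_t)(eq - p);
                Slice val = { eq + 1, (size_t)(eol - eq - 1) };
                /* First occurrence of each key wins. */
                if (key_is(p, klen, "Name") && !out->name.ptr)
                    out->name = val;
                else if (key_is(p, klen, "Exec") && !out->exec.ptr)
                    out->exec = val;
                else if (key_is(p, klen, "Icon") && !out->icon.ptr)
                    out->icon = val;
            }
        }
        p = nl ? nl + 1 : end;
    }
    return seen;
}
```

The slices point into the caller's buffer, so that buffer has to stay alive for as long as the parsed entry is used. Whether a hand-written SIMD scan beats this on real desktop files would need to be measured.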
Lastly, I also removed the caching mechanism completely. Having a fast cache is great! Having no cache and being even faster is just better :) In my case, my rofi top 10 took ~300ms to load (hence a short hiccup was visible); it now takes just ~1.3ms on my PC (that's the list and the UI for the list).
For the Rofi project: I doubt these changes are all going to be merged as-is. I'd recommend taking this as inspiration and cherry-picking parts of it for a future version. For example, changing to ThorVG is essentially a free speed boost.