
fuse classify and bitset building in decoder (-9.62%) #1

Merged
friendlymatthew merged 1 commit into main from fuse-classify on Mar 26, 2026



@friendlymatthew
Owner

classify took ~20% of decode() time, and most of that was spent in Vec::push, writing classified vectors to the heap one at a time. A bit surprising, since classification is just a set of cheap NEON table lookups, but I guess the per-push overhead added up.
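For comparison, here is a minimal portable sketch of the two-pass shape described above: classify each 16-byte subchunk, push the classified vector onto a Vec, then re-read those vectors to build bitsets. The PR doesn't say which three bitsets the decoder builds, so the quote / backslash / whitespace classes and the names classify_byte, classify, and build_bitsets are hypothetical, and the scalar match stands in for the NEON table lookups.

```rust
// Hypothetical class flags: the PR does not say which three classes the decoder tracks.
const QUOTE: u8 = 1 << 0;
const BACKSLASH: u8 = 1 << 1;
const WHITESPACE: u8 = 1 << 2;

/// Scalar stand-in for the NEON table lookup: map one byte to its class flags.
fn classify_byte(b: u8) -> u8 {
    match b {
        b'"' => QUOTE,
        b'\\' => BACKSLASH,
        b' ' | b'\t' | b'\n' | b'\r' => WHITESPACE,
        _ => 0,
    }
}

/// Pass 1: classify every 16-byte subchunk and push the classified vector
/// onto the heap, one push per subchunk. Trailing bytes that do not fill a
/// whole subchunk are ignored in this sketch.
fn classify(input: &[u8]) -> Vec<[u8; 16]> {
    let mut out = Vec::new();
    for sub in input.chunks_exact(16) {
        let mut classes = [0u8; 16];
        for (c, &b) in classes.iter_mut().zip(sub) {
            *c = classify_byte(b);
        }
        out.push(classes);
    }
    out
}

/// Pass 2: re-read the stored class vectors, four at a time (64 bytes),
/// and build one u64 word per class. Bit k of a word is byte k of the chunk.
fn build_bitsets(classes: &[[u8; 16]]) -> Vec<[u64; 3]> {
    classes
        .chunks_exact(4)
        .map(|group| {
            let mut sets = [0u64; 3];
            for (i, sub) in group.iter().enumerate() {
                for (j, &c) in sub.iter().enumerate() {
                    let bit = 1u64 << (i * 16 + j);
                    if (c & QUOTE) != 0 {
                        sets[0] |= bit;
                    }
                    if (c & BACKSLASH) != 0 {
                        sets[1] |= bit;
                    }
                    if (c & WHITESPACE) != 0 {
                        sets[2] |= bit;
                    }
                }
            }
            sets
        })
        .collect()
}
```

Every classified vector makes a round trip through the heap before the bitset pass reads it back, which is the cost the fused version avoids.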

This PR processes the input in 64-byte chunks directly: it classifies the four 16-byte subchunks of each chunk and immediately builds all 3 bitsets from the results, so the classified vectors are never pushed to a Vec.
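A minimal portable sketch of the fused approach, assuming three hypothetical classes (quote, backslash, ASCII whitespace) since the PR doesn't name the bitsets. The names classify_byte, classify_chunk, and decode_bitsets are illustrative, and the scalar match stands in for the NEON lookups; on aarch64 an intrinsic such as vqtbl1q_u8 from core::arch::aarch64 could classify all 16 bytes of a subchunk at once.

```rust
// Hypothetical class flags: the PR does not say which three classes the decoder tracks.
const QUOTE: u8 = 1 << 0;
const BACKSLASH: u8 = 1 << 1;
const WHITESPACE: u8 = 1 << 2;

/// Scalar stand-in for the NEON table lookup: map one byte to its class flags.
fn classify_byte(b: u8) -> u8 {
    match b {
        b'"' => QUOTE,
        b'\\' => BACKSLASH,
        b' ' | b'\t' | b'\n' | b'\r' => WHITESPACE,
        _ => 0,
    }
}

/// Fused pass over one 64-byte chunk: classify each of the four 16-byte
/// subchunks and OR the results straight into three u64 bitsets, with no
/// intermediate Vec of classified vectors.
fn classify_chunk(chunk: &[u8]) -> [u64; 3] {
    debug_assert_eq!(chunk.len(), 64);
    let mut sets = [0u64; 3];
    for (i, sub) in chunk.chunks_exact(16).enumerate() {
        // In a NEON version, this loop body would be one vector lookup
        // producing 16 class bytes, followed by a bitmask extraction.
        for (j, &b) in sub.iter().enumerate() {
            let c = classify_byte(b);
            let bit = 1u64 << (i * 16 + j);
            if (c & QUOTE) != 0 {
                sets[0] |= bit;
            }
            if (c & BACKSLASH) != 0 {
                sets[1] |= bit;
            }
            if (c & WHITESPACE) != 0 {
                sets[2] |= bit;
            }
        }
    }
    sets
}

/// Produce one [quote, backslash, whitespace] word triple per 64-byte chunk.
/// A trailing partial chunk is ignored in this sketch.
fn decode_bitsets(input: &[u8]) -> Vec<[u64; 3]> {
    input.chunks_exact(64).map(classify_chunk).collect()
}
```

Bit k of word n corresponds to byte 64n + k of the input, so the bitsets come out in the same layout a separate two-pass build would produce; only the intermediate class storage is gone.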

[Screenshots taken 2026-03-25 at 11:20:06 PM and 11:19:57 PM]

friendlymatthew merged commit 5c55e67 into main on Mar 26, 2026
1 of 3 checks passed
1 participant