btf: lazy decoding of string table #1772

Merged
merged 2 commits into from
May 2, 2025

@lmb (Collaborator) commented May 1, 2025

Most of the time spent parsing vmlinux BTF goes into constructing the essentialName -> TypeID index, for two reasons:

  1. We need to allocate a lot of strings.
  2. We need to do a lot of hash table lookups.

Replace the hash table with a "fuzzy" index, which doesn't require allocating strings. Lookups become more expensive in exchange, but that is an acceptable trade-off.

core: 1
goos: linux
goarch: amd64
pkg: github.com/cilium/ebpf/btf
cpu: 13th Gen Intel(R) Core(TM) i7-1365U
                │  base.txt   │          lazy-strings.txt           │
                │   sec/op    │    sec/op     vs base               │
ParseVmlinux      27.24m ± 1%   15.49m ±  1%  -43.15% (p=0.002 n=6)
IterateVmlinux    149.9m ± 2%   132.3m ± 15%  -11.73% (p=0.041 n=6)
InspectorGadget   35.08m ± 2%   21.28m ±  5%  -39.34% (p=0.002 n=6)
geomean           52.32m        35.20m        -32.73%

                │   base.txt    │          lazy-strings.txt           │
                │     B/op      │     B/op      vs base               │
ParseVmlinux       9.969Mi ± 0%   4.960Mi ± 0%  -50.25% (p=0.002 n=6)
IterateVmlinux     34.72Mi ± 0%   31.92Mi ± 0%   -8.07% (p=0.002 n=6)
InspectorGadget   11.994Mi ± 0%   7.169Mi ± 0%  -40.23% (p=0.002 n=6)
geomean            16.07Mi        10.43Mi       -35.10%

                │   base.txt    │          lazy-strings.txt          │
                │   allocs/op   │  allocs/op   vs base               │
ParseVmlinux      146058.0 ± 0%    162.0 ± 0%  -99.89% (p=0.002 n=6)
IterateVmlinux      272.9k ± 0%   291.5k ± 0%   +6.83% (p=0.002 n=6)
InspectorGadget    155.03k ± 0%   24.30k ± 0%  -84.32% (p=0.002 n=6)
geomean             183.5k        10.47k       -94.29%

Signed-off-by: Lorenz Bauer <[email protected]>
@lmb lmb force-pushed the btf-lazy-strings branch from 601e57c to 7bf307e Compare May 1, 2025 14:41
The most common use case of a Spec is to look up a type by its name.
For this purpose we maintain a map[essentialName][]TypeID. This
requires allocating a string for each named type, which causes a
very large overhead when parsing BTF.
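
To illustrate where that overhead comes from, here is a minimal sketch of an eagerly built name index. It is not the library's code: `buildEagerIndex` is a hypothetical helper, `TypeID` is a simplified stand-in, and plain `string` keys replace essentialName. The raw names are assumed to arrive as byte slices, and a slice position doubles as the type ID.

```go
package main

import "fmt"

// TypeID is a simplified stand-in for the library's type identifier.
type TypeID uint32

// buildEagerIndex is a hypothetical helper that indexes every named type
// up front. Inserting a new key requires converting the raw bytes into a
// string, which copies them, so the cost grows with the number of distinct
// names even if none of them is ever looked up.
func buildEagerIndex(names [][]byte) map[string][]TypeID {
	index := make(map[string][]TypeID, len(names))
	for i, name := range names {
		if len(name) == 0 {
			continue // anonymous types are not indexed
		}
		key := string(name) // copies the bytes into a new string
		index[key] = append(index[key], TypeID(i))
	}
	return index
}

func main() {
	names := [][]byte{
		[]byte("sk_buff"),
		nil, // anonymous type
		[]byte("task_struct"),
		[]byte("sk_buff"), // names do not have to be unique
	}
	index := buildEagerIndex(names)
	fmt.Println(index["sk_buff"], index["task_struct"])
}
```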

In reality, only a very small number of the named types will ever be
looked up. The intuition here is that a couple of structs in the
kernel contain most of the interesting information, for example
struct sk_buff.

Move as much of the cost of looking up a type by name to the actual
lookup. Instead of spending a lot of time constructing an index up
front, we only maintain an index going from the hash of a name to
a type ID.

1. We can compute the hash on a byte slice and therefore avoid
   allocating a string.
2. Storing the index as a (hash, id) tuple allows us to store it
   in a slice. Lookups are just a binary search into the index.
3. Hash collisions do not introduce additional complexity because
   types can already share the same name. At the same time the
   common case of a 1:1 mapping from name to type is fast.
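
The three points above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this change: `hashIndex`, `entry`, `newHashIndex` and `lookup` are hypothetical names, `hash/maphash` (with `maphash.Bytes`, available since Go 1.19) is just one possible hash function, and a plain `[][]byte` stands in for the string table, with the slice position used as the type ID.

```go
package main

import (
	"fmt"
	"hash/maphash"
	"sort"
)

// TypeID is a simplified stand-in for the library's type identifier.
type TypeID uint32

// entry is one (hash, id) tuple of the index.
type entry struct {
	hash uint64
	id   TypeID
}

// hashIndex is a hypothetical name-to-ID index that stores hashes only.
type hashIndex struct {
	seed    maphash.Seed
	names   [][]byte // stand-in for the string table; position == TypeID
	entries []entry  // sorted by hash, then by id
}

// newHashIndex hashes each name directly from its bytes, so no string is
// created per named type.
func newHashIndex(names [][]byte) *hashIndex {
	idx := &hashIndex{
		seed:    maphash.MakeSeed(),
		names:   names,
		entries: make([]entry, 0, len(names)),
	}
	for i, name := range names {
		if len(name) == 0 {
			continue // anonymous types are not indexed
		}
		idx.entries = append(idx.entries, entry{maphash.Bytes(idx.seed, name), TypeID(i)})
	}
	sort.Slice(idx.entries, func(a, b int) bool {
		ea, eb := idx.entries[a], idx.entries[b]
		if ea.hash != eb.hash {
			return ea.hash < eb.hash
		}
		return ea.id < eb.id
	})
	return idx
}

// lookup binary-searches for the first entry with a matching hash and then
// walks all entries sharing that hash. Equal hashes may be duplicate names
// or genuine collisions, so each candidate's real name is compared.
func (idx *hashIndex) lookup(name string) []TypeID {
	h := maphash.String(idx.seed, name)
	i := sort.Search(len(idx.entries), func(i int) bool {
		return idx.entries[i].hash >= h
	})
	var ids []TypeID
	for ; i < len(idx.entries) && idx.entries[i].hash == h; i++ {
		id := idx.entries[i].id
		if string(idx.names[id]) == name {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	names := [][]byte{
		[]byte("sk_buff"),
		nil, // anonymous type
		[]byte("task_struct"),
		[]byte("sk_buff"), // shares a name with ID 0
	}
	idx := newHashIndex(names)
	fmt.Println(idx.lookup("sk_buff"), idx.lookup("task_struct"), idx.lookup("missing"))
}
```

In this sketch the loop that handles several types sharing one name is the same loop that filters out hash collisions, which is why collisions need no separate code path.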

Signed-off-by: Lorenz Bauer <[email protected]>
@lmb lmb force-pushed the btf-lazy-strings branch from 7bf307e to d140dfc Compare May 1, 2025 14:58
@lmb lmb marked this pull request as ready for review May 1, 2025 15:01
@lmb lmb requested a review from dylandreimerink as a code owner May 1, 2025 15:01
Member

@dylandreimerink dylandreimerink left a comment

🚀 cool approach with the fuzzyStringIndex. Can't find anything wrong with this.

@lmb lmb merged commit 1e9e58e into cilium:main May 2, 2025
17 checks passed
@lmb lmb deleted the btf-lazy-strings branch May 2, 2025 10:21
2 participants