Skip to content

btf: NewHandle should gracefully handle unsupported types #895

Open
@lmb

Description

@lmb

BTF has been extended over time to support new metadata, but older kernels refuse to load such extended BTF. This is a problem when loading a program that has been compiled by a recent version of clang on an old kernel, so libbpf modifies the loaded BTF on the fly so that it is accepted by the kernel. We've copied this behaviour for one specific case: 5.6 allows specifying the linkage of a function (static, global, etc.). On kernels < 5.6 we replace all function linkage with static when loading BTF into the kernel via NewHandle. In the future I want to extend NewHandle to cover more kernel incompatibilities, like libbpf does (e.g. #713).

The rest of this issue explores the implications this has on my proposed API to create BTF blobs from Go:

spec := NewSpec()
id, err := spec.Add(fooType)
handle, err := NewHandle(spec)

Why skipping types is hard

From the commit that introduces TYPE_TAG to pahole:

[$ ~] cat t.c
#define __tag1 __attribute__((btf_type_tag("tag1")))
#define __tag2 __attribute__((btf_type_tag("tag2")))
int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));
[$ ~] clang -O2 -g -c t.c
[$ ~] pahole -JV t.o
Found per-CPU symbol 'g' at address 0x0
Found 1 per-CPU variables!
File t.o:
[1] TYPE_TAG tag1 type_id=5
[2] TYPE_TAG tag2 type_id=1
[3] PTR (anon) type_id=2
[4] TYPE_TAG tag1 type_id=6
[5] PTR (anon) type_id=4
[6] INT int size=4 nr_bits=32 encoding=SIGNED
search cu 't.c' for percpu global variables.
Variable 'g' from CU 't.c' at address 0x0 encoded
[7] VAR g type=3 linkage=1
[8] DATASEC .data..percpu size=8 vlen=1
		type=7 offset=0 size=8

You can see for the source:

  int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));

the following type chain is generated:

  var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int

Stripping out TYPE_TAG means we need to reallocate IDs for PTR, INT, etc. and we also need to adjust the chain. I can't see a way how to do this with the proposed API since callers of Spec.Add will have put the IDs in syscall args, etc. already.

The libbpf way: replace types instead of stripping them

libbpf replaces types instead of removing them outright. This avoids having to reallocate IDs. Here are the current rules:

  • VAR / DECL_TAG replaced with INT.
  • TYPE_TAG becomes CONST
  • DATASEC replaced with STRUCT
  • FUNC_PROTO becomes ENUM
  • FUNC becomes TYPEDEF
  • ENUM64 replaced with UNION

This is very simple, and has the benefit of being "battle tested". If we stick to the same rules as libbpf we won't break additional things. Replacing DATASEC with STRUCT seems like a no brainer. I'm not a huge fan of the TYPE_TAG, DECL_TAG, FUNC_PROTO replacements. It'd be better if we didn't have to include those types in the first place.

Idea: delay allocating type IDs

There is a way to skip types during marshaling without reallocating IDs by simply delaying ID allocation until the last possible moment, during marshaling. It's easier to explain via some pseudo code:

s := NewSpec()
id, err := s.Add(foo) // Allocates an ID for foo only!
handle, err := NewHandle(s) // Allocates IDs for children of foo

The call to Add allocates an ID for foo, which the caller can use in syscalls, etc. NewHandle marshals foo and all of it's children. If a child is not supported by the kernel we can "simply" skip encoding it, since we know that the user can't refer to the child via ID. (Skipping TYPE_TAG will still be annoying since it's part of a chain, but oh well.)

What happens if we passed a type to Spec.Add that the kernel doesn't understand? In this case NewHandle will return an error instead of silently skipping the type. This is because Spec.Add makes a guarantee to the user: if NewHandle succeeds the returned type ID is valid.

A side effect of this proposal is that Spec.TypeByName, etc. won't find children of foo. I think that is an improvement over my current implementation. If there is a use case we can add AddAll or similar later.

Downsides:

  • There are now two places that allocate IDs: Add and NewHandle
  • Marshaling has to deal with skipped types. The worst case is probably TYPE_TAG.
  • Overall, more complex code for a cleaner API

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions