btf: NewHandle should gracefully handle unsupported types

BTF has been extended over time to support new metadata, but older kernels refuse to load such extended BTF. This is a problem when loading a program that has been compiled by a recent version of clang on an old kernel, so `libbpf` modifies the loaded BTF on the fly so that it is accepted by the kernel. We've copied this behaviour for one specific case: 5.6 allows specifying the linkage of a function (static, global, etc.). On kernels < 5.6 we replace all function linkage with `static` when loading BTF into the kernel via `NewHandle`. In the future I want to extend `NewHandle` to cover more kernel incompatibilities, like libbpf does (e.g. #713).

The rest of this issue explores the implications this has on my proposed API to create BTF blobs from Go:

```go
spec := NewSpec()
id, err := spec.Add(fooType)
handle, err := NewHandle(spec)
```

# Why skipping types is hard

From the commit that introduces TYPE_TAG to `pahole`:

```
[$ ~] cat t.c
#define __tag1 __attribute__((btf_type_tag("tag1")))
#define __tag2 __attribute__((btf_type_tag("tag2")))
int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));
[$ ~] clang -O2 -g -c t.c
[$ ~] pahole -JV t.o
Found per-CPU symbol 'g' at address 0x0
Found 1 per-CPU variables!
File t.o:
[1] TYPE_TAG tag1 type_id=5
[2] TYPE_TAG tag2 type_id=1
[3] PTR (anon) type_id=2
[4] TYPE_TAG tag1 type_id=6
[5] PTR (anon) type_id=4
[6] INT int size=4 nr_bits=32 encoding=SIGNED
search cu 't.c' for percpu global variables.
Variable 'g' from CU 't.c' at address 0x0 encoded
[7] VAR g type=3 linkage=1
[8] DATASEC .data..percpu size=8 vlen=1
		type=7 offset=0 size=8

You can see for the source:

  int __tag1 * __tag1 __tag2 *g __attribute__((section(".data..percpu")));

the following type chain is generated:

  var -> ptr -> tag2 -> tag1 -> ptr -> tag1 -> int
```

Stripping out TYPE_TAG means we need to reallocate IDs for PTR, INT, etc. and we also need to adjust the chain. I can't see a way how to do this with the proposed API since callers of `Spec.Add` will have put the IDs in syscall args, etc. already.

## The libbpf way: replace types instead of stripping them

`libbpf` replaces types instead of removing them outright. This avoids having to reallocate IDs. Here are the current rules:

* VAR / DECL_TAG replaced with INT.
* TYPE_TAG becomes CONST
* DATASEC replaced with STRUCT
* FUNC_PROTO becomes ENUM
* FUNC becomes TYPEDEF
* ENUM64 replaced with UNION

This is very simple, and has the benefit of being "battle tested". If we stick to the same rules as `libbpf` we won't break additional things. Replacing DATASEC with STRUCT seems like a no brainer. I'm not a huge fan of the TYPE_TAG, DECL_TAG, FUNC_PROTO replacements. It'd be better if we didn't have to include those types in the first place.

## Idea: delay allocating type IDs

There is a way to skip types during marshaling without reallocating IDs by simply delaying ID allocation until the last possible moment, during marshaling. It's easier to explain via some pseudo code:

```go
s := NewSpec()
id, err := s.Add(foo) // Allocates an ID for foo only!
handle, err := NewHandle(s) // Allocates IDs for children of foo
```

The call to `Add` allocates an ID for `foo`, which the caller can use in syscalls, etc. `NewHandle` marshals `foo` and all of it's children. If a child is not supported by the kernel we can "simply" skip encoding it, since we know that the user can't refer to the child via ID. (Skipping TYPE_TAG will still be annoying since it's part of a chain, but oh well.)

What happens if we passed a type to `Spec.Add` that the kernel doesn't understand? In this case `NewHandle` will return an error instead of silently skipping the type. This is because `Spec.Add` makes a guarantee to the user: if `NewHandle` succeeds the returned type ID is valid.

A side effect of this proposal is that `Spec.TypeByName`, etc. won't find children of `foo`. I think that is an improvement over my current implementation. If there is a use case we can add `AddAll` or similar later.

Downsides:
* There are now two places that allocate IDs: `Add` and `NewHandle`
* Marshaling has to deal with skipped types. The worst case is probably TYPE_TAG.
* Overall, more complex code for a cleaner API


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

btf: NewHandle should gracefully handle unsupported types #895

Why skipping types is hard

The libbpf way: replace types instead of stripping them

Idea: delay allocating type IDs

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

btf: NewHandle should gracefully handle unsupported types #895

Description

Why skipping types is hard

The libbpf way: replace types instead of stripping them

Idea: delay allocating type IDs

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions