Description
As a reminder: when compiling the interpreter using computed gotos, we emit a separate copy of the "dispatch" jump (currently `goto *opcode_targets[opcode]`) for each opcode implementation. The goal here -- as opposed to dispatching once at the top of the loop and jumping back to the top of the loop after each instruction -- is to expose more information to the hardware branch predictor (specifically, I believe, the branch target predictor, which tries to guess the destination of indirect branches).
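For illustration, here is a minimal, self-contained sketch of that pattern (the opcode names, `targets` table, and `run` function are made up for this example, not CPython's actual dispatch macros); each handler ends with its own copy of the indirect jump:

```c
/* Minimal sketch of the computed-goto pattern (illustrative only, not
 * CPython's actual code).  Uses the GCC/Clang "labels as values"
 * extension, as CPython does when computed gotos are enabled. */
#include <stdio.h>

static int run(const unsigned char *code) {
    /* One label per opcode; &&label takes the label's address. */
    static void *targets[] = { &&op_incr, &&op_print, &&op_halt };
    int acc = 0;
    const unsigned char *ip = code;

/* Each expansion of DISPATCH() is a separate indirect branch, so the
 * branch target predictor gets per-opcode history instead of one shared
 * (and therefore noisier) dispatch point at the top of a loop. */
#define DISPATCH() goto *targets[*ip++]

    DISPATCH();

op_incr:
    acc++;
    DISPATCH();
op_print:
    printf("%d\n", acc);
    DISPATCH();
op_halt:
    return acc;
}

int main(void) {
    const unsigned char program[] = { 0, 0, 1, 2 };  /* incr, incr, print, halt */
    return run(program) == 2 ? 0 : 1;
}
```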
However, the C compiler doesn't know that! It does, however, see that we've given it a C function containing a few hundred instances of identical `DISPATCH` code … and may choose to merge them together, replacing one or more instances with a jump into the tail end of a different opcode, thus undoing our careful work!
I suspect we can find a way to prevent the compiler from merging these branches, and thus restore branch-target-prediction behavior similar to that of the tail-call interpreter.
The branch above attempts to prevent this merging by adding an empty `__asm__ volatile` "optimization barrier" to each call site, in the hopes that the compiler treats this as opaque and refuses to merge them. I'm far from certain this is the best approach, but in my testing it seems to be a speedup -- I see 1.03x on pyperformance on my machine (`goto-mine` is the linked branch with the optimization barrier).
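As a sketch of what such a barrier might look like (illustrative only: it assumes a `DISPATCH()`-style macro and the existing `opcode_targets` table, and the linked branch may differ in detail):

```c
/* Illustrative sketch: an empty volatile asm statement at each dispatch
 * site.  Volatile asm has unspecified side effects from the compiler's
 * point of view, so passes such as cross-jumping / tail merging are
 * expected to leave each copy in place rather than collapsing them into
 * one shared indirect jump. */
#define DISPATCH()                            \
    do {                                      \
        __asm__ volatile("");  /* barrier */  \
        goto *opcode_targets[opcode];         \
    } while (0)
```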
We can also observe the merging a bit more directly by counting the number of `DISPATCH` sites that remain after optimization, according to debug symbols:
On `main`:

```
$ objdump -S --disassemble=_PyEval_EvalFrameDefault Python/ceval.o | grep -cF 'DISPATCH()'
47
```
On `nelhage/computed-goto-nomerge`:

```
$ objdump -S --disassemble=_PyEval_EvalFrameDefault Python/ceval.o | grep -cF 'DISPATCH()'
306
```
For comparison, `generated_cases.c.h` contains 227 instances on the same commit.