Skip to content

computed-goto interpreter: Prevent the compiler from merging DISPATCH calls #129987

Closed
@nelhage

Description

@nelhage

As a reminder: when compiling the interpreter using computed gotos, we emit a separate copy of the "dispatch" jump (currently goto *opcode_targets[opcode]) for each opcode implementation. The goal here -- as opposed to dispatching once at the top of the loop, and jumping to the top of the loop after each instruction -- is to expose more information to the hardware branch predictor (specifically, I believe, the branch target predictor, which tries to guess the destination of indirect branches).

However, the C compiler doesn't know that! It does, however, see that we've given it a C function containing a few hundred instances of identical DISPATCH code … and may choose to merge them together, replacing one or more instances with a jump into the tail end of a different opcode, thus undoing our careful work!

I suspect we can find a way to prevent the compiler from merging these branches, and thus restore similar branch-target-prediction behavior as in the tail-call interpreter.

The branch above attempts to prevent this merging by adding an empty __asm__ volatile "optimization barrier" to each call site, in the hopes that the compiler treats this as opaque and refuses to merge them. I'm far from certain this is the best approach, but in my testing it seems to be a speedup -- I see 1.03x on pyperformance on my machine (goto-mine is the linked branch with the optimization barrier).

We can also observe the merging a bit more directly by counting the number of DISPATCH sites that remain after optimization, according to debug symbols:

main

$ objdump -S --disassemble=_PyEval_EvalFrameDefault Python/ceval.o | grep -cF 'DISPATCH()'
47

nelhage/computed-goto-nomerge

$ objdump -S --disassemble=_PyEval_EvalFrameDefault Python/ceval.o | grep -cF 'DISPATCH()'
306

For comparison, generated_cases.c.h contains 227 instances on the same commit.

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)performancePerformance or resource usagetype-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions