A typical interpreter consists of a loop that looks something like this:
while (1) {
    switch (*pc) {
        case ADD:
            /* add implementation */
            pc++;
            break;
        case SUB:
            /* sub implementation */
            pc++;
            break;
        /*
         * handlers for other bytecodes
         */
    }
}
That is, you have a loop which steps through bytecodes, examining each one. The switch statement then dispatches to the appropriate handler for that bytecode.
If you try to find a code structure like this in HHVM, you won’t find it. Instead, HHVM uses a somewhat complicated combination of macros and templates. This cuts down on duplicated code (so the same fix doesn’t have to be applied in dozens of places). It also makes the code harder for newcomers to read and harder to search (some symbols don’t directly exist in the source - they’re built by macros).
The interpreter is located in hhvm/hphp/runtime/vm/bytecode.cpp. In that file, you’ll find dispatchImpl(). It’s declared as:
template <bool breakOnCtlFlow> TCA dispatchImpl()
This is the function that implements the interpreter. Two different versions are instantiated, one with breakOnCtlFlow set to true and one with it set to false. Since these are compile-time constants, the compiler is smart enough to elide the “if” statements that check breakOnCtlFlow: the body of each such “if” is either removed entirely or compiled in unconditionally. So while at first glance it appears that we’re paying the cost of runtime checks, that’s not actually the case.
When breakOnCtlFlow is true, the function will return to the caller when a control flow (i.e. branch) instruction is encountered. If breakOnCtlFlow is false, the interpreter will continue executing instructions until a function-exiting instruction is reached. As of this writing, the function-exiting instructions are RetC, RetV, NativeImpl, Await, CreateCont, Yield, and YieldK. This list could change in the future.
There are two versions of the interpreter loop. The Windows version (indicated with _MSC_VER) implements an ordinary switch-based interpreter loop, while the gcc version implements a threaded interpreter. In a threaded interpreter, the handler for each bytecode jumps directly to the handler for the next bytecode rather than going to a single central switch statement. This eliminates a jump to a different cache line and improves branch prediction by allowing the processor’s branch predictor to find associations between the bytecodes. These different mechanisms are hidden by the DISPATCH_ACTUAL macro.
There are three separate parts to each bytecode handler: one for PHP debugging, one for code coverage, and a third that implements the actual handler. These are contained in the OPCODE_DEBUG_BODY, OPCODE_COVER_BODY, and OPCODE_MAIN_BODY macros, respectively. In the Windows case, these are simply called sequentially. For gcc code, it’s a little more complicated. Threaded interpreting uses a table of labels to dispatch to the next handler, and HHVM makes use of three such tables. The first, optabDirect, contains labels that jump directly to OPCODE_MAIN_BODY. The second, optabCover, contains labels that jump to OPCODE_COVER_BODY, which then falls through to OPCODE_MAIN_BODY. Lastly there is optabDbg, whose labels jump to OPCODE_DEBUG_BODY, which falls through to OPCODE_COVER_BODY, which in turn falls through to OPCODE_MAIN_BODY.
Now, where are all these dozens of bytecode handlers? Some more compiler magic is employed here. Since these are all going to look very similar, their code is generated by macros. In several places a macro named simply O() is defined and then there’s a line that just says OPCODES. What’s happening here isn’t immediately obvious. The OPCODES macro is defined in hphp/runtime/vm/hhbc.h. This is the single, central place which contains everything you could ever want to know about each bytecode. The OPCODES macro invokes the O macro for each bytecode. So any time you want to generate code for every bytecode, you define an O macro and invoke the OPCODES macro. Any pieces of information from the table in the OPCODES macro that you don’t need can just be ignored by your macro.
//    name              immediates   inputs     outputs   flags
#define OPCODES \
  O(Nop,             NA,           NOV,       NOV,      NF) \
  O(EntryNop,        NA,           NOV,       NOV,      NF) \
  O(BreakTraceHint,  NA,           NOV,       NOV,      NF) \
  O(DiscardClsRef,   ONE(CAR),     NOV,       NOV,      NF) \
  O(PopC,            NA,           ONE(CV),   NOV,      NF) \
  O(PopV,            NA,           ONE(VV),   NOV,      NF) \
  ...
There are multiple uses of the OPCODES macro in bytecode.cpp. The optabDirect, optabDbg, and optabCover tables are generated using it. The heart of the interpreter is constructed by defining an O macro that uses OPCODE_DEBUG_BODY, OPCODE_COVER_BODY, and OPCODE_MAIN_BODY. OPCODE_MAIN_BODY itself uses a macro called DISPATCH, which uses the DISPATCH_ACTUAL macro mentioned earlier. So you have about five levels of macros involved in creating the interpreter body.
static const void *optabDirect[] = {
#define O(name, imm, push, pop, flags) \
  &&Label##name,
  OPCODES
#undef O
};
The OPCODES macro is also used to create a set of functions named interpOneNop(), interpOnePopA(), interpOnePopC(), etc. These functions can be used to invoke each of the individual bytecode handlers, with logging and statistics code also included. interpOneEntryPoints[] contains a table of these functions, allowing them to be called from JIT code.
The last piece to understanding the interpreter is iopRetWrapper(). This is a pair of overloaded functions called by the OPCODE_MAIN_BODY macro. They unify the two different kinds of return values from bytecode handlers: a handler either returns void or returns a TCA. Overload resolution selects the appropriate iopRetWrapper, and since both overloads return a TCA, the OPCODE_MAIN_BODY macro never has to see the difference. The iopRetWrapper() functions in turn call the bytecode handlers, which are named iopXXXX (where XXXX is the name of the bytecode). The iopRetWrapper() functions and all the bytecode handlers are declared OPTBLD_INLINE. This is a macro that is empty for a debug build or ALWAYS_INLINE for a release build. Thus all the code gets inlined into the interpreter loop for a release build.