-
Notifications
You must be signed in to change notification settings - Fork 90
AwkwardForth documentation
The AwkwardForth language will eventually be documented in a proper place, but since @ianna will need a quick reference while developing TypedArrayBuilder, I'll write something useful here.
AwkwardForth is a subset of standard Forth with some additional built-in words. It is a domain specific language for creating columnar Awkward Arrays from record-oriented data sources, especially for cases in which the deserialization procedure for the record-oriented data is not known until runtime. Typically, this is because the data has a type or schema that is discovered at runtime and that type determines how bytes of input are interpreted and in what order. This does not apply to columnar data sources, such as Apache Arrow, Parquet, or some ROOT data, such as numerical types (like int
or float
) and jagged arrays of numbers (like std::vector<int>
). It does apply to record-oriented sources like ProtoBuf, Avro, and complex types in ROOT TTrees, such as std::vector<std::vector<int>>
or unsplit classes. Note that ROOT's new RNTuple is entirely columnar.
The Easy Forth one-page tutorial is an excellent introduction to the idea of Forth. In a nutshell, whereas functional programming strives for pure functions with no side effects, Forth operations consist purely of side effects: every operation changes the state of the machine, whether the global stack of integers, global variables, or in the case of AwkwardForth, positions in input buffers and data written to output buffers. It has almost no syntax, less even than Lisp, in that it consists entirely of whitespace-separated words interpreted in reverse Polish order. (Looping and branching constructs do have a recursive grammar, but they are exceptions.)
AwkwardForth is interpreted as a bytecode-compiled virtual machine. The source code is compiled into sequences of integer codes, one sequence per user-defined word or nested control flow (e.g. body of loops and conditional branches). This "compilation" is literal, like Python or Java bytecode—no optimization is attempted. It is interpreted by a virtual machine that (on my laptop) runs at about 5 ns per instruction. Instructions are 1‒3 bytecodes long, and each bytecode is a 32-bit integer (templated as I
in C++, but only instantiated for int32_t
). For comparison, the Python virtual machine (on the same laptop) runs at about 900 ns per instruction (see this comment), so AwkwardForth is an "interpreter" in the same sense as CPython, but almost 200× faster, due to its specialization. Strictly mathematical calculations can be much faster in compiled, optimized C++, but strictly I/O operations (from RAM to RAM) is about the same, with C++ being only 1.8× faster in the limit of one 32-bit copy per instruction. If dozens or more bytes are copied per instruction, the gap between AwkwardForth and C++ becomes insignificant. Since AwkwardForth is intended for mostly I/O purposes, this is acceptable.
Forth's emphasis on state-changing operations would make it a terrible choice for vectorized accelerators like GPUs, but an FPGA implementation could be great: FPGAs have a much longer "compilation" time than even C++, so it would be advantageous for an FPGA to be configurable by Forth programs in the same sense as AwkwardForth. Such a thing could, for instance, read ROOT files directly from GHz Ethernet into machine learning models implemented with hls4ml.
This part of the documentation is the most in flux, since we'll likely add features to the ForthMachine to make debugging easier.
In C++, there are three classes:
-
ForthMachineOf<T, I>, where
T
is the stack type (int32_t
orint64_t
) andI
is the instruction type (onlyint32_t
has been instantiated). -
ForthInputBuffer is an untyped input buffer, which wraps a
std::shared_ptr<void>
. (Note that one operation, copying multiple numbers from the input buffer to the stack (not directly to output buffers), will temporarily mutate data in the buffer if they need to be byte-swapped. This is a temporary mutation, so the buffer can be used by other functions afterward, but not at the same time as the ForthMachine. This thread-unsafety could be changed in the future.) -
ForthOutputBufferOf is a typed output buffer, specialized by
OUT
. (The fact that the write methods are virtual is not a performance bottleneck: putting the output type information into Forth bytecodes and using aswitch
statement to go to specialized method calls has identical performance for small copies and is up to 2× worse for large copies. C++ vtables are hard to beat.)
In Python, only the two instantiations of the ForthMachine are bound through pybind11:
>>> from awkward.forth import ForthMachine32
>>> from awkward.forth import ForthMachine64
The methods available in Python are a subset of the ones in C++. (The fast, lookup-by-integer methods were omitted.)
A ForthMachine compiles its source code once when it is constructed; new code requires a new machine. This machine computes the sum of 3 and 5.
>>> vm = ForthMachine32("3 5 +")
>>> vm.run()
>>> vm.stack
[8]
A ForthMachine has 3 states: "not ready," "paused," and "done." There are 6 methods that control execution of a ForthMachine:
-
run(inputs)
: resets the state of the machine, starting in any state, and runs the main code from the beginning. If control reaches apause
word, the machine goes into the "paused" state. Otherwise, it goes into the "done" state. -
begin(inputs)
: resets the state of the machine, starting in any state, and goes into a "paused" state before the first instruction in the main code. -
resume()
: starts execution from a "paused" state and continues until the end of the main code, resulting in "done," or until the end of a user-defined word, if a word was paused while being called (see below). -
call(word)
: starting from a "paused" or "done" state, executes a user-defined word. If this operation contains apause
word, the machine will need to be resumed (see above) to reach the end of the user-defined word. When the user-defined word is finished, the state of the machine will be "paused" or "done," depending on where it started. -
step()
: executes only one instruction, starting from a "pause" state, ending in a "pause" or "done" state, depending on whether the last instruction in the main code is reached. This only exists for debugging: normal pausing and resuming should be done withpause
words andresume()
calls. -
reset()
: resets the state of the machine and (unlike all of the above), clears the stack, all variables, and detaches the input and output buffers (which might be significant for cleaning up memory use).
Here are some examples of controlling the execution state of a ForthMachine.
Stepping through a program (for debugging only):
>>> vm = ForthMachine32("3 5 +")
>>> vm.begin()
>>> vm.stack
[]
>>> vm.step()
>>> vm.stack
[3]
>>> vm.step()
>>> vm.stack
[3, 5]
>>> vm.step()
>>> vm.stack
[8]
Pausing and resuming execution:
>>> vm = ForthMachine32("1 2 pause 3 4")
>>> vm.run()
>>> vm.stack
[1, 2]
>>> vm.run()
>>> vm.stack
[1, 2]
>>> vm.resume()
>>> vm.stack
[1, 2, 3, 4]
Halting execution:
>>> vm = ForthMachine32("1 2 halt 3 4")
>>> vm.run()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'user halt' in AwkwardForth runtime: user-defined error or stopping condition
>>> vm.stack
[1, 2]
>>> vm.run(raise_user_halt=False)
'user halt'
>>> vm.stack
[1, 2]
>>> vm.resume()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'not ready' in AwkwardForth runtime: call 'begin' before 'step' or 'resume' (note: check 'is_ready')
Calling a user-defined word:
>>> vm = ForthMachine32(": callme 1 2 3 4 ;")
>>> vm.call("callme")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'not ready' in AwkwardForth runtime: call 'begin' before 'step' or 'resume' (note: check 'is_ready')
>>> vm.run()
>>> vm.stack
[]
>>> vm.call("callme")
>>> vm.stack
[1, 2, 3, 4]
Interaction between pause
and calling a user-defined word:
>>> vm = ForthMachine32(": callme 123 pause 321 ; 1 2 pause 3 4")
>>> vm.run()
>>> vm.stack
[1, 2]
>>> vm.call("callme")
>>> vm.stack
[1, 2, 123]
>>> vm.resume()
>>> vm.stack
[1, 2, 123, 321]
>>> vm.resume()
>>> vm.stack
[1, 2, 123, 321, 3, 4]
Manipulating the stack outside of a program:
>>> vm = ForthMachine32("if 123 else 321 then")
>>> vm.begin()
>>> vm.stack
[]
>>> vm.stack_push(-1) # true
>>> vm.stack
[-1]
>>> vm.resume() # if pops the value and runs the first branch
>>> vm.stack
[123]
>>> vm.begin()
>>> vm.stack
[]
>>> vm.stack_push(0) # false
>>> vm.stack
[0]
>>> vm.resume() # if pops the value and runs the second branch
>>> vm.stack
[321]
AwkwardForth can also have (global, scalar) variables, (global, untyped) inputs, and (global, typed) outputs. (The language has no nested scopes.) Here is an example of a ForthMachine with a variable:
>>> vm = ForthMachine32("variable x 10 x !")
>>> vm["x"]
0
>>> vm.run()
>>> vm["x"]
10
Here is an example of a ForthMachine with an input (i->
reads data as a 4-byte integer and moves the position 4 bytes):
>>> import numpy as np
>>> vm = ForthMachine32("input x x i-> stack")
>>> vm.run({"x": np.array([3, 2, 1], np.int32)})
>>> vm.stack
[3]
>>> vm.input_position("x")
4
Here is an example of a ForthMachine with an output (<-
writes data from the stack, converting it to the output type, if necessary):
>>> vm = ForthMachine32("output x int32 999 x <- stack")
>>> vm.begin()
>>> vm.step()
>>> vm.stack
[999]
>>> vm["x"]
<NumpyArray format="i" shape="0" data="" at="0x58c8c85d11c0"/>
>>> vm.step()
>>> vm.stack
[]
>>> vm["x"]
<NumpyArray format="i" shape="1" data="999" at="0x58c8c85d11c0"/>
A ForthMachine can have an arbitrary number of variables, inputs, and outputs, and an arbitrary number of user-defined words, with index orders defined by the order of declaration (relevant for fast C++ access).
AwkwardForth has no floating-point operations at all. (If we need to add one, it would be a separate floating-point stack, which is the typical way Forth implementations handle floating-point calculations, if at all.)
The bytecode instructions for an AwkwardForth program are a ListOffsetArray of 32-bit integers, which can be inspected and decompiled.
>>> import awkward as ak
>>> vm = ForthMachine32("if 123 else 321 then")
>>> vm.bytecodes
<ListOffsetArray64>
<offsets><Index64 i="[0 3 5 7]" offset="0" length="4" at="0x58c8c859ef00"/></offsets>
<content><NumpyArray format="i" shape="7" data="4 60 61 0 123 0 321" at="0x58c8c84c9310"/></content>
</ListOffsetArray64>
>>> ak.Array(vm.bytecodes)
<Array [[4, 60, 61], [0, 123], [0, 321]] type='3 * var * int32'>
>>> print(vm.decompiled)
if
123
else
321
then
You can also get the current position in the bytecode (the position of the next instruction to be run) and a decompiled string of that instruction.
>>> vm = ForthMachine32("1 2 pause 3 4")
>>> # Literal integers in the source code are two-bytecode instructions (0 followed by the number).
>>> ak.Array(vm.bytecodes)
<Array [[0, 1, 0, 2, 2, 0, 3, 0, 4]] type='1 * var * int32'>
>>> vm.current_bytecode_position
-1
>>> vm.begin()
>>> vm.current_bytecode_position
0
>>> vm.current_instruction
'1'
>>> vm.resume()
>>> vm.current_bytecode_position
5
>>> vm.current_instruction
'3'
>>> vm.resume()
>>> vm.current_bytecode_position
-1
>>> vm.current_instruction
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: 'is done' in AwkwardForth runtime: reached the end of the program; call 'begin' to 'step' again (note: check 'is_done')
(https://github.com/scikit-hep/awkward-1.0/blob/1.0.2/src/libawkward/forth/ForthMachine.cpp#L1302)
Note that this current_bytecode_position
refers to the absolute position in bytecodes.content
, not a position relative to the beginning of a segment. The following example illustrates that, as well as the use of current_recursion_depth
(PR #653 may be required):
>>> vm = ForthMachine32("0 if 123 else 321 then")
>>> ak.to_list(vm.bytecodes)
[[0, 0, 4, 60, 61], [0, 123], [0, 321]]
>>> vm.begin()
>>> vm.current_bytecode_position, vm.current_recursion_depth, vm.current_instruction
(0, 1, '0')
>>> vm.step()
>>> vm.current_bytecode_position, vm.current_recursion_depth, vm.current_instruction
(2, 1, 'if\n 123\nelse\n 321\nthen')
>>> vm.step()
>>> vm.current_bytecode_position, vm.current_recursion_depth, vm.current_instruction
(4, 1, '(anonymous segment at 2)')
>>> vm.step()
>>> vm.current_bytecode_position, vm.current_recursion_depth, vm.current_instruction
(7, 2, '321')
>>> vm.step()
>>> vm.current_bytecode_position, vm.current_recursion_depth(-1, 1)