- This implementation should handle encoding and decoding as modular actions on programs.
- Predicting the next derivation from the current program should be a separate module, independent of encoding and decoding.
- The loss should allow end-to-end training of all modules.
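
One way to picture the requirements above is the minimal, dependency-free sketch below. Everything in it is an assumption made for illustration, not the intended implementation: a program is represented as a list of derivation ids, the names `Encoder`, `Predictor`, `Decoder`, `loss`, `build`, and `sensitivity` are hypothetical, the weights are arbitrary fixed numbers, and a finite-difference check stands in for the automatic differentiation a real training setup would use. The point is only that the three modules stay separate while a single cross-entropy loss on the next derivation depends on the parameters of all of them.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]
Matrix = List[Vector]
Program = List[int]  # assumption: a program is a sequence of derivation ids


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


@dataclass
class Encoder:
    """Encoding module: sums one embedding row per derivation in the program."""
    embed: Matrix  # shape: n_derivations x dim

    def __call__(self, program: Program) -> Vector:
        out = [0.0] * len(self.embed[0])
        for derivation in program:
            out = [a + b for a, b in zip(out, self.embed[derivation])]
        return out


@dataclass
class Predictor:
    """Prediction module: maps the current latent to a latent for the next step."""
    weight: Matrix  # shape: dim x dim

    def __call__(self, latent: Vector) -> Vector:
        return [math.tanh(x) for x in matvec(self.weight, latent)]


@dataclass
class Decoder:
    """Decoding module: maps a latent to one logit per derivation."""
    weight: Matrix  # shape: n_derivations x dim

    def __call__(self, latent: Vector) -> Vector:
        return matvec(self.weight, latent)


def loss(enc: Encoder, pred: Predictor, dec: Decoder,
         program: Program, target: int) -> float:
    """Cross-entropy of the target derivation; touches all three modules."""
    logits = dec(pred(enc(program)))
    top = max(logits)
    log_z = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_z - logits[target]


def build():
    """Return the three modules with small, arbitrary fixed weights."""
    enc = Encoder([[0.5, -0.2], [0.1, 0.3], [-0.4, 0.6]])
    pred = Predictor([[0.3, -0.1], [0.2, 0.5]])
    dec = Decoder([[0.2, -0.3], [0.4, 0.1], [-0.5, 0.3]])
    return enc, pred, dec


def sensitivity(param: Matrix, loss_fn: Callable[[], float],
                eps: float = 1e-4) -> float:
    """Central finite difference of the loss with respect to param[0][0]."""
    original = param[0][0]
    param[0][0] = original + eps
    up = loss_fn()
    param[0][0] = original - eps
    down = loss_fn()
    param[0][0] = original  # restore the parameter
    return (up - down) / (2 * eps)


if __name__ == "__main__":
    enc, pred, dec = build()
    program, target = [0, 2], 1  # hypothetical training example
    current = lambda: loss(enc, pred, dec, program, target)
    print("loss:", current())
    for name, param in [("encoder", enc.embed),
                        ("predictor", pred.weight),
                        ("decoder", dec.weight)]:
        print(name, "sensitivity:", sensitivity(param, current))
```

Because each module is a plain callable with its own parameters, any one of them could be swapped out without touching the others. The sensitivity check illustrates that the single loss carries a training signal to the encoder, the predictor, and the decoder at once.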