Separate encoding and decoding of programs from derivation prediction

- Encoding and decoding should be implemented as modular operations on programs.
- Predicting the next derivation from the current program should be its own module, separate from encoding and decoding.
- The loss should allow end-to-end training of all modules.
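
The separation described above could be sketched roughly as follows. This is a minimal pure-Python illustration, not the project's implementation: `Program`, `Vector`, `VOCAB`, `RULES`, `Model`, `encode`, `decode`, `predict_derivation`, and `joint_loss` are all hypothetical names, and the toy functions stand in for learned, differentiable modules. The point is only the shape: three swappable components and one summed loss that depends on all of them.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Program = List[str]   # hypothetical: a program as a list of tokens
Vector = List[float]  # hypothetical: a latent code or a distribution

VOCAB = ["x", "+", "1"]     # toy token vocabulary (assumption)
RULES = ["expand", "stop"]  # toy derivation rules (assumption)


def token_frequencies(program: Program) -> Vector:
    """Relative frequency of each VOCAB token in the program."""
    n = max(len(program), 1)
    return [program.count(tok) / n for tok in VOCAB]


def encode(program: Program) -> Vector:
    """Toy encoder: uses token frequencies as the latent code."""
    return token_frequencies(program)


def decode(latent: Vector) -> Vector:
    """Toy decoder: reads the latent back as token frequencies."""
    return list(latent)


def predict_derivation(latent: Vector) -> Vector:
    """Toy predictor: softmax over RULES, scored from the '+' frequency."""
    scores = [latent[1], 1.0 - latent[1]]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class Model:
    """Bundles the three modules; each one can be replaced independently."""
    encoder: Callable[[Program], Vector]
    decoder: Callable[[Vector], Vector]
    predictor: Callable[[Vector], Vector]


def joint_loss(model: Model, program: Program, target_rule: str) -> float:
    """Reconstruction loss plus derivation-prediction loss on one example."""
    latent = model.encoder(program)
    # Squared error between the decoded output and the true token frequencies.
    recon = model.decoder(latent)
    target = token_frequencies(program)
    recon_loss = sum((r - t) ** 2 for r, t in zip(recon, target))
    # Negative log-likelihood of the target derivation rule.
    probs = model.predictor(latent)
    pred_loss = -math.log(probs[RULES.index(target_rule)])
    return recon_loss + pred_loss


model = Model(encode, decode, predict_derivation)
loss = joint_loss(model, ["x", "+", "1"], "expand")
```

Because both loss terms are computed from the same latent, a single summed objective reaches the encoder, decoder, and predictor at once. With differentiable modules in an autodiff framework, that is what would make end-to-end training possible, while swapping the predictor (for example `Model(encode, decode, lambda z: [0.5, 0.5])`) leaves the encoding and decoding code untouched.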