Open
Description
Train seq2seq RNNs to generate syntactically valid inputs for the gold code, then use the gold code as an oracle to get the output for each generated input (if the input has correct syntax). The seq2seq model receives the gold code as input and has to generate an input for it that doesn’t set off an exception when the gold code is run on it inside a try/except. Reward the model +1 if there is no exception and -1 if there is one. Add an entropy term (or minibatch discrimination?) to the loss so the generations stay varied. So the model can learn which syntax error it made, have it also predict which exception its input sets off, and lessen the negative reward if that prediction is correct. Maybe pretrain with supervised learning (SL) on the already provided test cases (if we have any) before the RL stage; the entropy term in the RL loss should keep the RL stage varied.
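A rough sketch of the reward described above, to make the idea concrete. Everything named here is hypothetical: `oracle_reward`, `entropy_bonus`, the toy `gold` function, and especially the `partial_penalty=-0.5` value, since the proposal only says the negative reward should be lessened, not by how much. The seq2seq model itself is not shown; it is assumed to supply the generated input, the predicted exception name, and its per-step token probabilities.

```python
import math
from typing import Any, Callable, Optional, Sequence, Tuple


def oracle_reward(
    gold_fn: Callable[[Any], Any],
    generated_input: Any,
    predicted_exception: Optional[str] = None,
    partial_penalty: float = -0.5,  # assumed value, not from the proposal
) -> Tuple[float, Any]:
    """Run the gold code on a generated input and score the result.

    Returns (reward, output). The output is None when an exception was raised.
    """
    try:
        output = gold_fn(generated_input)
    except Exception as exc:
        # Lessen the penalty when the model named the exception type correctly.
        if predicted_exception == type(exc).__name__:
            return partial_penalty, None
        return -1.0, None
    return 1.0, output


def entropy_bonus(token_probs: Sequence[Sequence[float]]) -> float:
    """Mean per-step Shannon entropy (in nats) of the token distributions."""
    if not token_probs:
        return 0.0
    step_entropies = [
        -sum(p * math.log(p) for p in step if p > 0.0) for step in token_probs
    ]
    return sum(step_entropies) / len(step_entropies)


def gold(raw: str) -> int:
    """Toy stand-in for gold code: parses an integer and doubles it."""
    return int(raw) * 2


if __name__ == "__main__":
    print(oracle_reward(gold, "21"))                 # valid input
    print(oracle_reward(gold, "abc", "ValueError"))  # exception predicted correctly
    print(oracle_reward(gold, "abc", "TypeError"))   # exception predicted wrongly
    print(entropy_bonus([[0.5, 0.5], [1.0, 0.0]]))
```

In an RL loss, the entropy value would typically be scaled by a small coefficient and added to the reward (equivalently, subtracted from the loss), so that higher-entropy generations are favored. Running untrusted generated inputs through real gold code would also need sandboxing and a timeout, which this sketch leaves out.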