Hi
I have a few questions about your paper.
- For example, in the log-acquisition task, you say that the agent first gets close to the tree by directly executing code, and that the log is then acquired in the reinforcement learning phase. How does reinforcement learning start from that point? I understand that code generated by the Slow Agent is inserted into the action space, but I don't know how the RL is set up apart from the action space. Is the state reached before reinforcement learning starts carried over? How is the environment reset? In other words, I would like to know how reinforcement learning begins and ends.
- You say that reinforcement learning is used for the sub-actions that the Slow Agent decides should be learned. Which agent determines the configuration of that reinforcement learning: the Slow Agent or the Fast Agent?
- The Slow Agent prompts in the Appendix do not seem to contain any instructions for deciding whether code should be executed directly or learned. Would it be possible to publish the prompts?