Description
I am running into limitations of the current design of the run loop. Let's assume I am using a custom policy that internally stores the history of past observations and actions. The question is: what is the intended way for the policy to retrieve this information? In the current implementation, it seems that the `push!(hook, stage, policy, env)` interface is meant for custom code, whereas `push!(policy, stage, env[, action])` is meant to be used internally by RLCore. Is this correct? However, the `Hook`s do not receive the `action`.
Here are some ideas to address this issue:
- I think all calls to `push!(agent::Agent, stage::AbstractStage, env[, action])` should not only store information in the `Trajectory` (which is currently the case) but also forward to the policy by calling `push!(agent.policy, stage, env[, action])`. This would allow custom policies to add custom logic.
- Just as `push!(policy, ::PostActStage, env, action)` has an `action` argument, `push!(hook, ::PostActStage, policy, env)` could also get an `action` argument. This would also make the chosen action available within a custom policy hook.
- Another hook should be added between `plan!` and `act!` to evaluate functions that need both the current `env` state and the `action` that is being executed. In the `PreActStage` the action is not known yet, and in the `PostActStage` the env is already in the next state, so this is currently not possible. The interior loop could look something like this:

  ```julia
  push!(policy, PreActStage(), env)
  optimise!(policy, PreActStage())
  push!(hook, PreActStage(), policy, env)

  action = RLBase.plan!(policy, env)

  push!(policy, PostPlanStage(), env, action)        # new
  optimise!(policy, PostPlanStage())                 # new
  push!(hook, PostPlanStage(), policy, env, action)  # new

  act!(env, action)

  push!(policy, PostActStage(), env, action)
  optimise!(policy, PostActStage())
  push!(hook, PostActStage(), policy, env, action)   # action arg new
  ```
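To make the third idea more concrete, here is a minimal, self-contained sketch of how a custom policy and a hook might use the proposed stages. It does not depend on RLCore: the stage types, `CounterEnv`, `HistoryPolicy`, `ActionLogHook`, `run_steps!`, and the local `plan!`, `act!`, and `optimise!` functions are all hypothetical stand-ins I made up for illustration, and the real signatures may differ.

```julia
# Hypothetical stand-ins for the RLCore stage types (not the library's code).
abstract type AbstractStage end
struct PreActStage <: AbstractStage end
struct PostPlanStage <: AbstractStage end   # the proposed new stage
struct PostActStage <: AbstractStage end

# Toy environment: the state is a counter, acting adds the action to it.
mutable struct CounterEnv
    state::Int
end
act!(env::CounterEnv, action::Int) = (env.state += action; nothing)

# Toy policy that keeps its own history of (state, action) pairs.
struct HistoryPolicy
    history::Vector{Tuple{Int,Int}}
end
HistoryPolicy() = HistoryPolicy(Tuple{Int,Int}[])

plan!(::HistoryPolicy, env::CounterEnv) = env.state + 1
optimise!(::HistoryPolicy, ::AbstractStage) = nothing   # no learning in this sketch

# Default: ignore a stage. PostPlanStage: the pre-transition state and the
# chosen action are both available, so the pair can be recorded.
Base.push!(::HistoryPolicy, ::AbstractStage, env, args...) = nothing
Base.push!(p::HistoryPolicy, ::PostPlanStage, env::CounterEnv, action::Int) =
    (push!(p.history, (env.state, action)); nothing)

# Toy hook that uses the proposed extra `action` argument in PostActStage.
struct ActionLogHook
    seen::Vector{Tuple{Int,Int}}   # (next state, action)
end
ActionLogHook() = ActionLogHook(Tuple{Int,Int}[])

Base.push!(::ActionLogHook, ::AbstractStage, policy, env, args...) = nothing
Base.push!(h::ActionLogHook, ::PostActStage, policy, env::CounterEnv, action::Int) =
    (push!(h.seen, (env.state, action)); nothing)

# Interior loop in the proposed order, run for n steps.
function run_steps!(policy, env, hook, n)
    for _ in 1:n
        push!(policy, PreActStage(), env)
        optimise!(policy, PreActStage())
        push!(hook, PreActStage(), policy, env)

        action = plan!(policy, env)

        push!(policy, PostPlanStage(), env, action)        # new
        optimise!(policy, PostPlanStage())                 # new
        push!(hook, PostPlanStage(), policy, env, action)  # new

        act!(env, action)

        push!(policy, PostActStage(), env, action)
        optimise!(policy, PostActStage())
        push!(hook, PostActStage(), policy, env, action)   # action arg new
    end
    return policy
end

policy, hook, env = HistoryPolicy(), ActionLogHook(), CounterEnv(0)
run_steps!(policy, env, hook, 3)
policy.history   # [(0, 1), (1, 2), (3, 4)]: state before acting, chosen action
hook.seen        # [(1, 1), (3, 2), (7, 4)]: state after acting, chosen action
```

In this sketch the policy sees the state before the transition together with the action, while the hook sees the state after it, which is exactly the distinction the extra stage is meant to provide.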
I can open a pull request if that's an approach you want to follow.