Design of run loop and hooks

I am running into limitations of the current design of the `run` loop. Assume I am using a custom policy that internally stores the history of past observations and actions. The question is how the policy is intended to retrieve this information. In the current implementation, it seems the `push!(hook, stage, policy, env)` interface is meant for custom code, whereas `push!(policy, stage, env[, action])` is meant to be used internally by RLCore. Is this correct? However, the `Hook`s do not receive the `action`.
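
To make the scenario concrete, here is a minimal sketch of such a history-keeping policy. It uses stand-in types only: `CounterEnv`, `HistoryPolicy`, `observe`, and the local `plan!`/`act!` definitions are hypothetical and are not the RLCore or RLBase implementations. It only illustrates what the policy would like to do if the stage calls reached it together with the action.

```julia
# Stand-in stage types for illustration; not the RLCore definitions.
abstract type AbstractStage end
struct PreActStage <: AbstractStage end
struct PostActStage <: AbstractStage end

# Toy environment (hypothetical): the state is a counter, the action is added to it.
mutable struct CounterEnv
    state::Int
end
observe(env::CounterEnv) = env.state
act!(env::CounterEnv, action::Int) = (env.state += action; nothing)

# Custom policy that keeps its own history of (observation, action) pairs.
mutable struct HistoryPolicy
    last_obs::Int
    history::Vector{Tuple{Int,Int}}
end
HistoryPolicy() = HistoryPolicy(0, Tuple{Int,Int}[])

# Arbitrary toy decision rule.
plan!(p::HistoryPolicy, env::CounterEnv) = observe(env) % 3 + 1

# Remember the observation before acting, then pair it with the action afterwards.
Base.push!(p::HistoryPolicy, ::PreActStage, env::CounterEnv) =
    (p.last_obs = observe(env); p)
Base.push!(p::HistoryPolicy, ::PostActStage, env::CounterEnv, action::Int) =
    (push!(p.history, (p.last_obs, action)); p)

policy, env = HistoryPolicy(), CounterEnv(0)
for _ in 1:3
    push!(policy, PreActStage(), env)
    action = plan!(policy, env)
    act!(env, action)
    push!(policy, PostActStage(), env, action)
end
```

In this sketch the loop calls the policy's `push!` methods directly. The open question above is whether a policy wrapped in an `Agent` is supposed to receive these calls at all.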

Here are some ideas to address this issue:

1. I think all calls to `push!(agent::Agent, stage::AbstractStage, env[, action])` should not only store information in the `Trajectory` (as is currently the case) but also forward to the policy by calling `push!(agent.policy, stage, env[, action])`. This would allow custom policies to add custom logic.

2. Just as `push!(policy, ::PostActStage, env, action)` takes an `action` argument, `push!(hook, ::PostActStage, policy, env)` could take one as well. This would also make the chosen action available within a custom policy hook.

3. Another hook should be added between `plan!` and `act!` to evaluate functions that need both the current `env` state and the `action` that is about to be executed. In the `PreActStage` the action is not yet known, and in the `PostActStage` the env has already advanced to the next state, so this is currently not possible. The interior loop could look something like this:
   ```julia
   push!(policy, PreActStage(), env)
   optimise!(policy, PreActStage())
   push!(hook, PreActStage(), policy, env)
   
   action = RLBase.plan!(policy, env)
   
   push!(policy, PostPlanStage(), env, action)          # new
   optimise!(policy, PostPlanStage())                   # new
   push!(hook, PostPlanStage(), policy, env, action)    # new
   
   act!(env, action)
   
   push!(policy, PostActStage(), env, action)
   optimise!(policy, PostActStage())
   push!(hook, PostActStage(), policy, env, action)     # action arg new
   ```
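
The three ideas above can be tried out together in a small self-contained sketch. Everything here is a stand-in for illustration: `PostPlanStage` is the proposed stage and does not exist yet, and `Agent`, `CounterEnv`, `LoggingPolicy`, `StateActionHook`, and `run_steps!` are hypothetical simplifications, not the RLCore types. The `optimise!` calls are left out for brevity.

```julia
abstract type AbstractStage end
struct PreActStage <: AbstractStage end
struct PostPlanStage <: AbstractStage end   # proposed stage (idea 3)
struct PostActStage <: AbstractStage end

# Toy environment (hypothetical): the action is added to an integer state.
mutable struct CounterEnv
    state::Int
end
act!(env::CounterEnv, action::Int) = (env.state += action; nothing)

# Policy that records which stage calls reached it.
struct LoggingPolicy
    seen::Vector{Symbol}
end
plan!(::LoggingPolicy, env::CounterEnv) = env.state % 3 + 1
Base.push!(p::LoggingPolicy, s::AbstractStage, env, action...) =
    (push!(p.seen, nameof(typeof(s))); p)

# Simplified agent: a policy plus a trajectory-like log.
struct Agent{P}
    policy::P
    trajectory::Vector{Any}
end

# Idea 1: store in the trajectory AND forward the call to the wrapped policy.
function Base.push!(agent::Agent, stage::AbstractStage, env, action...)
    push!(agent.trajectory, (nameof(typeof(stage)), env.state))
    push!(agent.policy, stage, env, action...)
    return agent
end

# Hook that needs the pre-transition state together with the chosen action.
struct StateActionHook
    pairs::Vector{Tuple{Int,Int}}
end
Base.push!(h::StateActionHook, ::AbstractStage, policy, env, action...) = h  # ignore other stages
Base.push!(h::StateActionHook, ::PostPlanStage, policy, env, action) =
    (push!(h.pairs, (env.state, action)); h)

# Interior loop with the proposed stage and the extra action argument (ideas 2 and 3).
function run_steps!(agent, env, hook, n)
    for _ in 1:n
        push!(agent, PreActStage(), env)
        push!(hook, PreActStage(), agent, env)
        action = plan!(agent.policy, env)
        push!(agent, PostPlanStage(), env, action)
        push!(hook, PostPlanStage(), agent, env, action)
        act!(env, action)
        push!(agent, PostActStage(), env, action)
        push!(hook, PostActStage(), agent, env, action)
    end
    return hook
end

policy = LoggingPolicy(Symbol[])
agent = Agent(policy, Any[])
env = CounterEnv(0)
hook = StateActionHook(Tuple{Int,Int}[])
run_steps!(agent, env, hook, 2)
```

After the run, `hook.pairs` holds each state paired with the action chosen in that state, which neither `PreActStage` nor `PostActStage` can provide on its own, and `policy.seen` shows that every stage call was forwarded through the agent.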

I can open a pull request if that's an approach you want to follow.

Design of run loop and hooks #1090

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Design of run loop and hooks #1090

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions