Design of run loop and hooks #1090

Open
@johannes-fischer

Description

I am running into limitations of the current design of the run loop. Let's assume I am using a custom policy that internally stores the history of past observations and actions. The question is what the intended way is for the policy to retrieve this information. In the current implementation it seems that the push!(hook, stage, policy, env) interface is meant for custom code, whereas push!(policy, stage, env[, action]) is meant to be used internally by RLCore. Is this correct? However, the hooks do not receive the action.

Here are some ideas to address this issue:

  1. I think all calls to push!(agent::Agent, stage::AbstractStage, env[, action]) should not only store information in the Trajectory (which is currently the case) but also forward to the policy by calling push!(agent.policy, stage, env[, action]). This would allow custom policies to add custom logic.

  2. Similar to push!(policy, ::PostActStage, env, action), which has an action argument, push!(hook, ::PostActStage, policy, env) could also take an action argument. This would make the chosen action available inside a custom hook.

  3. Another hook should be added between plan! and act! to evaluate functions that need both the current env state and the action that is about to be executed. In the PreActStage the action is not known yet, and in the PostActStage the env is already in the next state, so this is currently not possible. The inner loop could look something like this:

    push!(policy, PreActStage(), env)
    optimise!(policy, PreActStage())
    push!(hook, PreActStage(), policy, env)
    
    action = RLBase.plan!(policy, env)
    
    push!(policy, PostPlanStage(), env, action)          # new
    optimise!(policy, PostPlanStage())                   # new
    push!(hook, PostPlanStage(), policy, env, action)    # new
    
    act!(env, action)
    
    push!(policy, PostActStage(), env, action)
    optimise!(policy, PostActStage())
    push!(hook, PostActStage(), policy, env, action)     # action arg new
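
To make the proposed ordering concrete, here is a minimal stand-alone sketch of the loop above. It does not use ReinforcementLearning.jl: the stage types are re-declared locally, and CounterEnv, HistoryPolicy, LogHook and run_steps! are hypothetical names made up for this illustration. It only shows that in PostPlanStage both the pre-action state and the chosen action are visible, while in PostActStage the env has already moved on.

```julia
# Stand-alone mock-up; nothing here comes from ReinforcementLearning.jl.
abstract type AbstractStage end
struct PreActStage <: AbstractStage end
struct PostPlanStage <: AbstractStage end   # the proposed new stage
struct PostActStage <: AbstractStage end

# Hypothetical toy environment: the state is a counter that the action increments.
mutable struct CounterEnv
    state::Int
end
act!(env::CounterEnv, action::Int) = (env.state += action; env)

# Hypothetical policy that keeps its own history of (state, action) pairs.
struct HistoryPolicy
    history::Vector{Tuple{Int,Int}}
end
HistoryPolicy() = HistoryPolicy(Tuple{Int,Int}[])
plan!(::HistoryPolicy, env::CounterEnv) = env.state + 1

# By default every stage is a no-op for the policy.
Base.push!(::HistoryPolicy, ::AbstractStage, env) = nothing
Base.push!(::HistoryPolicy, ::AbstractStage, env, action) = nothing
# PostPlanStage: env is still in the pre-action state and the action is known.
Base.push!(p::HistoryPolicy, ::PostPlanStage, env::CounterEnv, action::Int) =
    (push!(p.history, (env.state, action)); nothing)

# Hypothetical hook that records what it can see at each stage.
struct LogHook
    log::Vector{String}
end
LogHook() = LogHook(String[])
Base.push!(::LogHook, ::AbstractStage, policy, env) = nothing
Base.push!(h::LogHook, ::PostPlanStage, policy, env, action) =
    (push!(h.log, "state=$(env.state) action=$action"); nothing)
Base.push!(h::LogHook, ::PostActStage, policy, env, action) =
    (push!(h.log, "next_state=$(env.state) after action=$action"); nothing)

# Inner loop in the proposed order (optimise! calls left out for brevity).
function run_steps!(policy, env, hook, n)
    for _ in 1:n
        push!(policy, PreActStage(), env)
        push!(hook, PreActStage(), policy, env)

        action = plan!(policy, env)

        push!(policy, PostPlanStage(), env, action)
        push!(hook, PostPlanStage(), policy, env, action)

        act!(env, action)

        push!(policy, PostActStage(), env, action)
        push!(hook, PostActStage(), policy, env, action)
    end
    return policy, hook
end

policy, hook = run_steps!(HistoryPolicy(), CounterEnv(0), LogHook(), 2)
```

After two steps, policy.history holds the pairs of pre-action state and action, which is exactly the information a history-based policy needs and that neither PreActStage (no action yet) nor PostActStage (env already advanced) can provide on its own.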

I can open a pull request if that's an approach you want to follow.
