[Feature Request] MaskableRecurrentPPO #101

Open
@CppMaster

Description

Motivation
MaskablePPO is great for large discrete action spaces that have many invalid actions at each step, while RecurrentPPO lets the agent keep a memory of previous observations and actions, which improves its decision making. Right now we have to choose between these two algorithms and cannot combine their features, even though having both action masking and sequence processing would greatly improve training in environments where both are helpful.

Feature
MaskableRecurrentPPO: an algorithm that combines MaskablePPO and RecurrentPPO. Alternatively, integrate action masking into PPO and RecurrentPPO directly.
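For context, the core mechanism the two algorithms would share is logit masking: before sampling from the categorical action distribution, the logits of invalid actions are pushed to a very large negative value so they receive near-zero probability. A minimal PyTorch sketch of that idea (function name and values are illustrative, not from sb3-contrib):

```python
import torch

def mask_logits(logits: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Replace logits of invalid actions (mask == False) with a large
    negative value so softmax assigns them ~zero probability."""
    neg_inf = torch.tensor(-1e8, dtype=logits.dtype)
    return torch.where(mask, logits, neg_inf)

# Example: 4 discrete actions, only actions 0 and 2 are valid this step.
logits = torch.tensor([2.0, 1.0, 0.5, -0.3])
mask = torch.tensor([True, False, True, False])
probs = torch.softmax(mask_logits(logits, mask), dim=-1)
# Invalid actions end up with (numerically) zero probability,
# while the valid actions' probabilities renormalize to sum to 1.
```

In a combined MaskableRecurrentPPO, this masking step would simply be applied to the logits produced by the recurrent (e.g. LSTM-based) policy head at each timestep; the two features are largely orthogonal.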

Metadata

Assignees

No one assigned

    Labels

    duplicate (This issue or pull request already exists), enhancement (New feature or request)
