Description
"Automatic Data Augmentation for Generalization in Reinforcement Learning" describes Data-Regularized Actor-Critic (DrAC), a method that should be implemented in our framework. The authors' code is available here.
DrAC combines input augmentations with regularization terms that keep the network's outputs consistent under augmentation: a KL loss between the policy distributions for unaugmented and augmented observations, plus a squared-error term between the corresponding value estimates.
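A minimal sketch of these two terms in PyTorch, following the paper's formulation; `model` (assumed to return a policy distribution and a value estimate), `aug`, and the weight `alpha_r` are hypothetical stand-ins for our framework's actual interfaces:

```python
import torch
import torch.nn.functional as F
from torch.distributions import kl_divergence

def drac_regularizer(model, obs, aug, alpha_r=0.1):
    """DrAC-style consistency losses under an input augmentation.
    `model(obs)` is assumed to return (policy_distribution, value)."""
    with torch.no_grad():
        # Targets come from the unaugmented observations; no gradient
        # flows through them, matching the paper's use of pi(.|s) and
        # V(s) as fixed targets.
        pi, v = model(obs)
    pi_aug, v_aug = model(aug(obs))
    g_pi = kl_divergence(pi, pi_aug).mean()  # G_pi: policy consistency
    g_v = F.mse_loss(v_aug, v)               # G_V: value consistency
    return alpha_r * (g_pi + g_v)
```

Weighting both terms by a single coefficient mirrors the paper's objective, which subtracts alpha_r * (G_pi + G_V) from the PPO objective.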
The key challenge is the handling of RNN states: the hidden state produced by the augmented forward pass must not be carried forward into subsequent steps (maybe add a test for this?).
Implementation should be possible almost exclusively in the definition of the context and the loss for `rl`. For RNN states, a slight adjustment might be needed in `LatentCore` to allow passing not only the most recent state but a history of states; see the sketch below.
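One way to satisfy both constraints is to feed the augmented observations through the core starting from the hidden states recorded during the unaugmented rollout, and to discard the state the augmented pass produces. A sketch assuming a GRU core; `rnn`, `state_history`, and the indexing are placeholders, not existing framework names:

```python
import torch
import torch.nn as nn

rnn = nn.GRU(input_size=64, hidden_size=128, batch_first=True)

def augmented_outputs(obs_aug, state_history, t0):
    """Run augmented observations from the hidden state recorded at
    step t0 of the unaugmented rollout (all names are placeholders).
    The hidden state produced here is intentionally discarded so it
    can never leak into the ongoing rollout."""
    h0 = state_history[t0].detach()  # shape (num_layers, batch, hidden)
    out, _ = rnn(obs_aug, h0)        # `_` is the new state, thrown away
    return out
```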
Important: one does not have access to the full history of hidden states for every layer of a multi-layer RNN; in that case we need to use / reimplement this by stacking single-layer RNNs!
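This could work along the following lines: a single-layer GRU's output sequence is exactly its hidden-state history, so stacking single-layer modules exposes every layer's state at every timestep. The class below is a hypothetical sketch, not an existing framework component:

```python
import torch
import torch.nn as nn

class StackedGRU(nn.Module):
    """Stack of single-layer GRUs that records the full hidden-state
    history of every layer, unlike a multi-layer nn.GRU, which only
    returns the final state per layer."""

    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        in_sizes = [input_size] + [hidden_size] * (num_layers - 1)
        self.layers = nn.ModuleList(
            nn.GRU(s, hidden_size, batch_first=True) for s in in_sizes
        )

    def forward(self, x, h0=None):
        # h0: optional list of per-layer initial states, each (1, batch, hidden)
        histories = []  # one (batch, time, hidden) tensor per layer
        for i, layer in enumerate(self.layers):
            x, _ = layer(x, None if h0 is None else h0[i])
            histories.append(x)  # per-timestep hidden states of this layer
        return x, histories
```

Note that for an LSTM the same trick only exposes the hidden state h, not the cell state c, so the cell state would need extra bookkeeping.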