Open
Description
Strategy:
- create env-wrapper that changes the observation-type to a dict that also has a "skill"-key and generate skill on the environment level; also remove the reward
- change the policies to intercept the flow between the feature extractor and Actor; it should split the observation into skill and rest pass both separatly through their respective encoder and combine the state and skill embeddings to return
- the critic should ignore the skill variable
Metadata
Metadata
Assignees
Labels
No labels