Description
Required prerequisites
- I have read the documentation https://omnisafe.readthedocs.io.
- I have searched the Issue Tracker and Discussions to make sure this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Questions
Hello :),
I designed an environment with Gymnasium, with the action space defined as:
# inside my environment's __init__ (gym is gymnasium, np is numpy)
self.action_space = gym.spaces.Box(
    low=np.array([0.0, -1.0], dtype=np.float32),
    high=np.array([1.0, 1.0], dtype=np.float32),
)
Then I'm using the code from the tutorial to make a CMDP out of it. While doing evaluations after training, I realized that the actions used are not within my bounds (shown in the picture). Printing the action space shows the correct size. In another issue I found a fix,

self._env = ActionScale(self._env, device=DEVICE_CPU, low=-1, high=1)

but the problem still remains. When I instead try to pass my per-dimension bounds,

self._env = ActionScale(
    self._env,
    device=DEVICE_CPU,
    low=np.array([0.0, -1.0], dtype=np.float32),
    high=np.array([1.0, 1.0], dtype=np.float32),
)

I run into problems with the data types, but I would already be happy if low=-1 and high=1 worked.
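A workaround I'm also considering, sketched below: keep the action space the policy sees symmetric at [-1, 1] and map the action back to my real bounds inside a small Gymnasium ActionWrapper before my environment uses it. This is only a sketch on my side (the wrapper name RescaledActionEnv and the assumption that a plain linear rescaling is fine for my environment are mine, not something from the OmniSafe tutorial):

import gymnasium as gym
import numpy as np

class RescaledActionEnv(gym.ActionWrapper):
    """Hypothetical helper: the agent sees [-1, 1]^2, the wrapped env gets the true bounds."""

    def __init__(self, env, low, high):
        super().__init__(env)
        self._low = np.asarray(low, dtype=np.float32)
        self._high = np.asarray(high, dtype=np.float32)
        # The algorithm only ever interacts with a symmetric unit box.
        self.action_space = gym.spaces.Box(
            low=-1.0, high=1.0, shape=self._low.shape, dtype=np.float32
        )

    def action(self, action):
        # Map [-1, 1] linearly onto [low, high] and clip as a safety net.
        scaled = self._low + (np.asarray(action) + 1.0) * 0.5 * (self._high - self._low)
        return np.clip(scaled, self._low, self._high).astype(np.float32)

The CMDP would then wrap RescaledActionEnv(MyEnv(), low=[0.0, -1.0], high=[1.0, 1.0]) instead of MyEnv() directly (MyEnv being a placeholder for my environment class), so low=-1 and high=1 would be correct for every dimension.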
I have no idea where the origin of the problem could be.
Another thing I noticed is that the actions deviate very little from one timestep to the next, producing a pretty flat, continuous curve. When I used Stable-Baselines, the actions seemed much more random and changed a lot from step to step (both with similar training lengths). I just wondered if there is a reason for that :)
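For context, this is roughly how I look at the logged actions from one evaluation episode (the array and file name are just placeholders for however I happen to log them, not OmniSafe API); it shows both the out-of-bounds values and how small the step-to-step changes are:

import numpy as np

# actions: (T, 2) array of the per-timestep actions from one evaluation episode
actions = np.load("eval_actions.npy")  # placeholder file name

print("per-dimension min:", actions.min(axis=0))
print("per-dimension max:", actions.max(axis=0))
print("mean |a_{t+1} - a_t| per dimension:", np.abs(np.diff(actions, axis=0)).mean(axis=0))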
Thank you very much in advance :))
[Attached: plot of the actions during evaluation, exceeding the action-space bounds]