no training with SAC when random_timesteps > 0

Whenever I set random_timesteps to anything other than 0 while training a Go2 robot with SAC, no robots appear in the training videos and no plots appear in TensorBoard. In sac.py there is this code:

        # sample random actions
        # TODO, check for stochasticity
        if timestep < self._random_timesteps:
            return self.policy.random_act({"states": self._state_preprocessor(states)}, role="policy")

        # sample stochastic actions
        with torch.autocast(device_type=self._device_type, enabled=self._mixed_precision):
            actions, _, outputs = self.policy.act({"states": self._state_preprocessor(states)}, role="policy")

        return actions, None, outputs

In base.py, the model defines random_act as follows:

    def random_act(
        self, inputs: Mapping[str, Union[torch.Tensor, Any]], role: str = ""
    ) -> Tuple[torch.Tensor, None, Mapping[str, Union[torch.Tensor, Any]]]:
        """Act randomly according to the action space

        :param inputs: Model inputs. The most common keys are:

                       - ``"states"``: state of the environment used to make the decision
                       - ``"taken_actions"``: actions taken by the policy for the given states
        :type inputs: dict where the values are typically torch.Tensor
        :param role: Role play by the model (default: ``""``)
        :type role: str, optional

        :raises NotImplementedError: Unsupported action space

        :return: Model output. The first component is the action to be taken by the agent
        :rtype: tuple of torch.Tensor, None, and dict
        """
        # discrete action space (Discrete)
        if isinstance(self.action_space, gymnasium.spaces.Discrete):
            return torch.randint(self.action_space.n, (inputs["states"].shape[0], 1), device=self.device), None, {}
        # continuous action space (Box)
        elif isinstance(self.action_space, gymnasium.spaces.Box):
            if self._random_distribution is None:
                self._random_distribution = torch.distributions.uniform.Uniform(
                    low=torch.tensor(self.action_space.low[0], device=self.device, dtype=torch.float32),
                    high=torch.tensor(self.action_space.high[0], device=self.device, dtype=torch.float32),
                )

            return (
                self._random_distribution.sample(sample_shape=(inputs["states"].shape[0], self.num_actions)),
                None,
                {},
            )
        else:
            raise NotImplementedError(f"Action space type ({type(self.action_space)}) not supported")

I am currently using IsaacLab 2.10 and IsaacSim 4.5.0. I do not know exactly what the issue is, but I suspect it is related to the code quoted above.
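
One possible explanation, offered as a guess rather than a confirmed diagnosis: random_act builds its uniform distribution from action_space.low[0] and action_space.high[0]. If the environment's Box action space is unbounded (an assumption here, since the Go2 task's action space is not shown above), those bounds are -inf and inf. The sketch below uses only the standard library and assumes the sampler computes low + u * (high - low), which is the usual formula for uniform sampling. The helper name uniform_sample is hypothetical and is not part of skrl.

```python
import math
import random


def uniform_sample(low: float, high: float) -> float:
    """Draw one uniform sample with the usual formula low + u * (high - low)."""
    u = random.random()  # u lies in [0, 1)
    return low + u * (high - low)


# Bounded action space: every sample stays within the given limits.
bounded = [uniform_sample(-1.0, 1.0) for _ in range(5)]

# Unbounded action space (assumed, not confirmed for the Go2 task):
# high - low evaluates to inf, and -inf + inf evaluates to NaN.
unbounded = [uniform_sample(-math.inf, math.inf) for _ in range(5)]

print("bounded:  ", bounded)
print("unbounded:", unbounded)
```

If the random actions turn out to be NaN in this way, that could explain robots disappearing from the videos during the random_timesteps phase. Printing env.action_space and the actions returned by random_act would confirm or rule this out.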


no training with SAC when random_timesteps > 0 #3137
