no training with SAC when random_timesteps > 0 #3137

@Yovan275

Whenever I set random_timesteps to anything other than 0 while training a Go2 robot with SAC, no robots appear in the training videos and no plots appear in TensorBoard. In sac.py there is this code:

    # sample random actions
    # TODO, check for stochasticity
    if timestep < self._random_timesteps:
        return self.policy.random_act({"states": self._state_preprocessor(states)}, role="policy")

    # sample stochastic actions
    with torch.autocast(device_type=self._device_type, enabled=self._mixed_precision):
        actions, _, outputs = self.policy.act({"states": self._state_preprocessor(states)}, role="policy")

    return actions, None, outputs

and in the model's base.py, random_act is defined as:

    def random_act(
        self, inputs: Mapping[str, Union[torch.Tensor, Any]], role: str = ""
    ) -> Tuple[torch.Tensor, None, Mapping[str, Union[torch.Tensor, Any]]]:
        """Act randomly according to the action space

        :param inputs: Model inputs. The most common keys are:

                       - ``"states"``: state of the environment used to make the decision
                       - ``"taken_actions"``: actions taken by the policy for the given states
        :type inputs: dict where the values are typically torch.Tensor
        :param role: Role play by the model (default: ``""``)
        :type role: str, optional

        :raises NotImplementedError: Unsupported action space

        :return: Model output. The first component is the action to be taken by the agent
        :rtype: tuple of torch.Tensor, None, and dict
        """
        # discrete action space (Discrete)
        if isinstance(self.action_space, gymnasium.spaces.Discrete):
            return torch.randint(self.action_space.n, (inputs["states"].shape[0], 1), device=self.device), None, {}
        # continuous action space (Box)
        elif isinstance(self.action_space, gymnasium.spaces.Box):
            if self._random_distribution is None:
                self._random_distribution = torch.distributions.uniform.Uniform(
                    low=torch.tensor(self.action_space.low[0], device=self.device, dtype=torch.float32),
                    high=torch.tensor(self.action_space.high[0], device=self.device, dtype=torch.float32),
                )

            return (
                self._random_distribution.sample(sample_shape=(inputs["states"].shape[0], self.num_actions)),
                None,
                {},
            )
        else:
            raise NotImplementedError(f"Action space type ({type(self.action_space)}) not supported")

I am currently using IsaacLab 2.10 and IsaacSim 4.5.0. I do not know exactly what the issue is, but I think it is related to the code I have provided above.
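
One thing that might be worth checking is whether the bounds read by random_act are finite. This is only a guess: the sketch below assumes (unverified for the Go2 task) that action_space.low[0] and action_space.high[0] could be -inf and inf. It uses plain Python floats and the usual affine formula for uniform sampling, low + u * (high - low), to show what that would produce. The helper names uniform_sample and bounds_are_finite are made up for illustration and are not part of skrl or IsaacLab.

    import math
    import random


    def uniform_sample(low: float, high: float, u: float) -> float:
        """Map u in [0, 1) onto [low, high) using low + u * (high - low)."""
        return low + u * (high - low)


    def bounds_are_finite(low: float, high: float) -> bool:
        """Return True only if both bounds are finite numbers."""
        return math.isfinite(low) and math.isfinite(high)


    # Bounded case: the sample stays inside the interval.
    bounded = uniform_sample(-1.0, 1.0, random.random())

    # Unbounded case (assumed, not confirmed for the Go2 task):
    # inf - (-inf) is inf, and -inf + inf is NaN, so the sample is NaN.
    unbounded = uniform_sample(-math.inf, math.inf, random.random())

    print(bounded, unbounded, bounds_are_finite(-math.inf, math.inf))

If the environment's bounds turn out to be infinite, the random actions would be NaN for the first random_timesteps steps, which could explain the missing robots and plots. Printing action_space.low[0] and action_space.high[0] for the environment would confirm or rule this out.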
