- `Model.reset()` and `Model.sample()` signatures have changed. They no longer receive `TransitionBatch` objects, and they both return a dictionary of strings to tensors representing a model state that should be passed to `sample()` to simulate transitions. This dictionary can contain things like previous actions, predicted observations, latent states, beliefs, and any other quantity that the model needs to maintain to simulate trajectories when using `ModelEnv`.
- `Ensemble` class and sub-classes are assumed to operate on 1-D models.
- Checkpointing format used by `save()` and `load()` in classes `GaussianMLP` and `OneDTransitionRewardModel` changed, making old checkpoints incompatible with the new version.
- `use_silu` argument to `GaussianMLP` has been replaced by `activation_fn_cfg`, which is an `omegaconf.DictConfig` specifying the class to use for the activation functions, thus giving more flexibility.
- Removed unnecessary nesting inside the `dynamics_model` Hydra configuration.
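The new model-state contract described above can be sketched with a toy model. The class and key names below are illustrative only (plain lists stand in for tensors), not the library's actual implementation:

```python
# Toy sketch of the new reset()/sample() contract: both methods deal in a
# plain dict mapping string keys to tensor-like values. The keys here
# ("obs", "prev_action") are illustrative, not mbrl-lib's actual keys.
class ToyModel:
    def reset(self, obs):
        # Build the initial model state from the first observation.
        return {"obs": obs, "prev_action": None}

    def sample(self, action, model_state):
        # Simulate one transition; return the prediction and the updated
        # state dict that the caller threads back into the next sample().
        next_obs = [x + a for x, a in zip(model_state["obs"], action)]
        next_state = {"obs": next_obs, "prev_action": action}
        return next_obs, next_state


model = ToyModel()
state = model.reset([0.0, 0.0])
obs, state = model.sample([1.0, 2.0], state)
```

The point of the dictionary is that `ModelEnv` can carry arbitrary per-model bookkeeping (latents, beliefs, previous actions) without the environment needing to know its contents.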
- Added functions to `mbrl.util.models` to easily create convolutional encoders/decoders with a desired configuration.
- `mbrl.util.common.rollout_agent_trajectories` now allows rolling out a pixel-based environment using a policy trained on its corresponding non-pixel environment version.
- `ModelTrainer` can be given `eps` for the `Adam` optimizer. It now also includes a progress bar using `tqdm` (can be turned off).
- CEM optimizer can now be toggled between using a clipped normal distribution or a truncated normal distribution.
- `mbrl.util.mujoco.make_env` can now create an environment specified via an `omegaconf` configuration and `hydra.utils.instantiate`, which takes precedence over the old mechanism if both are present.
- Added MPPI optimizer.
- Added iCEM optimizer.
- `control_env.py` now works with CEM, iCEM and MPPI.
- Changed algorithm configuration so that the action optimizer is passed as another config file.
- Added option to quantize pixel observations in the gym mujoco and dm_control environment wrappers.
- Added a sequence iterator, `SequenceTransitionSampler`, that always returns a fixed number of random batches.
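For the `make_env` change above, an environment can now be declared in configuration and resolved by `hydra.utils.instantiate` rather than through the legacy string mechanism. A minimal sketch of what such a configuration might look like follows; the `env_cfg` key and the target class are assumptions for illustration, not confirmed names (in practice this would be a YAML file loaded into an `omegaconf.DictConfig` — a plain dict stands in here):

```python
# Hypothetical sketch of an environment declared in configuration.
# `_target_` is the field hydra.utils.instantiate uses to locate the
# class to construct; key names other than `_target_` are assumed.
cfg = {
    "overrides": {
        "env_cfg": {
            "_target_": "gym.envs.classic_control.PendulumEnv",
        }
    }
}
```

When both this configuration and the old environment-string mechanism are present, the configuration takes precedence.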
- Methods `loss`, `eval_score` and `update` of the `Model` class now return a tuple of loss/score and metadata. The old return format is currently supported as well, but this will be deprecated in v0.2.0.
- `ModelTrainer` now accepts a callback that will be called after every batch, both during training and evaluation.
- `Normalizer` in `util.math` can now operate using double precision. Utilities now allow specifying via Hydra config whether the replay buffer and normalizer should use double or float.
- Multiple bug fixes
- Added a training browser to compare results of multiple runs
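The new tuple return convention for the loss-type methods above can be sketched with a toy model. Everything here (the loss computation, the metadata keys) is illustrative, not the library's actual code:

```python
# Sketch of the new return convention: loss-type methods return
# (value, metadata) instead of a bare value, so callers can surface
# diagnostics without changing the training loop's structure.
class ToyModel:
    def loss(self, batch):
        loss_val = sum(batch) / len(batch)    # stand-in loss computation
        meta = {"batch_size": len(batch)}     # illustrative metadata
        return loss_val, meta                 # new: tuple, not a scalar


loss, meta = ToyModel().loss([1.0, 2.0, 3.0])
```

Callers still expecting a bare scalar keep working for now, but that path is slated for deprecation in v0.2.0.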
- Deprecated `ReplayBuffer.get_iterators()` and replaced it with `mbrl.util.common.get_basic_iterators()`
- Added an iterator that returns batches of sequences of transitions of a given length
- Multiple bug fixes
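The fixed-length sequence iterator described above can be illustrated with a simplified stand-in (this is not the library's actual class; names and constructor arguments are invented for the sketch):

```python
import random

# Toy illustration of an iterator yielding batches of fixed-length
# sequences of transitions, sampled at random start indices.
# Not mbrl-lib's actual implementation.
class ToySequenceIterator:
    def __init__(self, data, sequence_length, batch_size, num_batches, seed=0):
        self.data = data
        self.sequence_length = sequence_length
        self.batch_size = batch_size
        self.num_batches = num_batches
        self.rng = random.Random(seed)

    def __iter__(self):
        max_start = len(self.data) - self.sequence_length
        for _ in range(self.num_batches):
            # Each batch holds `batch_size` contiguous slices of the data,
            # every slice exactly `sequence_length` items long.
            starts = [self.rng.randint(0, max_start)
                      for _ in range(self.batch_size)]
            yield [self.data[s:s + self.sequence_length] for s in starts]

    def __len__(self):
        return self.num_batches


it = ToySequenceIterator(list(range(100)), sequence_length=5,
                         batch_size=4, num_batches=10)
batches = list(it)
```

Sampling sequences rather than single transitions is what makes it possible to train models that condition on short histories.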
- Added `third_party` folder for `pytorch_sac` and `dmc2gym`
- Library now available on `pypi`
- Moved example configurations to package `mbrl.examples`, which can now be run as `python -m mbrl.examples.main` after `pip` installation
Initial release