Commit 5a6b217

vmoenss and svekars authored
Use DETERMINISTIC sampling in PPO (#3230)
Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent e6fc189 commit 5a6b217

File tree

1 file changed, +1 -1 lines changed


intermediate_source/reinforcement_ppo.py (+1 -1)

@@ -639,7 +639,7 @@
 # number of steps (1000, which is our ``env`` horizon).
 # The ``rollout`` method of the ``env`` can take a policy as argument:
 # it will then execute this policy at each step.
-with set_exploration_type(ExplorationType.MEAN), torch.no_grad():
+with set_exploration_type(ExplorationType.DETERMINISTIC), torch.no_grad():
     # execute a rollout with the trained policy
     eval_rollout = env.rollout(1000, policy_module)
     logs["eval reward"].append(eval_rollout["next", "reward"].mean().item())
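The change evaluates the trained policy deterministically: under `ExplorationType.DETERMINISTIC`, the policy emits its most likely action (for a Gaussian actor, the mean) instead of sampling, so the evaluation reward is not inflated or deflated by sampling noise. A minimal stdlib-only sketch of the idea, where `GaussianPolicy` is a hypothetical toy class and not part of torchrl:

```python
import random


class GaussianPolicy:
    """Toy Gaussian policy: the mean stands in for a network output."""

    def __init__(self, mean: float, std: float):
        self.mean = mean
        self.std = std

    def act(self, deterministic: bool) -> float:
        # Deterministic mode returns the distribution's mean (its mode for a
        # Gaussian); stochastic mode samples, which adds noise to eval returns.
        if deterministic:
            return self.mean
        return random.gauss(self.mean, self.std)


policy = GaussianPolicy(mean=0.5, std=0.2)

# Evaluation rollout: repeated deterministic calls give identical actions.
eval_actions = [policy.act(deterministic=True) for _ in range(3)]
print(eval_actions)  # [0.5, 0.5, 0.5]

# Training rollout: stochastic calls vary around the mean (exploration).
train_actions = [policy.act(deterministic=False) for _ in range(3)]
```

This mirrors why the tutorial wraps the evaluation rollout in `set_exploration_type(ExplorationType.DETERMINISTIC)`: exploration noise is useful for collecting training data but only obscures how good the learned policy actually is.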

0 commit comments
