Commit 0a83f7e
fix: update Actor-Critic RL example for modern Gymnasium API
fix: update Actor-Critic RL example for modern Gymnasium API

Fixes keras-team/keras#21092

Changes:
- Replace `gym` with `gymnasium` (OpenAI Gym is no longer maintained)
- Update `CartPole-v0` to `CartPole-v1` (v0 was removed)
- Fix `env.reset()` to properly unpack the `(observation, info)` tuple
- Fix state conversion by explicitly casting to a NumPy float32 array
- Update `env.step()` to handle `terminated`/`truncated` separately
- Update the reward threshold from 195 to 475 for CartPole-v1
- Remove the duplicate `env.reset(seed=seed)` call
- Update the `Last modified` date
1 parent 1d1ac36 commit 0a83f7e

File tree

1 file changed (+10 −10 lines)


examples/rl/actor_critic_cartpole.py

Lines changed: 10 additions & 10 deletions
@@ -2,7 +2,7 @@
 Title: Actor Critic Method
 Author: [Apoorv Nandan](https://twitter.com/NandanApoorv)
 Date created: 2020/05/13
-Last modified: 2024/02/22
+Last modified: 2026/02/28
 Description: Implement Actor Critic Method in CartPole environment.
 Accelerator: NONE
 Converted to Keras 3 by: [Sitam Meur](https://github.com/sitamgithub-MSIT)
@@ -11,7 +11,7 @@
 """
 ## Introduction

-This script shows an implementation of Actor Critic method on CartPole-V0 environment.
+This script shows an implementation of Actor Critic method on CartPole-V1 environment.

 ### Actor Critic Method
@@ -26,7 +26,7 @@
 Agent and Critic learn to perform their tasks, such that the recommended actions
 from the actor maximize the rewards.

-### CartPole-V0
+### CartPole-V1

 A pole is attached to a cart placed on a frictionless track. The agent has to apply
 force to move the cart. It is rewarded for every time step the pole
@@ -45,7 +45,7 @@
 import os

 os.environ["KERAS_BACKEND"] = "tensorflow"
-import gym
+import gymnasium as gym
 import numpy as np
 import keras
 from keras import ops
@@ -57,8 +57,7 @@
 gamma = 0.99  # Discount factor for past rewards
 max_steps_per_episode = 10000
 # Adding `render_mode='human'` will show the attempts of the agent
-env = gym.make("CartPole-v0")  # Create the environment
-env.reset(seed=seed)
+env = gym.make("CartPole-v1")  # Create the environment
 eps = np.finfo(np.float32).eps.item()  # Smallest number such that 1.0 + eps != 1.0

 """
@@ -98,12 +97,12 @@
 episode_count = 0

 while True:  # Run until solved
-    state = env.reset()[0]
+    state, _ = env.reset(seed=seed)
     episode_reward = 0
     with tf.GradientTape() as tape:
         for timestep in range(1, max_steps_per_episode):

-            state = ops.convert_to_tensor(state)
+            state = ops.convert_to_tensor(np.array(state, dtype=np.float32))
             state = ops.expand_dims(state, 0)

             # Predict action probabilities and estimated future rewards
@@ -116,7 +115,8 @@
             action_probs_history.append(ops.log(action_probs[0, action]))

             # Apply the sampled action in our environment
-            state, reward, done, *_ = env.step(action)
+            state, reward, terminated, truncated, _ = env.step(action)
+            done = terminated or truncated
             rewards_history.append(reward)
             episode_reward += reward

@@ -176,7 +176,7 @@
     template = "running reward: {:.2f} at episode {}"
     print(template.format(running_reward, episode_count))

-    if running_reward > 195:  # Condition to consider the task solved
+    if running_reward > 475:  # Condition to consider the task solved
         print("Solved at episode {}!".format(episode_count))
         break
 """
