
Commit ec476ea

Fixed the Q-learning update rule in qlearning.md (Issue #2649)
1 parent 23d7a5a commit ec476ea

File tree

1 file changed (+7 −3 lines)


chapter_reinforcement-learning/qlearning.md

Lines changed: 7 additions & 3 deletions
````diff
@@ -1,3 +1,4 @@
+
 ```{.python .input}
 %load_ext d2lbook.tab
 tab.interact_select(["pytorch"])
@@ -132,9 +133,12 @@ def q_learning(env_info, gamma, num_iters, alpha, epsilon):
         action = e_greedy(env, Q, state, epsilon)
         next_state, reward, done, _ = env.step(action)
 
-        # Q-update:
-        y = reward + gamma * np.max(Q[next_state,:])
-        Q[state, action] = Q[state, action] + alpha * (y - Q[state, action])
+        # Q-learning update: Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]
+        # Corrected Q-learning block
+        td_target = reward + gamma * np.max(Q[next_state, :])
+        td_error = td_target - Q[state, action]
+        Q[state, action] += alpha * td_error
+
 
         # Move to the next state
         state = next_state
````
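The corrected update can be exercised in isolation. Below is a minimal sketch of one Q-learning step on a toy Q-table; the 3-state/2-action table, the values of `alpha` and `gamma`, and the sampled transition are hypothetical illustrations, not taken from the commit:

```python
import numpy as np

# Hypothetical toy setup: 3 states, 2 actions, Q initialized to zero.
Q = np.zeros((3, 2))
alpha, gamma = 0.5, 0.9

# One sampled transition (s, a, r, s'), chosen arbitrarily for illustration.
state, action, reward, next_state = 0, 1, 1.0, 2

# Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
td_target = reward + gamma * np.max(Q[next_state, :])
td_error = td_target - Q[state, action]
Q[state, action] += alpha * td_error

print(Q[state, action])  # → 0.5 (Q started at zero, so the step moves it halfway toward td_target = 1.0)
```

Splitting the update into `td_target` and `td_error`, as the commit does, makes the temporal-difference structure explicit and keeps the bracketed term in the textbook formula visible in the code.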

0 commit comments
