Open
Currently, multi-step TD has an incorrect parameter (JuliaReinforcementLearning/ReinforcementLearning.jl#648).
ReinforcementLearningAnIntroduction.jl/notebooks/Chapter09_Random_Walk.jl
Lines 193 to 216 in e83f540
As an example, `n` is used there as the number of time steps. However, it currently corresponds to the number of time steps plus one. `run_once(1, α)` is thus not TD(0), which has a time step parameter of 1, but rather a 2-step TD method. Depending on how the upstream issue is resolved, an update might be needed here.
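For reference, here is the off-by-one written out with the standard n-step return from Sutton and Barto. The symbols ($G$, $R$, $V$, $\gamma$) follow the book's notation and are not taken from the notebook code, so treat this as a sketch of the intended semantics rather than a description of the implementation:

```latex
% n-step return used as the TD target
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \dots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n})

% n = 1 gives the one-step TD(0) target
G_{t:t+1} = R_{t+1} + \gamma V(S_{t+1})

% with the parameter shifted by one, passing n = 1 effectively uses the 2-step target
G_{t:t+2} = R_{t+1} + \gamma R_{t+2} + \gamma^{2} V(S_{t+2})
```

So if the parameter is interpreted as "time steps plus one", every curve in the notebook would be labelled with an `n` that is one smaller than the method actually run.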