Commit d726c28

adapt tutorial
1 parent fdbdbbc commit d726c28

1 file changed: +13 -11 lines

docs/src/tutorial.md (+13 -11)
@@ -13,14 +13,14 @@ for any other extension.
 For a new learner you need to implement the functions
 ```
 update!(learner, buffer) # returns nothing
-selectaction(learner, policy, state) # returns an action
+defaultpolicy(learner, actionspace, buffer) # returns a policy
 defaultbuffer(learner, environment, preprocessor) # returns a buffer
 ```
 
 Let's assume you want to implement plain, simple Q-learning (you don't need to
 do this; it is already implemented). Your file `qlearning.jl` could contain
 ```julia
-import ReinforcementLearning: update!, selectaction, defaultbuffer, Buffer
+import ReinforcementLearning: update!, defaultpolicy, defaultbuffer, Buffer
 
 struct MyQLearning
     Q::Array{Float64, 2} # number of actions x number of states
@@ -36,8 +36,8 @@ function update!(learner::MyQLearning, buffer)
     Q[a, s] += learner.alpha * (r + maximum(Q[:, snext]) - Q[a, s])
 end
 
-function selectaction(learner::MyQLearning, policy, state)
-    selectaction(policy, learner.Q[:, state])
+function defaultpolicy(learner::MyQLearning, actionspace, buffer)
+    EpsilonGreedyPolicy(.1, actionspace, s -> getvalue(learner.params, s))
 end
 
 function defaultbuffer(learner::MyQLearning, environment, preprocessor)
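
For readers following along, here is a minimal sketch (not part of the commit) of what a hand-rolled policy object could look like; it assumes integer states indexing the columns of `Q` and a recent Julia where `argmax` returns the index of the largest entry.

```julia
# Illustrative sketch only, not from the commit: a greedy policy written as a
# function-like object, matching the "Policies are function-like objects"
# convention introduced further down in this diff.
struct GreedyPolicy
    Q::Array{Float64, 2}   # number of actions x number of states
end
(p::GreedyPolicy)(state) = argmax(p.Q[:, state])   # index of the best action

# Hypothetical: a learner could return such an object from defaultpolicy
# instead of an EpsilonGreedyPolicy, e.g.
# defaultpolicy(learner::MyQLearning, actionspace, buffer) = GreedyPolicy(learner.Q)
```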
@@ -46,10 +46,10 @@ function defaultbuffer(learner::MyQLearning, environment, preprocessor)
     Buffer(statetype = typeof(processedstate), capacity = 2)
 end
 ```
-The function `defaultbuffer` gets called during the construction of an
-`RLSetup`. It returns a buffer that is filled with states, actions and rewards
-during interaction with the environment. Currently there are three types of
-Buffers implemented
+The functions `defaultpolicy` and `defaultbuffer` get called during the
+construction of an `RLSetup`. `defaultbuffer` returns a buffer that is filled
+with states, actions and rewards during interaction with the environment.
+Currently there are three types of Buffers implemented
 ```julia
 import ReinforcementLearning: Buffer, EpisodeBuffer, ArrayStateBuffer
 ?Buffer
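
As a rough illustration (not part of the commit, and assuming the keyword constructor shown in the hunk above): a capacity of 2 is enough for one-step Q-learning, because the buffer only ever needs the previous and the current state.

```julia
# Illustrative sketch only: constructing a small buffer by hand, assuming
# integer-encoded states as in the tabular MyQLearning example above.
import ReinforcementLearning: Buffer

buffer = Buffer(statetype = Int64, capacity = 2)   # holds the last two states
```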
@@ -65,7 +65,7 @@ reset!(environment) # returns state
 
 Optionally you may also implement the function
 ```
-plotenv(environment, state, action, reward, done)
+plotenv(environment)
 ```
 
 Please have a look at the
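
A minimal sketch (not part of the commit) of how the new one-argument signature might be implemented; `MyGridWorld` and its fields are made up for illustration.

```julia
# Illustrative sketch only: a text "plot" for a hypothetical 1-D grid world
# that stores its current position in a `state` field.
import ReinforcementLearning: plotenv

mutable struct MyGridWorld
    nstates::Int
    state::Int
end

function plotenv(environment::MyGridWorld)
    # mark the agent's current position in a row of cells
    println(join(i == environment.state ? "[x]" : "[ ]"
                 for i in 1:environment.nstates))
end
```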
@@ -82,9 +82,11 @@ preprocess(preprocessor, reward, state, done) # returns a preprocessed (state, r
 ```
 
 ## Policies
+Policies are function-like objects. To implement for example a policy that
+returns (the action) `42` for every possible input `state` one could write
 ```
-selectaction(policy, values) # returns an action
-getactionprobabilities(policy, state) # Returns a normalized (1-norm) vector with non-negative entries.
+struct MyPolicy end
+(p::MyPolicy)(state) = 42
 ```
 
 ## Callbacks
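
To round this off, a tiny usage sketch (not part of the commit) showing that the `MyPolicy` object from the hunk above is simply called like a function:

```julia
# Illustrative sketch only: callable policy objects in action.
struct MyPolicy end
(p::MyPolicy)(state) = 42

policy = MyPolicy()
policy(1)    # -> 42
policy("s")  # -> 42, the state is ignored entirely
```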
