@@ -13,14 +13,14 @@ for any other extension.
For a new learner, you need to implement the functions
```
update!(learner, buffer) # returns nothing
-selectaction(learner, policy, state) # returns an action
+defaultpolicy(learner, actionspace, buffer) # returns a policy
defaultbuffer(learner, environment, preprocessor) # returns a buffer
```

Let's assume you want to implement plain, simple Q-learning (you don't need to
do this; it is already implemented). Your file `qlearning.jl` could contain
```julia
-import ReinforcementLearning: update!, selectaction, defaultbuffer, Buffer
+import ReinforcementLearning: update!, defaultpolicy, defaultbuffer, Buffer

struct MyQLearning
    Q::Array{Float64, 2} # number of actions x number of states
@@ -36,8 +36,8 @@ function update!(learner::MyQLearning, buffer)
    Q[a, s] += learner.alpha * (r + maximum(Q[:, snext]) - Q[a, s])
end

-function selectaction(learner::MyQLearning, policy, state)
-    selectaction(policy, learner.Q[:, state])
+function defaultpolicy(learner::MyQLearning, actionspace, buffer)
+    EpsilonGreedyPolicy(.1, actionspace, s -> learner.Q[:, s])
end

function defaultbuffer(learner::MyQLearning, environment, preprocessor)
@@ -46,10 +46,10 @@ function defaultbuffer(learner::MyQLearning, environment, preprocessor)
    Buffer(statetype = typeof(processedstate), capacity = 2)
end
```
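The `capacity = 2` choice fits one-step Q-learning: the buffer only ever needs to hold the current and the next state. As a hypothetical sketch of how the elided body of `update!` might read a transition from the buffer (the field names `states`, `actions`, and `rewards` are assumptions, not documented API):
```julia
# Hypothetical sketch; the buffer field names are assumptions, not
# documented API.
function update!(learner::MyQLearning, buffer)
    s, snext = buffer.states[1], buffer.states[2] # capacity 2: s and snext
    a = buffer.actions[1]
    r = buffer.rewards[1]
    Q = learner.Q
    Q[a, s] += learner.alpha * (r + maximum(Q[:, snext]) - Q[a, s])
    nothing # update! returns nothing
end
```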
-The function `defaultbuffer` gets called during the construction of an
-`RLSetup`. It returns a buffer that is filled with states, actions and rewards
-during interaction with the environment. Currently there are three types of
-Buffers implemented
+The functions `defaultpolicy` and `defaultbuffer` get called during the
+construction of an `RLSetup`. `defaultbuffer` returns a buffer that is filled
+with states, actions and rewards during interaction with the environment.
+Currently there are three types of buffers implemented:
```julia
import ReinforcementLearning: Buffer, EpisodeBuffer, ArrayStateBuffer
?Buffer
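# One can also construct a buffer directly; the keyword names below are
# taken from the defaultbuffer example above (capacity 2 holds the current
# and the next state for one-step Q-learning):
b = Buffer(statetype = Int64, capacity = 2)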
@@ -65,7 +65,7 @@ reset!(environment) # returns state

Optionally, you may also implement the function
```
-plotenv(environment, state, action, reward, done)
+plotenv(environment)
```
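For example, for a hypothetical grid-world environment (the type `MyGridWorld` and its fields are made up for this sketch), `plotenv` could simply print the agent's position:
```julia
# Minimal sketch; a real implementation might draw with a plotting package.
struct MyGridWorld
    state::Int
    nstates::Int
end
function plotenv(env::MyGridWorld)
    println(join(i == env.state ? "A" : "." for i in 1:env.nstates))
end
```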
Please have a look at the
@@ -82,9 +82,11 @@ preprocess(preprocessor, reward, state, done) # returns a preprocessed (state, r
```

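For example, a hypothetical preprocessor that discretizes a continuous state could look as follows; only the `preprocess` signature is taken from the snippet above, and the `(state, reward, done)` return shape is an assumption based on the truncated comment:
```julia
import ReinforcementLearning: preprocess

# Hypothetical preprocessor; bins a state in [0, 1] into p.nbins integers.
struct MyDiscretizer
    nbins::Int
end
preprocess(p::MyDiscretizer, reward, state, done) =
    (clamp(ceil(Int, state * p.nbins), 1, p.nbins), reward, done)
```
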
## Policies
+Policies are function-like objects. To implement, for example, a policy that
+returns the action `42` for every possible input `state`, one could write
```
-selectaction(policy, values) # returns an action
-getactionprobabilities(policy, state) # returns a normalized (1-norm) vector with non-negative entries
+struct MyPolicy end
+(p::MyPolicy)(state) = 42
```

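Calling `MyPolicy()(s)` then returns `42` for any state `s`. A slightly richer sketch of the same pattern is a hand-rolled epsilon-greedy policy (illustrative only; the package already provides `EpsilonGreedyPolicy`):
```julia
# Function-like object for epsilon-greedy action selection; `values` maps
# a state to a vector of estimated action values.
struct MyEpsilonGreedy
    epsilon::Float64
    nactions::Int
    values::Function
end
function (p::MyEpsilonGreedy)(state)
    rand() < p.epsilon ? rand(1:p.nactions) : argmax(p.values(state))
end
```
With `Q` a value table, `MyEpsilonGreedy(.1, size(Q, 1), s -> Q[:, s])` picks a uniformly random action with probability 0.1 and otherwise acts greedily.
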
## Callbacks