
Commit c74ab80

Standardize on the reward functions (#86)
* prev_state -> current_state; new_state -> next_state
* Docs changes
1 parent 46b04a5 commit c74ab80

6 files changed

Lines changed: 39 additions & 47 deletions


spiceaidocs/content/en/concepts/interpretations/_index.md

Lines changed: 1 addition & 1 deletion
@@ -30,6 +30,6 @@ The interpretation is defined as a time range from `start` to `end`, with a `nam
 
 Interpretations can be used to provide hints to the reward function on how to reward a time step. In the above example, when the training reaches Tuesday, the reward function author might choose to reward buys even higher based on that expert input.
 
-When the action specific reward function is called, if there is an interpretation in that time range, it will be provided to the reward function in `[state].interpretations`. E.g. if an interpretation overlapped with new state then `new_state.interpretations` would contain a list of the overlapping interpretations.
+When an action-specific reward function is called, any interpretations that overlap a state's time range are provided to it in `[state]_interpretations`. For example, if an interpretation overlaps the next state, then `next_state_interpretations` contains a list of the overlapping interpretations.
 
 Comparing Spice.ai recommendations to interpretations is also one way of testing Spice.ai recommendations against expected actions for input data.
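A minimal sketch of how a reward function might use `next_state_interpretations` to act on expert input, such as rewarding buys more highly on Tuesday. It assumes the list holds one dict per overlapping interpretation with a `name` key. The helper name `boost_for_interpretations`, the interpretation name `"buy"`, and the 1.5 multiplier are all hypothetical.

```python
from typing import Any, Dict, List


def boost_for_interpretations(
    reward: float,
    next_state_interpretations: List[Dict[str, Any]],
    name: str = "buy",        # hypothetical interpretation name
    multiplier: float = 1.5,  # hypothetical boost factor
) -> float:
    """Raise `reward` if an interpretation called `name` overlaps the next state."""
    # An empty list means no interpretation overlapped the next state's time range.
    for interpretation in next_state_interpretations:
        # Assumption: each interpretation is a dict with a "name" key.
        if interpretation.get("name") == name:
            return reward * multiplier
    return reward
```

With no overlapping interpretations the reward passes through unchanged. A matching interpretation scales it, which is one way to fold expert hints into training without hard-coding dates.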

spiceaidocs/content/en/concepts/rewards/_index.md

Lines changed: 14 additions & 14 deletions
@@ -22,10 +22,10 @@ The reward function must assign a value to `reward` for it to be valid.
 
 The following variables are available to be used in the reward function:
 
-| variable | Type | Description |
-| ---------- | ------------------------------------------------------------------------------------- | -------------------------------------------------------------- |
-| prev_state | [SimpleNamespace](https://docs.python.org/3/library/types.html#types.SimpleNamespace) | The observation state when the action was taken |
-| new_state | [SimpleNamespace](https://docs.python.org/3/library/types.html#types.SimpleNamespace) | The observation state from directly after the action was taken |
+| variable | Type | Description |
+| ------------- | ---------------------------------------------------------------------- | --------------------------------------------------------------------- |
+| current_state | [dict](https://docs.python.org/3.8/library/stdtypes.html#typesmapping) | The observation state when the action was taken |
+| next_state | [dict](https://docs.python.org/3.8/library/stdtypes.html#typesmapping) | The observation state one granularity step after the action was taken |
 
 ### Example
 

@@ -37,36 +37,36 @@ training:
 - reward: close_valve
 # Reward keeping moisture content above 25%
 with: |
-if new_state.sensors_garden_moisture > 0.25:
+if next_state["sensors_garden_moisture"] > 0.25:
 reward = 200
 
 # Penalize low moisture content depending on how far the garden has dried out
 else:
-reward = -100 * (0.25 - new_state.sensors_garden_moisture)
+reward = -100 * (0.25 - next_state["sensors_garden_moisture"])
 
-# Penalize especially heavily if the drying trend is continuing (new_state is drier than prev_state)
-if new_state.sensors_garden_moisture < prev_state.sensors_garden_moisture:
+# Penalize especially heavily if the drying trend is continuing (next_state is drier than current_state)
+if next_state["sensors_garden_moisture"] < current_state["sensors_garden_moisture"]:
 reward = reward * 2
 
 - reward: open_valve_half
 # Reward watering when needed, more heavily if the garden is more dried out
 with: |
-if new_state.sensors_garden_moisture < 0.25:
-reward = 100 * (0.25 - new_state.sensors_garden_moisture)
+if next_state["sensors_garden_moisture"] < 0.25:
+reward = 100 * (0.25 - next_state["sensors_garden_moisture"])
 
 # Penalize wasting water
 # Penalize overwatering depending on how overwatered the garden is
 else:
-reward = -50 * (new_state.sensors_garden_moisture - 0.25)
+reward = -50 * (next_state["sensors_garden_moisture"] - 0.25)
 
 - reward: open_valve_full
 # Reward watering when needed, more heavily if the garden is more dried out
 with: |
-if new_state.sensors_garden_moisture < 0.25:
-reward = 200 * (0.25 - new_state.sensors_garden_moisture)
+if next_state["sensors_garden_moisture"] < 0.25:
+reward = 200 * (0.25 - next_state["sensors_garden_moisture"])
 
 # Penalize wasting water more heavily with valve fully open
 # Penalize overwatering depending on how overwatered the garden is
 else:
-reward = -100 * (new_state.sensors_garden_moisture - 0.25)
+reward = -100 * (next_state["sensors_garden_moisture"] - 0.25)
 ```
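To check the `close_valve` logic outside the runtime, the sketch below rewrites it as a plain Python function over the `current_state` and `next_state` dicts. It assumes the drying-trend check applies only in the penalty (`else`) branch, as its comment suggests. The function name `close_valve_reward` and the `MOISTURE_TARGET` constant are hypothetical.

```python
from typing import Dict

MOISTURE_TARGET = 0.25  # the 25% moisture threshold used in the example


def close_valve_reward(current_state: Dict[str, float], next_state: Dict[str, float]) -> float:
    moisture_now = current_state["sensors_garden_moisture"]
    moisture_next = next_state["sensors_garden_moisture"]

    # Reward keeping moisture content above 25%
    if moisture_next > MOISTURE_TARGET:
        return 200.0

    # Penalize low moisture depending on how far the garden has dried out
    reward = -100 * (MOISTURE_TARGET - moisture_next)

    # Assumption: the drying-trend doubling belongs to the penalty branch
    if moisture_next < moisture_now:
        reward = reward * 2
    return reward
```

Because the states are dicts, fields are read with `state["field_name"]` rather than attribute access. This matches the `current_state`/`next_state` table above.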

spiceaidocs/content/en/reference/pod/_index.md

Lines changed: 10 additions & 18 deletions
@@ -122,7 +122,7 @@ Pod time, time-series and time-data related configuration is defined in the `tim
 
 A list of time categories, such as `month` or `weekday`, enabling the automatic creation of fields from the observation `time`. For example, by specifying `month` the Spice.ai engine automatically creates a field in the data called `time_month_<month>` with a value calculated from the month to which that timestamp relates. This enables learning from cyclical patterns, such as monthly or daily cycles.
 
-***Example***
+**_Example_**
 
 ```yaml
 time:
@@ -758,17 +758,15 @@ training:
 
 A python code block that will be run before an action specific reward code block runs. Use this to define common variables that will be useful to reference in the specific reward code blocks.
 
-Access observation state variables by specifying their fully qualified names and prefixing with `prev_state.` for the value at the previous state before the action was taken, and `new_state.` for the value of the state right after the action was taken.
-
 **Example**
 
 ```yaml
 training:
 reward_init: |
 # Compute price change between previous state and this one
 # so it can be used in all three reward functions
-prev_price = prev_state.coinbase.btcusd.close
-new_price = new_state.coinbase.btcusd.close
+prev_price = current_state["coinbase_btcusd_close"]
+new_price = next_state["coinbase_btcusd_close"]
 change_in_price = new_price - prev_price
 rewards:
 - reward: buy
@@ -784,6 +782,10 @@ training:
 reward = 0.1
 ```
 
+### `training.reward_funcs`
+
+The path to a Python file that defines the reward functions to use instead of inline Python code blocks.
+
 ### `training.rewards`
 
 **Required**. Defines how to reward the Spice.ai runtime during training so that it learns to take more intelligent actions.
@@ -822,18 +824,8 @@ training:
 
 ### `training.rewards[*].with`
 
-A python code block that needs to assign a variable to `reward` to specify which reward to give the Spice.ai agent for taking this action.
+If `training.reward_funcs` is defined, this is the name of the function in that Python file that computes the reward to give the Spice.ai agent for taking this action.
 
-Access observation state variables by specifying their fully qualified names and prefixing with `prev_state.` for the value at the previous state before the action was taken, and `new_state.` for the value of the state right after the action was taken.
+If `training.reward_funcs` is not defined, this is a Python code block that must assign a value to `reward` to specify which reward to give the Spice.ai agent for taking this action.
 
-```yaml
-training:
-rewards:
-- reward: jump
-with: |
-# If we weren't able to jump, penalize trying to jump
-if new_state.game.character.height > prev_state.game.character.height:
-reward = 1
-else:
-reward = -1
-```
+See [Rewards]({{<ref "concepts/rewards">}}) for more information on how to define reward functions.
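A rough sketch of a Python file that `training.reward_funcs` could point to, with one function per action so that `training.rewards[*].with` could name it (for example `with: buy_reward`). This commit does not show the signature the runtime calls. The sketch assumes each function receives the current and next states as dicts plus their `*_interpretations` lists, and returns the reward as a float. The function names, type aliases, and the buy/sell/hold reward logic are hypothetical and for illustration only.

```python
from typing import Any, Dict, List

# Hypothetical aliases: states are dicts keyed by flattened field names,
# interpretations are lists of dicts for the overlapping interpretations.
State = Dict[str, float]
Interpretations = List[Dict[str, Any]]


def _change_in_price(current_state: State, next_state: State) -> float:
    # Same computation as the reward_init example in this reference
    prev_price = current_state["coinbase_btcusd_close"]
    new_price = next_state["coinbase_btcusd_close"]
    return new_price - prev_price


def buy_reward(current_state: State, current_state_interpretations: Interpretations,
               next_state: State, next_state_interpretations: Interpretations) -> float:
    # Illustrative only: reward buying before the price rises
    return _change_in_price(current_state, next_state)


def sell_reward(current_state: State, current_state_interpretations: Interpretations,
                next_state: State, next_state_interpretations: Interpretations) -> float:
    # Illustrative only: reward selling before the price falls
    return -_change_in_price(current_state, next_state)


def hold_reward(current_state: State, current_state_interpretations: Interpretations,
                next_state: State, next_state_interpretations: Interpretations) -> float:
    # Illustrative only: a small constant reward for holding
    return 0.1
```

Moving shared setup such as the price change into an ordinary helper plays the same role that `reward_init` plays for inline code blocks.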

spiceaidocs/content/en/reference/pod/quickstarts-trader.md

Lines changed: 2 additions & 2 deletions
@@ -73,8 +73,8 @@ training:
 # Compute price change between previous state and this one
 # so it can be used in all three reward functions
 reward_init: |
-prev_price = prev_state.coinbase.btcusd.close
-new_price = new_state.coinbase_btcusd_close
+prev_price = current_state["coinbase_btcusd_close"]
+new_price = next_state["coinbase_btcusd_close"]
 change_in_price = new_price - prev_price
 
 rewards:

spiceaidocs/content/en/reference/pod/samples-gardener.md

Lines changed: 10 additions & 10 deletions
@@ -38,36 +38,36 @@ training:
 - reward: close_valve
 # Reward keeping moisture content above 25%
 with: |
-if new_state.sensors_garden_moisture > 0.25:
+if next_state["sensors_garden_moisture"] > 0.25:
 reward = 200
 
 # Penalize low moisture content depending on how far the garden has dried out
 else:
-reward = -100 * (0.25 - new_state.sensors_garden_moisture)
+reward = -100 * (0.25 - next_state["sensors_garden_moisture"])
 
-# Penalize especially heavily if the drying trend is continuing (new_state is drier than prev_state)
-if new_state.sensors_garden_moisture < prev_state.sensors_garden_moisture:
+# Penalize especially heavily if the drying trend is continuing (next_state is drier than current_state)
+if next_state["sensors_garden_moisture"] < current_state["sensors_garden_moisture"]:
 reward = reward * 2
 
 - reward: open_valve_half
 # Reward watering when needed, more heavily if the garden is more dried out
 with: |
-if new_state.sensors_garden_moisture < 0.25:
-reward = 100 * (0.25 - new_state.sensors_garden_moisture)
+if next_state["sensors_garden_moisture"] < 0.25:
+reward = 100 * (0.25 - next_state["sensors_garden_moisture"])
 
 # Penalize wasting water
 # Penalize overwatering depending on how overwatered the garden is
 else:
-reward = -50 * (new_state.sensors_garden_moisture - 0.25)
+reward = -50 * (next_state["sensors_garden_moisture"] - 0.25)
 
 - reward: open_valve_full
 # Reward watering when needed, more heavily if the garden is more dried out
 with: |
-if new_state.sensors_garden_moisture < 0.25:
-reward = 200 * (0.25 - new_state.sensors_garden_moisture)
+if next_state["sensors_garden_moisture"] < 0.25:
+reward = 200 * (0.25 - next_state["sensors_garden_moisture"])
 
 # Penalize wasting water more heavily with valve fully open
 # Penalize overwatering depending on how overwatered the garden is
 else:
-reward = -100 * (next_state.sensors_garden_moisture - 0.25)
+reward = -100 * (next_state["sensors_garden_moisture"] - 0.25)
 ```
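The `open_valve_half` and `open_valve_full` rewards differ only in their weights: 100/50 for half open and 200/100 for fully open. The sketch below factors out that shared shape into a plain Python helper over the `next_state` dict. The names `open_valve_reward`, `water_weight`, `waste_weight`, and `MOISTURE_TARGET` are hypothetical.

```python
from typing import Dict

MOISTURE_TARGET = 0.25  # the 25% moisture threshold used in the sample


def open_valve_reward(next_state: Dict[str, float], water_weight: float, waste_weight: float) -> float:
    moisture = next_state["sensors_garden_moisture"]
    if moisture < MOISTURE_TARGET:
        # Reward watering when needed, more heavily if the garden is more dried out
        return water_weight * (MOISTURE_TARGET - moisture)
    # Penalize overwatering depending on how overwatered the garden is
    return -waste_weight * (moisture - MOISTURE_TARGET)


def open_valve_half_reward(next_state: Dict[str, float]) -> float:
    return open_valve_reward(next_state, water_weight=100, waste_weight=50)


def open_valve_full_reward(next_state: Dict[str, float]) -> float:
    # Fully open valve: larger reward when dry, heavier penalty for wasting water
    return open_valve_reward(next_state, water_weight=200, waste_weight=100)
```

Keeping the weights as parameters makes the two rewards easy to compare side by side when tuning.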

spiceaidocs/content/en/reference/pod/samples-serverops.md

Lines changed: 2 additions & 2 deletions
@@ -42,8 +42,8 @@ training:
 reward_init: |
 high_cpu_usage_threshold = 10
 
-cpu_usage_new = 100 - new_state.hostmetrics_cpu_usage_idle
-cpu_usage_prev = 100 - prev_state.hostmetrics_cpu_usage_idle
+cpu_usage_new = 100 - next_state["hostmetrics_cpu_usage_idle"]
+cpu_usage_prev = 100 - current_state["hostmetrics_cpu_usage_idle"]
 cpu_usage_delta = cpu_usage_new - cpu_usage_prev
 
 cpu_usage_delta_abs = cpu_usage_delta
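The same `reward_init` computation can be sketched as a standalone function to check the dict-based access. CPU usage is derived as 100 minus the idle percentage in each state. The function name `cpu_usage_delta` is hypothetical. The remainder of the original `reward_init` block is not shown in this hunk, so the sketch stops at the delta.

```python
from typing import Dict


def cpu_usage_delta(current_state: Dict[str, float], next_state: Dict[str, float]) -> float:
    # CPU usage is the complement of the idle percentage reported by hostmetrics
    cpu_usage_new = 100 - next_state["hostmetrics_cpu_usage_idle"]
    cpu_usage_prev = 100 - current_state["hostmetrics_cpu_usage_idle"]
    # Positive when CPU usage rose between the current and next state
    return cpu_usage_new - cpu_usage_prev
```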
