# Automatically calculate obs_dim based on foresight, unique_obs_dim, ... (#708)
<!--
SPDX-FileCopyrightText: ASSUME Developers
SPDX-License-Identifier: AGPL-3.0-or-later
-->
## Description
Currently, obs_dim, foresight, and unique_obs_dim are fixed values for a given strategy. Since they are directly interdependent, obs_dim is now calculated from the others. Adjustments should now require fewer changes per strategy.

This PR also corrects some wrong values in the docstrings and removes the reward scaling in RenewableEnergyLearningSingleBidStrategy (as already done in EnergyLearningStrategy).
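To make the relationship concrete, here is a minimal, hypothetical sketch of the idea: obs_dim is assumed to be the number of forecast time series times the foresight horizon, plus the unit-specific unique_obs_dim. The class and attribute names (for example `num_timeseries_obs`) are illustrative and not necessarily ASSUME's actual API.

```python
# Hypothetical sketch: derive obs_dim from foresight and unique_obs_dim
# instead of hard-coding it per strategy. Names are illustrative only.

class LearningStrategyStub:
    def __init__(self, foresight: int = 24, unique_obs_dim: int = 2, num_timeseries_obs: int = 2):
        self.foresight = foresight                     # forecast horizon in time steps
        self.unique_obs_dim = unique_obs_dim           # unit-specific values, e.g. capacity and marginal cost
        self.num_timeseries_obs = num_timeseries_obs   # forecast series, e.g. residual load and price

        # obs_dim follows from the other values instead of being a fixed constant
        self.obs_dim = self.num_timeseries_obs * self.foresight + self.unique_obs_dim


strategy = LearningStrategyStub()
print(strategy.obs_dim)  # 2 * 24 + 2 = 50
```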
## Checklist
- [x] Documentation updated (docstrings, READMEs, user guides, inline
comments, `doc` folder updates etc.)
- [x] New unit/integration tests added (if applicable)
- [x] Changes noted in release notes (if any)
- [x] Consent to release this PR's code under the GNU Affero General
Public License v3.0
---------
Co-authored-by: kim-mskw <[email protected]>
f"All observation dimensions must be the same for all RL agents. The defined learning strategies have the following observation dimensions: {obs_dim_list}"
288
+
f"All foresight values must be the same for all RL agents. The defined learning strategies have the following foresight values: {foresight_list}"
f"All observation dimensions must be the same for all RL agents. The defined learning strategies have the following observation dimensions: {obs_dim_list}"
318
+
)
319
+
else:
320
+
self.obs_dim=obs_dim_list[0]
321
+
312
322
defcreate_actors(self) ->None:
313
323
"""
314
324
Create actor networks for reinforcement learning for each unit strategy.
# Note: These scaling factors could be interpreted as information leakage. However as we are in a simulation environment and not a purley forecasting setting
121
+
# we assume that the agent has access to this information already
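For context, the error messages in the excerpt above belong to a consistency check over all learning strategies. The following is a simplified, framework-agnostic sketch of such a check; only the messages are taken from the diff, while the surrounding loop and variable names are assumed.

```python
def validate_rl_strategies(strategies) -> tuple[int, int]:
    """Simplified sketch: ensure all RL strategies agree on obs_dim and foresight."""
    obs_dim_list = [s.obs_dim for s in strategies]
    foresight_list = [s.foresight for s in strategies]

    if len(set(foresight_list)) > 1:
        raise ValueError(
            f"All foresight values must be the same for all RL agents. "
            f"The defined learning strategies have the following foresight values: {foresight_list}"
        )

    if len(set(obs_dim_list)) > 1:
        raise ValueError(
            f"All observation dimensions must be the same for all RL agents. "
            f"The defined learning strategies have the following observation dimensions: {obs_dim_list}"
        )

    return obs_dim_list[0], foresight_list[0]
```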
docs/source/learning.rst (2 additions, 2 deletions)
@@ -140,8 +140,8 @@ The Actor
We will explain the way learning works in ASSUME starting from the interface to the simulation, namely the bidding strategy of the power plants.
The bidding strategy, per definition in ASSUME, defines the way we formulate bids based on the technical restrictions of the unit.
In a learning setting, this is done by the actor network, which maps the observation to an action. The observation is thereby managed and collected by the unit's operator as
-summarized in the following picture. As you can see in the current working version, the observation space contains a residual load forecast for the next 24 hours and a price
-forecast for 24 hours, as well as the current capacity of the power plant and its marginal costs.
+summarized in the following picture. As you can see in the current working version, the observation space contains a residual load forecast and a price
+forecast, for example for the next 24 hours, as well as the current capacity of the power plant and its marginal costs.
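Translating that description into numbers: with a 24-hour residual load forecast, a 24-hour price forecast, and the two unit-specific values (current capacity and marginal cost), the flattened observation has 2 * 24 + 2 = 50 entries. The following NumPy sketch only illustrates that layout; the function name and scaling constants are assumed, not ASSUME's actual implementation.

```python
import numpy as np

def build_observation(residual_load_forecast, price_forecast, capacity, marginal_cost,
                      max_demand=50_000.0, max_price=100.0):
    """Illustrative flat observation vector as described above (assumed layout and scaling)."""
    obs = np.concatenate(
        [
            np.asarray(residual_load_forecast) / max_demand,    # e.g. next 24 hours
            np.asarray(price_forecast) / max_price,             # e.g. next 24 hours
            [capacity / max_demand, marginal_cost / max_price]  # unit-specific observations
        ]
    )
    return obs.astype(np.float32)

obs = build_observation(np.full(24, 30_000.0), np.full(24, 60.0), capacity=800.0, marginal_cost=35.0)
print(obs.shape)  # (50,)
```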