Commit f99f8fc

Improved naming of function to get and set underlying Markov states in RLToyEnv; improved API of image_continuous.get_image_representation() to accept epistemic_uncertainty and aleatoric_uncertainty std dev vectors to add to bar plots.
1 parent 37c8c3b commit f99f8fc
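
Because this commit renames `get_augmented_state()` to `get_markov_state()` without keeping the old name, code that must run against checkouts on both sides of the rename could use a small shim. The following is only a sketch: `get_underlying_state` and the two stand-in classes are hypothetical and not part of mdp_playground; only the two method names and the `"curr_state"` / `"augmented_state"` dict keys are taken from the diff below.

```python
def get_underlying_state(env):
    """Return the env's underlying-state dict under either method name.

    Hypothetical helper, not part of mdp_playground.
    """
    getter = getattr(env, "get_markov_state", None)
    if getter is None:
        # Fall back to the name used before commit f99f8fc.
        getter = getattr(env, "get_augmented_state")
    return getter()


class OldStyleEnv:
    """Stand-in for an environment from before the rename (assumption)."""

    def get_augmented_state(self):
        return {"curr_state": 3, "augmented_state": [1, 3]}


class NewStyleEnv:
    """Stand-in for an environment from after the rename (assumption)."""

    def get_markov_state(self):
        return {"curr_state": 5, "augmented_state": [2, 5]}


old_state = get_underlying_state(OldStyleEnv())["curr_state"]
new_state = get_underlying_state(NewStyleEnv())["augmented_state"][-1]
```

The shim prefers the new name, so it keeps working if a later version removes every trace of the old one.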

10 files changed

Lines changed: 128 additions & 127 deletions

docs/_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.rst

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ mdp\_playground.envs.rl\_toy\_env.RLToyEnv

 ~RLToyEnv.__init__
 ~RLToyEnv.close
-~RLToyEnv.get_augmented_state
+~RLToyEnv.get_markov_state
 ~RLToyEnv.init_init_state_dist
 ~RLToyEnv.init_reward_function
 ~RLToyEnv.init_terminal_states

docs/_build/html/_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html

Lines changed: 4 additions & 4 deletions
@@ -712,8 +712,8 @@ <h1>mdp_playground.envs.rl_toy_env.RLToyEnv<a class="headerlink" href="#mdp-play
 </dd></dl>

 <dl class="py method">
-<dt id="mdp_playground.envs.rl_toy_env.RLToyEnv.get_augmented_state">
-<code class="sig-name descname"><span class="pre">get_augmented_state</span></code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="../_modules/mdp_playground/envs/rl_toy_env.html#RLToyEnv.get_augmented_state"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#mdp_playground.envs.rl_toy_env.RLToyEnv.get_augmented_state" title="Permalink to this definition"></a></dt>
+<dt id="mdp_playground.envs.rl_toy_env.RLToyEnv.get_markov_state">
+<code class="sig-name descname"><span class="pre">get_markov_state</span></code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="../_modules/mdp_playground/envs/rl_toy_env.html#RLToyEnv.get_markov_state"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#mdp_playground.envs.rl_toy_env.RLToyEnv.get_markov_state" title="Permalink to this definition"></a></dt>
 <dd><p>gets underlying Markovian state of the MDP</p>
 </dd></dl>

@@ -765,7 +765,7 @@ <h1>mdp_playground.envs.rl_toy_env.RLToyEnv<a class="headerlink" href="#mdp-play
 <tr class="row-even"><td><p><a class="reference internal" href="#mdp_playground.envs.rl_toy_env.RLToyEnv.close" title="mdp_playground.envs.rl_toy_env.RLToyEnv.close"><code class="xref py py-obj docutils literal notranslate"><span class="pre">close</span></code></a>()</p></td>
 <td><p>Override close in your subclass to perform any necessary cleanup.</p></td>
 </tr>
-<tr class="row-odd"><td><p><a class="reference internal" href="#id1" title="mdp_playground.envs.rl_toy_env.RLToyEnv.get_augmented_state"><code class="xref py py-obj docutils literal notranslate"><span class="pre">get_augmented_state</span></code></a>()</p></td>
+<tr class="row-odd"><td><p><a class="reference internal" href="#id1" title="mdp_playground.envs.rl_toy_env.RLToyEnv.get_markov_state"><code class="xref py py-obj docutils literal notranslate"><span class="pre">get_markov_state</span></code></a>()</p></td>
 <td><p>Intended to return the full augmented state which would be Markovian.</p></td>
 </tr>
 <tr class="row-even"><td><p><a class="reference internal" href="#id2" title="mdp_playground.envs.rl_toy_env.RLToyEnv.init_init_state_dist"><code class="xref py py-obj docutils literal notranslate"><span class="pre">init_init_state_dist</span></code></a>()</p></td>
@@ -837,7 +837,7 @@ <h1>mdp_playground.envs.rl_toy_env.RLToyEnv<a class="headerlink" href="#mdp-play

 <dl class="py method">
 <dt id="id1">
-<code class="sig-name descname"><span class="pre">get_augmented_state</span></code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="../_modules/mdp_playground/envs/rl_toy_env.html#RLToyEnv.get_augmented_state"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#id1" title="Permalink to this definition"></a></dt>
+<code class="sig-name descname"><span class="pre">get_markov_state</span></code><span class="sig-paren">(</span><span class="sig-paren">)</span><a class="reference internal" href="../_modules/mdp_playground/envs/rl_toy_env.html#RLToyEnv.get_markov_state"><span class="viewcode-link"><span class="pre">[source]</span></span></a><a class="headerlink" href="#id1" title="Permalink to this definition"></a></dt>
 <dd><p>Intended to return the full augmented state which would be Markovian. (However, it’s not Markovian wrt the noise in P and R because we’re not returning the underlying RNG.) Currently, returns the augmented state which is the sequence of length “delay + sequence_length + 1” of past states for both discrete and continuous environments. Additonally, the current state derivatives are also returned for continuous environments.</p>
 <dl class="field-list simple">
 <dt class="field-odd">Returns</dt>
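
The docstring in the hunk above describes the augmented state as a sequence of the last "delay + sequence_length + 1" states. As a rough illustration of that bookkeeping (not the actual RLToyEnv implementation), a fixed-length window can be sketched with `collections.deque`. The helper name `make_state_window` and the choice to pre-fill the window with the initial state are assumptions made for this example.

```python
from collections import deque


def make_state_window(delay, sequence_length, init_state):
    """Hypothetical buffer keeping the last delay + sequence_length + 1 states.

    Pre-filling with init_state is an assumption of this sketch; the real
    environment may pad its history differently.
    """
    size = delay + sequence_length + 1
    return deque([init_state] * size, maxlen=size)


window = make_state_window(delay=1, sequence_length=2, init_state=0)
for new_state in [4, 7, 2, 9, 5]:
    # Appending to a full deque silently drops the oldest state.
    window.append(new_state)

curr_state = window[-1]  # the most recent state sits at the end
```

Indexing the window with `[-1]` mirrors how the grid examples in `example.py` read the current state from `["augmented_state"][-1]`.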

docs/_build/html/_modules/mdp_playground/envs/rl_toy_env.html

Lines changed: 4 additions & 4 deletions
@@ -629,7 +629,7 @@ <h1>Source code for mdp_playground.envs.rl_toy_env</h1><div class="highlight"><p
 <span class="sd"> the reward function of the MDP, R</span>
 <span class="sd"> R(state, action)</span>
 <span class="sd"> defined as a lambda function in the call to init_reward_function() and is equivalent to calling reward_function()</span>
-<span class="sd"> get_augmented_state()</span>
+<span class="sd"> get_markov_state()</span>
 <span class="sd"> gets underlying Markovian state of the MDP</span>
 <span class="sd"> reset()</span>
 <span class="sd"> Resets environment state</span>
@@ -1834,9 +1834,9 @@ <h1>Source code for mdp_playground.envs.rl_toy_env</h1><div class="highlight"><p
 <span class="bp">self</span><span class="o">.</span><span class="n">reward</span> <span class="o">+=</span> <span class="bp">self</span><span class="o">.</span><span class="n">term_state_reward</span> <span class="o">*</span> <span class="bp">self</span><span class="o">.</span><span class="n">reward_scale</span> <span class="c1"># Scale before or after?</span>
 <span class="bp">self</span><span class="o">.</span><span class="n">logger</span><span class="o">.</span><span class="n">info</span><span class="p">(</span><span class="s1">&#39;sas</span><span class="se">\&#39;</span><span class="s1">r: &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">augmented_state</span><span class="p">[</span><span class="o">-</span><span class="mi">2</span><span class="p">])</span> <span class="o">+</span> <span class="s1">&#39; &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="n">action</span><span class="p">)</span> <span class="o">+</span> <span class="s1">&#39; &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">augmented_state</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="o">+</span> <span class="s1">&#39; &#39;</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">reward</span><span class="p">))</span>

-<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">curr_obs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">reward</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">done</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_augmented_state</span><span class="p">()</span></div>
+<span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">curr_obs</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">reward</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">done</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">get_markov_state</span><span class="p">()</span></div>

-<div class="viewcode-block" id="RLToyEnv.get_augmented_state"><a class="viewcode-back" href="../../../_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html#mdp_playground.envs.rl_toy_env.RLToyEnv.get_augmented_state">[docs]</a> <span class="k">def</span> <span class="nf">get_augmented_state</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
+<div class="viewcode-block" id="RLToyEnv.get_markov_state"><a class="viewcode-back" href="../../../_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html#mdp_playground.envs.rl_toy_env.RLToyEnv.get_markov_state">[docs]</a> <span class="k">def</span> <span class="nf">get_markov_state</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
 <span class="sd">&#39;&#39;&#39;Intended to return the full augmented state which would be Markovian. (However, it&#39;s not Markovian wrt the noise in P and R because we&#39;re not returning the underlying RNG.) Currently, returns the augmented state which is the sequence of length &quot;delay + sequence_length + 1&quot; of past states for both discrete and continuous environments. Additonally, the current state derivatives are also returned for continuous environments.</span>

 <span class="sd"> Returns</span>
@@ -2042,7 +2042,7 @@ <h1>Source code for mdp_playground.envs.rl_toy_env</h1><div class="highlight"><p

 <span class="n">config</span><span class="p">[</span><span class="s2">&quot;generate_random_mdp&quot;</span><span class="p">]</span> <span class="o">=</span> <span class="kc">True</span> <span class="c1"># This supersedes previous settings and generates a random transition function, a random reward function (for random specific sequences)</span>
 <span class="n">env</span> <span class="o">=</span> <span class="n">RLToyEnv</span><span class="p">(</span><span class="o">**</span><span class="n">config</span><span class="p">)</span>
-<span class="n">state</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">env</span><span class="o">.</span><span class="n">get_augmented_state</span><span class="p">()[</span><span class="s1">&#39;curr_state&#39;</span><span class="p">])</span>
+<span class="n">state</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="n">env</span><span class="o">.</span><span class="n">get_markov_state</span><span class="p">()[</span><span class="s1">&#39;curr_state&#39;</span><span class="p">])</span>
 <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">20</span><span class="p">):</span>
 <span class="c1"># env.render() # For GUI</span>
 <span class="n">action</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">action_space</span><span class="o">.</span><span class="n">sample</span><span class="p">()</span> <span class="c1"># take a #random action</span>

docs/_build/html/_sources/_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.rst.txt

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,7 @@ mdp\_playground.envs.rl\_toy\_env.RLToyEnv

 ~RLToyEnv.__init__
 ~RLToyEnv.close
-~RLToyEnv.get_augmented_state
+~RLToyEnv.get_markov_state
 ~RLToyEnv.init_init_state_dist
 ~RLToyEnv.init_reward_function
 ~RLToyEnv.init_terminal_states

docs/_build/html/genindex.html

Lines changed: 1 addition & 1 deletion
@@ -820,7 +820,7 @@ <h2 id="G">G</h2>
 <td style="width: 33%; vertical-align: top;"><ul>
 <li><a href="_autosummary/mdp_playground.spaces.image_continuous.ImageContinuous.html#mdp_playground.spaces.image_continuous.ImageContinuous.generate_image">generate_image() (mdp_playground.spaces.image_continuous.ImageContinuous method)</a>
 </li>
-<li><a href="_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html#id1">get_augmented_state() (mdp_playground.envs.rl_toy_env.RLToyEnv method)</a>, <a href="_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html#mdp_playground.envs.rl_toy_env.RLToyEnv.get_augmented_state">[1]</a>
+<li><a href="_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html#id1">get_markov_state() (mdp_playground.envs.rl_toy_env.RLToyEnv method)</a>, <a href="_autosummary/mdp_playground.envs.rl_toy_env.RLToyEnv.html#mdp_playground.envs.rl_toy_env.RLToyEnv.get_markov_state">[1]</a>
 </li>
 <li><a href="_autosummary/mdp_playground.spaces.image_continuous.ImageContinuous.html#id0">get_image_representation() (mdp_playground.spaces.image_continuous.ImageContinuous method)</a>, <a href="_autosummary/mdp_playground.spaces.image_continuous.ImageContinuous.html#mdp_playground.spaces.image_continuous.ImageContinuous.get_image_representation">[1]</a>

docs/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

example.py

Lines changed: 13 additions & 13 deletions
@@ -72,7 +72,7 @@ def discrete_environment_example():
     # The environment maintains an augmented state which contains the underlying
     # state used by the MDP to perform transitions and hand out rewards. We can
     # fetch a dict containing the augmented state and current state like this:
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     state = augmented_state_dict["curr_state"]

     print(
@@ -113,7 +113,7 @@ def discrete_environment_image_representations_example():
     # The environment maintains an augmented state which contains the underlying
     # state used by the MDP to perform transitions and hand out rewards. We can
     # fetch a dict containing the augmented state and current state like this:
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     state = augmented_state_dict["curr_state"]

     print(
@@ -122,7 +122,7 @@ def discrete_environment_image_representations_example():
     )
     action = env.action_space.sample()
     next_state_image, reward, done, trunc, info = env.step(action)
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     next_state = augmented_state_dict["curr_state"] # Underlying MDP state holds
     # the current discrete state.
     print("sars', done, image shape =", state, action, reward, next_state, done, next_state_image.shape)
@@ -161,7 +161,7 @@ def discrete_environment_diameter_image_representations_example():
     # The environment maintains an augmented state which contains the underlying
     # state used by the MDP to perform transitions and hand out rewards. We can
     # fetch a dict containing the augmented state and current state like this:
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     state = augmented_state_dict["curr_state"]

     print(
@@ -170,7 +170,7 @@ def discrete_environment_diameter_image_representations_example():
     )
     action = env.action_space.sample()
     next_state_image, reward, done, trunc, info = env.step(action)
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     next_state = augmented_state_dict["curr_state"] # Underlying MDP state holds
     # the current discrete state.
     print("sars', done, shape =", state, action, reward, next_state, done, next_state_image.shape)
@@ -247,7 +247,7 @@ def continuous_environment_example_move_to_a_point_irrelevant_image():

     env = RLToyEnv(**config)
     state = env.reset()[0]
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     state = augmented_state_dict["curr_state"].copy() # Underlying MDP state holds
     # the current continuous state.

@@ -257,7 +257,7 @@ def continuous_environment_example_move_to_a_point_irrelevant_image():
     )
     action = env.action_space.sample()
     next_state_image, reward, done, trunc, info = env.step(action)
-    augmented_state_dict = env.get_augmented_state()
+    augmented_state_dict = env.get_markov_state()
     next_state = augmented_state_dict["curr_state"].copy() # Underlying MDP state holds
     # the current continuous state.
     print("sars', done, image shape =", state, action, reward, next_state, done, next_state_image.shape)
@@ -319,13 +319,13 @@ def grid_environment_example():

     env = RLToyEnv(**config)

-    state = env.get_augmented_state()["augmented_state"][-1]
+    state = env.get_markov_state()["augmented_state"][-1]
     actions = [[0, 1], [-1, 0], [-1, 0], [1, 0], [0.5, -0.5], [1, 2], [1, 1], [0, 1]]

     for i in range(len(actions)):
         action = actions[i]
         next_obs, reward, done, trunc, info = env.step(action)
-        next_state = env.get_augmented_state()["augmented_state"][-1]
+        next_state = env.get_markov_state()["augmented_state"][-1]
         print("sars', done =", state, action, reward, next_state, done)
         state = next_state

@@ -348,13 +348,13 @@ def grid_environment_example_reward_every_n_steps():

     env = RLToyEnv(**config)

-    state = env.get_augmented_state()["augmented_state"][-1]
+    state = env.get_markov_state()["augmented_state"][-1]
     actions = [[0, 1], [-1, 0], [-1, 0], [1, 0], [0.5, -0.5], [1, 2], [1, 1], [0, 1]]

     for i in range(len(actions)):
         action = actions[i]
         next_obs, reward, done, trunc, info = env.step(action)
-        next_state = env.get_augmented_state()["augmented_state"][-1]
+        next_state = env.get_markov_state()["augmented_state"][-1]
         print("sars', done =", state, action, reward, next_state, done)
         state = next_state

@@ -379,13 +379,13 @@ def grid_environment_image_representations_example():
     config["terminal_states"] = [[5, 5], [2, 3], [2, 4], [3, 3], [3, 4]]
     env = RLToyEnv(**config)

-    state = env.get_augmented_state()["augmented_state"][-1]
+    state = env.get_markov_state()["augmented_state"][-1]
     actions = [[0, 1], [-1, 0], [-1, 0], [1, 0], [0.5, -0.5], [1, 2]]

     for i in range(len(actions)):
         action = actions[i]
         next_obs, reward, done, trunc, info = env.step(action)
-        next_state = env.get_augmented_state()["augmented_state"][-1]
+        next_state = env.get_markov_state()["augmented_state"][-1]
         print("sars', done, image shape =", state, action, reward, next_state, done, next_obs.shape)
         state = next_state
