You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
fix critical bugs in MAPolicy and docs update (#207)
- fix a bug in MAPolicy: `buffer.rew = Batch()` doesn't change `buffer.rew` (thanks mypy)
- polish examples/box2d/bipedal_hardcore_sac.py
- several docs update
- format setup.py and bump version to 0.2.7
Copy file name to clipboardExpand all lines: README.md
+3-3Lines changed: 3 additions & 3 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -36,7 +36,7 @@ Here is Tianshou's other features:
36
36
- Elegant framework, using only ~2000 lines of code
37
37
- Support parallel environment simulation (synchronous or asynchronous) for all algorithms [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#parallel-sampling)
38
38
- Support recurrent state representation in actor network and critic network (RNN-style training for POMDP) [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#rnn-style-training)
39
-
- Support any type of environment state (e.g. a dict, a self-defined class, ...) [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#user-defined-environment-and-different-state-representation)
39
+
- Support any type of environment state/action (e.g. a dict, a self-defined class, ...) [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#user-defined-environment-and-different-state-representation)
40
40
- Support customized training process [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html#customize-training-process)
41
41
- Support n-step returns estimation and prioritized experience replay for all Q-learning based algorithms; GAE, nstep and PER are very fast thanks to numba jit function and vectorized numpy operation
42
42
- Support multi-agent RL [Usage](https://tianshou.readthedocs.io/en/latest/tutorials/cheatsheet.html##multi-agent-reinforcement-learning)
@@ -74,8 +74,8 @@ $ pip install tianshou
74
74
After installation, open your python console and type
75
75
76
76
```python
77
-
import tianshouas ts
78
-
print(ts.__version__)
77
+
import tianshou
78
+
print(tianshou.__version__)
79
79
```
80
80
81
81
If no error occurs, you have successfully installed Tianshou.
Copy file name to clipboardExpand all lines: docs/index.rst
+5-5Lines changed: 5 additions & 5 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -24,11 +24,11 @@ Welcome to Tianshou!
24
24
Here is Tianshou's other features:
25
25
26
26
* Elegant framework, using only ~2000 lines of code
27
-
* Support parallel environment sampling for all algorithms: :ref:`parallel_sampling`
28
-
* Support recurrent state representation in actor network and critic network (RNN-style training for POMDP): :ref:`rnn_training`
27
+
* Support parallel environment simulation (synchronous or asynchronous) for all algorithms: :ref:`parallel_sampling`
28
+
* Support recurrent state/action representation in actor network and critic network (RNN-style training for POMDP): :ref:`rnn_training`
29
29
* Support any type of environment state (e.g. a dict, a self-defined class, ...): :ref:`self_defined_env`
30
30
* Support customized training process: :ref:`customize_training`
31
-
* Support n-step returns estimation :meth:`~tianshou.policy.BasePolicy.compute_nstep_return` and prioritized experience replay for all Q-learning based algorithms
31
+
* Support n-step returns estimation :meth:`~tianshou.policy.BasePolicy.compute_nstep_return` and prioritized experience replay :class:`~tianshou.data.PrioritizedReplayBuffer` for all Q-learning based algorithms; GAE, nstep and PER are very fast thanks to numba jit function and vectorized numpy operation
32
32
* Support multi-agent RL: :doc:`/tutorials/tictactoe`
- Our default choice: remove the done flag penalty, will soon converge to \~250 reward within 100 epochs (10M env steps, 3~4 hours, see the image below)
4
+
- If the done penalty is not removed, it converges much slower than before, about 200 epochs (20M env steps) to reach the same performance (\~200 reward)
5
+
- Action noise is only necessary in the beginning. It is a negative impact at the end of the training. Removing it can reach \~255 (our best result under the original env, no done penalty removed).
0 commit comments