1. Convenient high-level interfaces for applications of RL (training an implemented algorithm on a custom environment).
1. Large scope: online (on- and off-policy) and offline RL, with experimental support for multi-agent RL (MARL), model-based RL, and more
Unlike other reinforcement learning libraries, which may have complex codebases, unfriendly high-level APIs, or lack speed optimizations, Tianshou provides a high-performance, modularized framework and user-friendly interfaces for building deep reinforcement learning agents. One more aspect that sets Tianshou apart is its
If no errors are reported, you have successfully installed Tianshou.
## Documentation
Find example scripts in the [test/](https://github.com/thu-ml/tianshou/blob/master/test) and [examples/](https://github.com/thu-ml/tianshou/blob/master/examples) folders.

Tutorials and API documentation are hosted on [tianshou.readthedocs.io](https://tianshou.readthedocs.io/).

**Important**: The documentation is currently being updated to reflect the changes in Tianshou v2.0.0. Not all features are documented yet, and some parts are outdated (they are marked as such). The documentation will be fully updated when the v2.0.0 release is finalized.
## Why Tianshou?
Check out the [GitHub Actions](https://github.com/thu-ml/tianshou/actions) page for more details.
Atari and MuJoCo benchmark results can be found in the [examples/atari/](examples/atari/) and [examples/mujoco/](examples/mujoco/) folders respectively. **Our MuJoCo results reach or exceed the level of performance of most existing benchmarks.**
### Algorithm Abstraction
Reinforcement learning algorithms are built on abstractions for
- on-policy algorithms (`OnPolicyAlgorithm`),
- off-policy algorithms (`OffPolicyAlgorithm`), and
- offline algorithms (`OfflineAlgorithm`),
all of which clearly separate the core algorithm from the training process and the respective environment interactions.
In each case, implementing an algorithm involves only the implementation of methods for
- pre-processing a batch of data, augmenting it with necessary information/sufficient statistics for learning (`_preprocess_batch`),
- updating model parameters based on an augmented batch of data (`_update_with_batch`).
The implementation of these methods suffices for a new algorithm to be applicable within Tianshou,
making experimentation with new approaches particularly straightforward.
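This separation can be sketched in plain Python. Note that this is a schematic illustration only: Tianshou's actual base classes, batch types, and method signatures differ in detail, and the toy "algorithm" below stands in for a real learner.

```python
# Schematic sketch of the algorithm abstraction (NOT Tianshou's actual classes):
# a base class owns the update loop, while concrete algorithms implement only
# batch pre-processing and the parameter update itself.
from dataclasses import dataclass, field


@dataclass
class Batch:
    """Minimal stand-in for a batch sampled from a replay buffer."""
    rewards: list
    returns: list = field(default_factory=list)


class OffPolicyAlgorithmSketch:
    """Template-method base: subclasses fill in the two hooks below."""

    def update(self, batch: Batch) -> dict:
        batch = self._preprocess_batch(batch)  # augment with statistics
        return self._update_with_batch(batch)  # gradient step, etc.

    def _preprocess_batch(self, batch: Batch) -> Batch:
        raise NotImplementedError

    def _update_with_batch(self, batch: Batch) -> dict:
        raise NotImplementedError


class MonteCarloSketch(OffPolicyAlgorithmSketch):
    """Toy algorithm: compute discounted returns, then 'learn' from them."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma

    def _preprocess_batch(self, batch: Batch) -> Batch:
        g = 0.0
        returns = []
        for r in reversed(batch.rewards):
            g = r + self.gamma * g
            returns.append(g)
        batch.returns = list(reversed(returns))
        return batch

    def _update_with_batch(self, batch: Batch) -> dict:
        # A real algorithm would take a gradient step here.
        return {"mean_return": sum(batch.returns) / len(batch.returns)}


algo = MonteCarloSketch(gamma=1.0)
stats = algo.update(Batch(rewards=[1.0, 1.0, 1.0]))
print(stats)  # {'mean_return': 2.0}
```

The point of the pattern is that the environment interaction and training loop never need to change when a new algorithm is added; only the two hooks do.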
## Quick Start
Tianshou provides two API levels:

- the high-level interface, which allows experiments to be configured largely declaratively, and
- the procedural interface, which provides a maximum of control, especially for very advanced users and developers of reinforcement learning algorithms.
In the following, let us consider an example application using the _CartPole_ gymnasium environment.
We shall apply the deep Q-network (DQN) learning algorithm using both APIs.
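As a brief refresher (this is background, not part of the Tianshou API): DQN regresses Q-values toward bootstrapped temporal-difference targets. A minimal sketch of the target computation:

```python
# DQN temporal-difference target (illustrative only):
# target = r + gamma * max_a' Q(s', a'), with no bootstrapping at episode end.
def dqn_td_target(reward: float, gamma: float, next_q_values: list, done: bool) -> float:
    if done:
        return reward  # terminal transition: no future value
    return reward + gamma * max(next_q_values)

t = dqn_td_target(reward=1.0, gamma=0.9, next_q_values=[0.5, 2.0, 1.0], done=False)
print(t)  # 2.8 (= 1.0 + 0.9 * 2.0)
```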
### High-Level API
In the high-level API, the basis for an RL experiment is an `ExperimentBuilder` with which we can build the experiment we then seek to run. Since we want to use DQN, we use the specialization `DQNExperimentBuilder`.
The high-level API provides largely declarative semantics, i.e. the code is almost exclusively concerned with configuration that controls what to do (rather than how to do it).
```python
from tianshou.highlevel.config import OffPolicyTrainingConfig
from tianshou.highlevel.env import (
    EnvFactoryRegistered,
    VectorEnvType,
)
from tianshou.highlevel.experiment import DQNExperimentBuilder, ExperimentConfig
from tianshou.highlevel.params.algorithm_params import DQNParams
```