# Practical_RL [](https://mybinder.org/v2/gh/yandexdataschool/practical_rl/spring19)
An open course on reinforcement learning in the wild.
Taught on-campus at [HSE](https://cs.hse.ru) and [YSDA](https://yandexdataschool.com/) and maintained to be friendly to online students (both English and Russian).

__Note:__ this branch is the on-campus version of the course for __spring 2019 YSDA and HSE students__. For full course materials, switch to the [master branch](https://github.com/yandexdataschool/Practical_RL/tree/master).

|
#### Manifesto:
* __Optimize for the curious.__ For all the materials that aren't covered in detail there are links to more information and related materials (D. Silver / Sutton / blogs / whatever). Assignments have bonus sections if you want to dig deeper.
* __Practicality first.__ Everything essential to solving reinforcement learning problems is worth mentioning. We won't shy away from covering tricks and heuristics. For every major idea there should be a lab that makes you "feel" it on a practical problem.
* __Git-course.__ Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for an alternative framework? You're awesome! [Pull-request](https://help.github.com/articles/about-pull-requests/) it!

[](https://github.com/yandexdataschool/Practical_RL/graphs/contributors)

# Course info
* __Chat room__ for YSDA & HSE students is [here](https://t.me/joinchat/CDFcMVcoAQvEiI9WAo1pEQ)
* __Grading__ rules for YSDA & HSE students are [here](https://github.com/yandexdataschool/Practical_RL/wiki/Homeworks-and-grading)

* __FAQ:__ [About the course](https://github.com/yandexdataschool/Practical_RL/wiki/Practical-RL), [Technical issues thread](https://github.com/yandexdataschool/Practical_RL/issues/1), [Lecture slides](https://yadi.sk/d/loPpY45J3EAYfU), [Online student survival guide](https://github.com/yandexdataschool/Practical_RL/wiki/Online-student's-survival-guide)

* Anonymous [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSdurWw97Sm9xCyYwC8g3iB5EibITnoPJW2IkOVQYE_kcXPh6Q/viewform) for everything that didn't go through e-mail.

* Virtual course environment:
  * [Installing dependencies](https://github.com/yandexdataschool/Practical_RL/issues/1) on your local machine (recommended).
  * [__Google Colab__](https://colab.research.google.com/) - select Open -> GitHub -> yandexdataschool/practical_rl -> {branch name} and pick any notebook you want.
  * Alternatives: [](https://mybinder.org/v2/gh/yandexdataschool/practical_rl/spring19) and [Azure Notebooks](https://notebooks.azure.com/).

|
# Additional materials
* [RL reading group](https://github.com/yandexdataschool/Practical_RL/wiki/RL-reading-group)

|
# Syllabus

The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.

|
* [__week01_intro__](./week01_intro) Introduction
  * Lecture: RL problems around us. Decision processes. Stochastic optimization, crossentropy method. Parameter space search vs action space search.
  * Seminar: Welcome into OpenAI gym. Tabular CEM for Taxi-v0, deep CEM for box2d environments.
  * Homework description - see week01_intro/README.md.
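The crossentropy method from the intro week can be sketched in a few lines. This is a minimal toy illustration (a hypothetical one-step, four-action problem, not the course's gym assignment): sample actions from the current policy, keep the top-scoring "elite" samples, and refit the policy to their action frequencies.

```python
import numpy as np

# Toy one-step "environment": 4 actions, action 2 has the highest mean reward.
rng = np.random.default_rng(0)
TRUE_MEANS = np.array([0.0, 1.0, 3.0, 0.5])

def play(action):
    return TRUE_MEANS[action] + rng.normal(scale=0.5)

def cem_step(policy, n_samples=200, percentile=70):
    """One crossentropy-method iteration: sample, select elites, refit."""
    actions = rng.choice(len(policy), size=n_samples, p=policy)
    rewards = np.array([play(a) for a in actions])
    threshold = np.percentile(rewards, percentile)
    elite_actions = actions[rewards >= threshold]
    # New policy = action frequencies among elite samples (with smoothing
    # so no action's probability collapses to exactly zero).
    counts = np.bincount(elite_actions, minlength=len(policy)) + 1e-3
    return counts / counts.sum()

policy = np.full(4, 0.25)  # start from the uniform policy
for _ in range(20):
    policy = cem_step(policy)

best = int(np.argmax(policy))  # the policy concentrates on the best action
```

The tabular version in the seminar applies the same sample/select/refit loop per state; the deep version replaces the table with a neural network fit to elite (state, action) pairs.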

* [__week02_value_based__](./week02_value_based) Value-based methods
  * Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
  * Seminar: Value iteration.
  * Homework description - see week02_value_based/README.md.

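Value iteration itself fits in a dozen lines. A minimal sketch on a hypothetical 2-state, 2-action MDP (the transition and reward numbers below are made up for illustration, not from the course materials), iterating the Bellman optimality backup V(s) <- max_a sum_s' P(s'|s,a) (R(s,a) + gamma V(s')):

```python
import numpy as np

P = np.array([  # P[s, a, s']: transition probabilities
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.0, 1.0], [0.5, 0.5]],
])
R = np.array([  # R[s, a]: immediate rewards
    [1.0, 0.0],
    [0.0, 2.0],
])
gamma = 0.9

V = np.zeros(2)
for _ in range(1000):
    # Q[s, a] = R[s, a] + gamma * expected value of the next state
    Q = R + gamma * np.einsum("sat,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:  # stop once the backup converges
        break
    V = V_new

greedy_policy = Q.argmax(axis=1)  # act greedily w.r.t. the converged Q
```

Policy iteration replaces the inner max with full policy evaluation followed by a greedy policy improvement step; both converge for discounted MDPs because the backup is a gamma-contraction.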
|
* [__week03_model_free__](./week03_model_free) Model-free reinforcement learning
  * Lecture: Q-learning. SARSA. Off-policy vs on-policy algorithms. N-step algorithms. TD(lambda).
  * Seminar: Q-learning vs SARSA vs Expected Value SARSA.
  * Homework description - see week03_model_free/README.md.
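The core Q-learning update is one line: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal sketch on a hypothetical 3-state chain (a toy environment made up for illustration, not the homework env), with an epsilon-greedy behavior policy:

```python
import numpy as np

rng = np.random.default_rng(42)
N_STATES, N_ACTIONS = 3, 2  # action 1 moves right, action 0 resets to state 0

def step(s, a):
    s_next = min(s + 1, N_STATES - 1) if a == 1 else 0
    r = 1.0 if s_next == N_STATES - 1 else 0.0  # reward for reaching the end
    return s_next, r

Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.9, 0.3
s = 0
for _ in range(5000):
    # epsilon-greedy: explore with probability eps, otherwise act greedily
    a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(Q[s].argmax())
    s_next, r = step(s, a)
    # off-policy TD target uses max over next actions (Q-learning)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

greedy = Q.argmax(axis=1)  # learned policy: always move right
```

SARSA differs only in the target: it bootstraps from the action actually taken next (`Q[s_next, a_next]`) instead of the max, which makes it on-policy.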

* __week04__ Approximate (deep) RL
* __week05__ Exploration
* __week06__ Policy gradient methods
* __week07__ Applications I
* __week{++i}__ Partially observed MDPs
* __week{++i}__ Advanced policy-based methods
* __week{++i}__ Applications II
* __week{++i}__ Distributional reinforcement learning
* __week{++i}__ Inverse RL and imitation learning

|
|
# Course staff
Course materials and teaching by: _[unordered]_
- [Pavel Shvechikov](https://github.com/bestxolodec) - lectures, seminars, hw checkups, reading group
- [Nikita Putintsev](https://github.com/qwasser) - seminars, hw checkups, organizing our hot mess
- [Alexander Fritsler](https://github.com/Fritz449) - lectures, seminars, hw checkups
- [Oleg Vasilev](https://github.com/Omrigan) - seminars, hw checkups, technical support
- [Dmitry Nikulin](https://github.com/pastafarianist) - tons of fixes, far and wide
- [Mikhail Konobeev](https://github.com/MichaelKonobeev) - seminars, hw checkups
- [Ivan Kharitonov](https://github.com/neer201) - seminars, hw checkups
- [Ravil Khisamov](https://github.com/zshrav) - seminars, hw checkups
- [Fedor Ratnikov](https://github.com/justheuristic) - admin stuff
|
# Contributions
* Using pictures from [Berkeley AI course](http://ai.berkeley.edu/home.html)
* Massively referring to [CS294](http://rll.berkeley.edu/deeprlcourse/)
* Several tensorflow assignments by [Scitator](https://github.com/Scitator)
* A lot of fixes from [arogozhnikov](https://github.com/arogozhnikov)
* Other awesome people: see github [contributors](https://github.com/yandexdataschool/Practical_RL/graphs/contributors)
* [Alexey Umnov](https://github.com/alexeyum) helped us a lot during spring 2018