Commit 6e010b9

Merge pull request #223 from yandexdataschool/spring19
merge spring19 to master
2 parents d23f09c + 0915fdd commit 6e010b9

156 files changed: +17166 -21313 lines changed

Dockerfile

Lines changed: 24 additions & 45 deletions
@@ -1,67 +1,46 @@
-FROM andrewosh/binder-base
-MAINTAINER Alexander Panin <[email protected]>
-USER root
+FROM python:3.7-slim
+# install the notebook package
+RUN pip install --no-cache --upgrade pip && \
+    pip install --no-cache notebook

-RUN echo "deb http://archive.ubuntu.com/ubuntu trusty-backports main restricted universe multiverse" >> /etc/apt/sources.list
 RUN apt-get -qq update
-
-RUN apt-get install -y gcc-4.9 g++-4.9 libstdc++6 wget unzip
+# RUN apt-get install -y gcc-4.9 g++-4.9 libstdc++6 wget unzip
+RUN apt-get install -y gcc g++ libstdc++6 wget curl unzip git
 RUN apt-get install -y libopenblas-dev liblapack-dev libsdl2-dev libboost-all-dev graphviz
 RUN apt-get install -y cmake zlib1g-dev libjpeg-dev
 RUN apt-get install -y xvfb libav-tools xorg-dev python-opengl python3-opengl
 RUN apt-get -y install swig3.0
 RUN ln -s /usr/bin/swig3.0 /usr/bin/swig

-
-USER main
 RUN pip install --upgrade pip==9.0.3
 RUN pip install --upgrade --ignore-installed setuptools #fix https://github.com/tensorflow/tensorflow/issues/622
-RUN pip install --upgrade sklearn tqdm nltk editdistance joblib graphviz
+RUN pip install --upgrade sklearn tqdm nltk editdistance joblib graphviz pandas matplotlib

 # install all gym stuff except mujoco - it fails at "import importlib.util" (no module named util)
 RUN pip install --upgrade gym
 RUN pip install --upgrade gym[atari]
 RUN pip install --upgrade gym[box2d]

-RUN pip install --upgrade http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp27-cp27mu-linux_x86_64.whl
+RUN pip install --upgrade https://download.pytorch.org/whl/cpu/torch-1.0.1.post2-cp37-cp37m-linux_x86_64.whl
 RUN pip install --upgrade torchvision
 RUN pip install --upgrade keras
 RUN pip install --upgrade https://github.com/Theano/Theano/archive/master.zip
 RUN pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
 RUN pip install --upgrade https://github.com/yandexdataschool/AgentNet/archive/master.zip
 RUN pip install gym_pull
-RUN pip install ppaquette-gym-doom
-
-
-
-
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade pip==9.0.3
-
-# fix https://github.com/tensorflow/tensorflow/issues/622
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade --ignore-installed setuptools
-
-# python3: fix `GLIBCXX_3.4.20' not found - conda's libgcc blocked system's gcc-4.9 and libstdc++6
-RUN bash -c "conda update -y conda && source activate python3 && conda uninstall -y libgcc && source deactivate"
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade matplotlib numpy scipy pandas graphviz
-
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade sklearn tqdm nltk editdistance joblib
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade --ignore-installed setuptools #fix https://github.com/tensorflow/tensorflow/issues/622
-
-# install all gym stuff except mujoco - it fails at "mjmodel.h: no such file or directory"
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade gym
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade gym[atari]
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade gym[box2d]
-
-
-
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade http://download.pytorch.org/whl/cu80/torch-0.3.0.post4-cp35-cp35m-linux_x86_64.whl
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade torchvision
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade keras
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade https://github.com/Theano/Theano/archive/master.zip
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade https://github.com/yandexdataschool/AgentNet/archive/master.zip
-
-#install TF after everything else not to break python3's pyglet with python2's tensorflow
-RUN pip install --upgrade tensorflow==1.4.0
-RUN /home/main/anaconda/envs/python3/bin/pip install --upgrade tensorflow==1.4.0
-#TODO py3 doom once it's no longer broken
+# RUN pip install ppaquette-gym-doom
+
+# create user with a home directory
+ARG NB_USER
+ARG NB_UID
+ENV USER ${NB_USER}
+ENV HOME /home/${NB_USER}
+
+RUN adduser --disabled-password \
+    --gecos "Default user" \
+    --uid ${NB_UID} \
+    ${NB_USER}
+WORKDIR ${HOME}
+USER ${USER}
+
+RUN cd ${HOME} && git clone https://github.com/yandexdataschool/Practical_RL
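
The new Dockerfile declares `ARG NB_USER` and `ARG NB_UID` without defaults, so a local build outside Binder presumably has to pass both values, otherwise the `adduser` step would likely fail on empty arguments. A minimal sketch of assembling such a build command follows. The helper name `binder_build_cmd`, the default user `jovyan`, the UID `1000` and the image tag are all assumptions for illustration and do not come from the commit.

```shell
# Sketch: print a `docker build` command that supplies the two ARGs
# declared in the Dockerfile (NB_USER, NB_UID).
# Assumed, not from the commit: the helper name, the defaults
# (jovyan / 1000) and the image tag practical_rl:spring19.
binder_build_cmd() {
    nb_user="${1:-jovyan}"
    nb_uid="${2:-1000}"
    image_tag="${3:-practical_rl:spring19}"
    printf 'docker build --build-arg NB_USER=%s --build-arg NB_UID=%s -t %s .\n' \
        "$nb_user" "$nb_uid" "$image_tag"
}

# Only prints the command; nothing is built here.
binder_build_cmd
```

The function only prints the command, so it can be inspected first and then pasted into a shell from the repository root.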

README.md

Lines changed: 39 additions & 66 deletions
@@ -1,110 +1,83 @@
-# Practical_RL
-** Announce - new HSE track will start in late january, YSDA soon after. Tons of changes incoming. We'll also fix all the issues :) **

-A course on reinforcement learning in the wild.
+# Practical_RL [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/yandexdataschool/practical_rl/spring19)
+An open course on reinforcement learning in the wild.
 Taught on-campus at [HSE](https://cs.hse.ru) and [YSDA](https://yandexdataschool.com/) and maintained to be friendly to online students (both english and russian).

+__Note:__ this branch is an on-campus version of the for __spring 2019 YSDA and HSE students__. For full course materials, switch to the [master branch](https://github.com/yandexdataschool/Practical_RL/tree/master).
+

 #### Manifesto:
 * __Optimize for the curious.__ For all the materials that aren’t covered in detail there are links to more information and related materials (D.Silver/Sutton/blogs/whatever). Assignments will have bonus sections if you want to dig deeper.
 * __Practicality first.__ Everything essential to solving reinforcement learning problems is worth mentioning. We won't shun away from covering tricks and heuristics. For every major idea there should be a lab that makes you to “feel” it on a practical problem.
 * __Git-course.__ Know a way to make the course better? Noticed a typo in a formula? Found a useful link? Made the code more readable? Made a version for alternative framework? You're awesome! [Pull-request](https://help.github.com/articles/about-pull-requests/) it!

+[![Github contributors](https://img.shields.io/github/contributors/yandexdataschool/Practical_RL.svg?logo=github&logoColor=white)](https://github.com/yandexdataschool/Practical_RL/graphs/contributors)
+
 # Course info
-* Lecture slides are [here](https://yadi.sk/d/loPpY45J3EAYfU).
-* Telegram chat room for YSDA & HSE students is [here](https://t.me/rlspring18)
-* Grading rules for YSDA & HSE students is [here](https://github.com/yandexdataschool/Practical_RL/wiki/Homeworks-and-grading)
-* Online student __[survival guide](https://github.com/yandexdataschool/Practical_RL/wiki/Online-student's-survival-guide)__
-* Installing the libraries - [guide and issues thread](https://github.com/yandexdataschool/Practical_RL/issues/1)
-* Magical button that launches you into course environment:
-    * [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/yandexdataschool/Practical_RL/master) - comes with all libraries pre-installed. May be down time to time.
-    * If it's down, try [__google colab__](https://colab.research.google.com/) or [__azure notebooks__](http://notebooks.azure.com/). Those last longer, but they will require you to run installer commands (see ./Dockerfile).
-* Anonymous [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSdurWw97Sm9xCyYwC8g3iB5EibITnoPJW2IkOVQYE_kcXPh6Q/viewform) for everything that didn't go through e-mail.
-* [About the course](https://github.com/yandexdataschool/Practical_RL/wiki/Practical-RL)
+* __Chat room__ for YSDA & HSE students is [here](https://t.me/joinchat/CDFcMVcoAQvEiI9WAo1pEQ)
+* __Grading__ rules for YSDA & HSE students is [here](https://github.com/yandexdataschool/Practical_RL/wiki/Homeworks-and-grading)
+
+* __FAQ:__ [About the course](https://github.com/yandexdataschool/Practical_RL/wiki/Practical-RL), [Technical issues thread](https://github.com/yandexdataschool/Practical_RL/issues/1), [Lecture Slides](https://yadi.sk/d/loPpY45J3EAYfU), [Online Student Survival Guide](https://github.com/yandexdataschool/Practical_RL/wiki/Online-student's-survival-guide)
+
+* Anonymous [feedback form](https://docs.google.com/forms/d/e/1FAIpQLSdurWw97Sm9xCyYwC8g3iB5EibITnoPJW2IkOVQYE_kcXPh6Q/viewform).
+
+* Virtual course environment:
+    * [Installing dependencies](https://github.com/yandexdataschool/Practical_RL/issues/1) on your local machine (recommended).
+    * [__google colab__](https://colab.research.google.com/) - set open -> github -> yandexdataschool/pracical_rl -> {branch name} and select any notebook you want.
+    * Alternatives: [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/yandexdataschool/practical_rl/spring19) and [Azure Notebooks](https://notebooks.azure.com/).
+

 # Additional materials
-* A large list of RL materials - [awesome rl](https://github.com/aikorea/awesome-rl)
 * [RL reading group](https://github.com/yandexdataschool/Practical_RL/wiki/RL-reading-group)


 # Syllabus

 The syllabus is approximate: the lectures may occur in a slightly different order and some topics may end up taking two weeks.

-* [__week1__](https://github.com/yandexdataschool/Practical_RL/tree/master/week1_intro) RL as blackbox optimization
+* [__week01_intro__](./week01_intro) Introduction
     * Lecture: RL problems around us. Decision processes. Stochastic optimization, Crossentropy method. Parameter space search vs action space search.
     * Seminar: Welcome into openai gym. Tabular CEM for Taxi-v0, deep CEM for box2d environments.
     * Homework description - see week1/README.md.
-    * **YSDA Deadline: 2018.02.26 23.59**
-    * **HSE Deadline: 2018.01.28 23:59**
-
-* [__week2__](https://github.com/yandexdataschool/Practical_RL/tree/master/week2_value_based) Value-based methods
+
+* [__week02_value_based__](./week02_value_based) Value-based methods
     * Lecture: Discounted reward MDP. Value-based approach. Value iteration. Policy iteration. Discounted reward fails.
     * Seminar: Value iteration.
     * Homework description - see week2/README.md.
-    * **HSE Deadline: 2018.02.11 23:59**
-    * **YSDA Deadline: part1 2018.03.05 23.59, part2 2018.03.12 23.59**

-
-* [__week3__](https://github.com/yandexdataschool/Practical_RL/tree/master/week3_model_free) Model-free reinforcement learning
+* [__week03_model_free__](./week03_model_free) Model-free reinforcement learning
     * Lecture: Q-learning. SARSA. Off-policy Vs on-policy algorithms. N-step algorithms. TD(Lambda).
     * Seminar: Qlearning Vs SARSA Vs Expected Value SARSA
     * Homework description - see week3/README.md.
-    * **HSE Deadline: 2018.02.15 23:59**
-    * **YSDA Deadline: 2018.03.12 23.59**
-
-* [__week4_recap__](https://github.com/yandexdataschool/Practical_RL/tree/master/week4_%5Brecap%5D_deep_learning) - deep learning recap
-    * Lecture: Deep learning 101
-    * Seminar: Simple image classification with convnets
-
-* [__week4__](https://github.com/yandexdataschool/Practical_RL/tree/master/week4_approx_rl) Approximate reinforcement learning
-    * Lecture: Infinite/continuous state space. Value function approximation. Convergence conditions. Multiple agents trick; experience replay, target networks, double/dueling/bootstrap DQN, etc.
-    * Seminar: Approximate Q-learning with experience replay. (CartPole, Atari)
-    * **HSE Deadline: 2018.03.04 23:30**
-    * **YSDA Deadline: 2018.03.20 23.30**
-
-* [__week5__](https://github.com/yandexdataschool/Practical_RL/tree/master/week5_explore) Exploration in reinforcement learning
-    * Lecture: Contextual bandits. Thompson Sampling, UCB, bayesian UCB. Exploration in model-based RL, MCTS. "Deep" heuristics for exploration.
-    * Seminar: bayesian exploration for contextual bandits. UCB for MCTS.
-
-    * **YSDA Deadline: 2018.03.30 23.30**
-
-* [__week6__](https://github.com/yandexdataschool/Practical_RL/tree/master/week6_policy_based) Policy gradient methods I
-    * Lecture: Motivation for policy-based, policy gradient, logderivative trick, REINFORCE/crossentropy method, variance reduction(baseline), advantage actor-critic (incl. GAE)
-    * Seminar: REINFORCE, advantage actor-critic
-
-* [__week7_recap__](https://github.com/yandexdataschool/Practical_RL/tree/master/week7_%5Brecap%5D_rnn) Recurrent neural networks recap
-    * Lecture: Problems with sequential data. Recurrent neural netowks. Backprop through time. Vanishing & exploding gradients. LSTM, GRU. Gradient clipping
-    * Seminar: character-level RNN language model

-* [__week7__](https://github.com/yandexdataschool/Practical_RL/tree/master/week7_pomdp) Partially observable MDPs
-    * Lecture: POMDP intro. POMDP learning (agents with memory). POMDP planning (POMCP, etc)
-    * Seminar: Deep kung-fu & doom with recurrent A3C and DRQN
-
-* [__week8__](https://github.com/yandexdataschool/Practical_RL/tree/master/week8_scst) Applications II
-    * Lecture: Reinforcement Learning as a general way to optimize non-differentiable loss. G2P, machine translation, conversation models, image captioning, discrete GANs. Self-critical sequence training.
-    * Seminar: Simple neural machine translation with self-critical sequence training
+* __week04__ Approximate (deep) RL
+* __week05__ Exploration
+* __week06__ Policy Gradient methods
+* __week07__ Applications I
+* __week{++i}__ Partially Observed MDP
+* __week{++i}__ Advanced policy-based methods
+* __week{++i}__ Applications II
+* __week{++i}__ Distributional reinforcement learning
+* __week{++i}__ Inverse RL and Imitation Learning

-* [__week9__](https://github.com/yandexdataschool/Practical_RL/tree/master/week9_policy_II) Policy gradient methods II
-    * Lecture: Trust region policy optimization. NPO/PPO. Deterministic policy gradient. DDPG. Bonus: DPG for discrete action spaces.
-    * Seminar: Approximate TRPO for simple robotic tasks.
-
-* [Some after-course bonus materials](https://github.com/yandexdataschool/Practical_RL/tree/master/yet_another_week)
-

 # Course staff
 Course materials and teaching by: _[unordered]_
 - [Pavel Shvechikov](https://github.com/bestxolodec) - lectures, seminars, hw checkups, reading group
-- [Oleg Vasilev](https://github.com/Omrigan) - seminars, hw checkups, technical support
-- [Alexander Fritsler](https://github.com/Fritz449) - lectures, seminars, hw checkups
 - [Nikita Putintsev](https://github.com/qwasser) - seminars, hw checkups, organizing our hot mess
-- [Fedor Ratnikov](https://github.com/justheuristic/) - lectures, seminars, hw checkups
-- [Alexey Umnov](https://github.com/alexeyum) - seminars, hw checkups
+- [Alexander Fritsler](https://github.com/Fritz449) - lectures, seminars, hw checkups
+- [Oleg Vasilev](https://github.com/Omrigan) - seminars, hw checkups, technical support
+- [Dmitry Nikulin](https://github.com/pastafarianist) - tons of fixes, far and wide
+- [Mikhail Konobeev](https://github.com/MichaelKonobeev) - seminars, hw checkups
+- [Ivan Kharitonov](https://github.com/neer201) - seminars, hw checkups
+- [Ravil Khisamov](https://github.com/zshrav) - seminars, hw checkups
+- [Fedor Ratnikov](https://github.com/justheuristic) - admin stuff

 # Contributions
 * Using pictures from [Berkeley AI course](http://ai.berkeley.edu/home.html)
 * Massively refering to [CS294](http://rll.berkeley.edu/deeprlcourse/)
 * Several tensorflow assignments by [Scitator](https://github.com/Scitator)
 * A lot of fixes from [arogozhnikov](https://github.com/arogozhnikov)
 * Other awesome people: see github [contributors](https://github.com/yandexdataschool/Practical_RL/graphs/contributors)
+* [Alexey Umnov](https://github.com/alexeyum) helped us a lot during spring2018
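
The syllabus links now point at zero-padded directories (`week01_intro` instead of `week1_intro`). A rough sketch of that naming rule is shown below. The helper name `pad_week_name` is hypothetical, and the commit does not say how the rename was actually performed.

```shell
# Sketch: zero-pad single-digit week directory names, mirroring the
# week1_intro -> week01_intro rename visible in this commit.
# pad_week_name is a hypothetical helper, not part of the repository.
pad_week_name() {
    case "$1" in
        # "week", exactly one digit, then an underscore: insert a leading 0
        week[0-9]_*) printf 'week0%s\n' "${1#week}" ;;
        # anything else (already padded, two digits, unrelated) is unchanged
        *)           printf '%s\n' "$1" ;;
    esac
}

pad_week_name week1_intro    # prints week01_intro
pad_week_name week01_intro   # prints week01_intro (already padded)
```

Inside a checkout, a loop such as `for d in week[0-9]_*; do git mv "$d" "$(pad_week_name "$d")"; done` could apply the rule, assuming the directories exist and `git mv` is the desired way to move them.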

docker/Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -41,7 +41,7 @@ RUN pip install --upgrade pip==9.0.3 && \
     https://github.com/Lasagne/Lasagne/archive/master.zip \
     https://github.com/yandexdataschool/AgentNet/archive/master.zip \
     tensorflow \
-    http://download.pytorch.org/whl/cpu/torch-0.4.1-cp27-cp27mu-linux_x86_64.whl \
+    https://download.pytorch.org/whl/cpu/torch-1.0.1.post2-cp27-cp27mu-linux_x86_64.whl \
     torchvision \
     keras

@@ -60,7 +60,7 @@ RUN pip3 install --upgrade pip==9.0.3 && \
     pip3 install --upgrade https://github.com/Theano/Theano/archive/master.zip \
     https://github.com/Lasagne/Lasagne/archive/master.zip \
     https://github.com/yandexdataschool/AgentNet/archive/master.zip \
-    http://download.pytorch.org/whl/cpu/torch-0.4.1-cp35-cp35m-linux_x86_64.whl \
+    https://download.pytorch.org/whl/cpu/torch-1.0.1.post2-cp35-cp35m-linux_x86_64.whl \
     torchvision \
     tensorflow \
     keras && \
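
The wheel URLs in both Dockerfiles differ only in the torch version and the CPython/ABI tags (`cp27-cp27mu`, `cp35-cp35m`, `cp37-cp37m`). The sketch below rebuilds the three URLs used in this commit from those parts. The helper name `torch_cpu_wheel_url` is hypothetical, and the URL layout is taken from the diff only, so other version/tag combinations may not exist on the server.

```shell
# Sketch: compose the CPU wheel URL pattern seen in this commit.
# Arguments: torch version, Python tag, ABI tag.
# torch_cpu_wheel_url is a hypothetical helper; the URL layout is
# inferred from the diff, not from PyTorch documentation.
torch_cpu_wheel_url() {
    printf 'https://download.pytorch.org/whl/cpu/torch-%s-%s-%s-linux_x86_64.whl\n' \
        "$1" "$2" "$3"
}

torch_cpu_wheel_url 1.0.1.post2 cp27 cp27mu   # docker/Dockerfile, python 2.7
torch_cpu_wheel_url 1.0.1.post2 cp35 cp35m    # docker/Dockerfile, python 3.5
torch_cpu_wheel_url 1.0.1.post2 cp37 cp37m    # Dockerfile (python:3.7-slim)
```

The tags have to match the interpreter in the image, which is why the root Dockerfile moved to `cp37-cp37m` together with the `python:3.7-slim` base.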

setup_colab.sh

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+#!/bin/bash
+# a setup script for google colab. Will be updated
+pip install gym
+apt-get install -y xvfb
+wget https://raw.githubusercontent.com/yandexdataschool/Practical_DL/fall18/xvfb -O ../xvfb
+apt-get install -y python-opengl ffmpeg
+pip install pyglet==1.2.4
+
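
The script as committed runs its five commands unconditionally, so a failed `apt-get` does not stop the later steps. One possible variant is sketched below: the same commands in the same order, chained with `&&` so the first failure aborts, plus a preview mode. The `run` and `setup_colab` functions and the `DRY_RUN` variable are hypothetical additions for illustration only.

```shell
# Sketch: same five commands as setup_colab.sh, chained so that the
# first failure stops the sequence.
# Hypothetical additions (not in the commit): run(), setup_colab(),
# and the DRY_RUN switch that prints commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

setup_colab() {
    run pip install gym &&
    run apt-get install -y xvfb &&
    run wget https://raw.githubusercontent.com/yandexdataschool/Practical_DL/fall18/xvfb -O ../xvfb &&
    run apt-get install -y python-opengl ffmpeg &&
    run pip install pyglet==1.2.4
}

# Preview only: prints the five commands without executing them.
( DRY_RUN=1; setup_colab )
```

Calling `setup_colab` without `DRY_RUN=1` would execute the commands for real, which only makes sense inside a Colab-like machine with `pip` and `apt-get` available.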

week1_intro/README.md renamed to week01_intro/README.md

Lines changed: 5 additions & 3 deletions
@@ -1,5 +1,5 @@
 ## Materials:
-* [__Lecture slides__](https://yadi.sk/i/sbc0ZCKx3RRGbW)
+* [__Lecture slides__](https://yadi.sk/i/-EUHXUXOTC5t9Q)
 * __Russian:__
     * Intro to RL - [video](https://yadi.sk/i/bMo0qa-x3DoqkS)
     * Blackbox optimization - [video](https://yadi.sk/i/5yf_4oGI3EDJhJ)
@@ -13,6 +13,7 @@

 ## More materials:
 * __[recommended]__ - awesome openai post about evolution strategies - [blog post](https://blog.openai.com/evolution-strategies/), [article](https://arxiv.org/abs/1703.03864)
+* __[recommended]__ - formal explanation of crossentropy method in [general](https://people.smp.uq.edu.au/DirkKroese/ps/CEEncycl.pdf) and for [optimization](https://people.smp.uq.edu.au/DirkKroese/ps/CEopt.pdf)
 * Deep learning course (if you want to learn in parallel) - https://github.com/yandexdataschool/HSE_deeplearning
 * Video on genetic algorithms (english) - [video](https://www.youtube.com/watch?v=ejxfTy4lI6I)
 * Another guide to genetic algorithm (english) - [video](https://www.youtube.com/watch?v=zwYV11a__HQ)
@@ -21,9 +22,10 @@
 * Longer video on Ant Colony Algorithm (english) - [video](https://www.youtube.com/watch?v=xpyKmjJuqhk)


-## Homework description
+## Practice assignment
+Instant dive in: [__seminar_gym_interface__](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/spring19/week01_intro/seminar_gym_interface.ipynb), [__crossentropy_method__](https://colab.research.google.com/github/yandexdataschool/Practical_RL/blob/spring19/week01_intro/crossentropy_method.ipynb)
+
 * Open `gym_interface.ipynb` and follow instructions from there
-* If you haven't installed everything yet, try [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/yandexdataschool/Practical_RL/master)
 * After you're done there, proceed to `crossentropy_method.ipynb`
 * You can find homework and bonus assignment descriptions at the end of that notebook.
 * Note: so far it's enough to say `pip install gym` on top of any data-science-stuffed python, but we'd appreciate if you gradually switch to [full installation](https://github.com/openai/gym#installing-everything).
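
The two "Instant dive in" links added above follow one pattern: the Colab host, then `github/<owner>/<repo>/blob/<branch>/<notebook path>`. A small sketch that reproduces them is shown below. The helper name `colab_url` is hypothetical, and the pattern is inferred from the two links in this diff rather than from Colab documentation.

```shell
# Sketch: rebuild the "open in Colab" links added to week01_intro/README.md.
# Arguments: owner/repo, branch, notebook path inside the repository.
# colab_url is a hypothetical helper; the URL pattern is inferred from
# the links in this commit.
colab_url() {
    printf 'https://colab.research.google.com/github/%s/blob/%s/%s\n' "$1" "$2" "$3"
}

colab_url yandexdataschool/Practical_RL spring19 week01_intro/seminar_gym_interface.ipynb
colab_url yandexdataschool/Practical_RL spring19 week01_intro/crossentropy_method.ipynb
```

Swapping the branch argument (for example to `master`) should give the link for the same notebook on another branch, provided the notebook exists there under that path.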
