yangli2/rl_to_learn

rl_to_learn

Learning can be framed as a reinforcement-learning problem:

  1. We measure the state from sampled minibatches: the current weights, the current gradients, and possibly their histories.
  2. We evaluate the policy network on the state. Its output is the update \Delta w for the value network (the network being trained).
  3. We sample the reward (i.e. the negative loss) using w + \Delta w as the weights of the value network.
  4. We do gradient ascent on the policy network weights to increase the reward.
  5. We apply \Delta w to the value network weights.
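
The five steps above can be sketched as a toy training loop. This is only an illustration under stated assumptions, not the code in this repository: the "value network" is a single scalar weight `w` fitting `y = 3 * x`, the "policy network" is a single learned step size `theta` that maps the gradient to `delta_w = -theta * grad`, the reward is taken to be the negative loss, and the policy is deterministic so its gradient can be written analytically. The names `loss_and_grad`, `train`, `theta`, and `policy_lr` are hypothetical.

```python
import random


def loss_and_grad(w, batch):
    """Mean squared error of the scalar model y = w * x, and dL/dw."""
    n = len(batch)
    loss = sum((w * x - y) ** 2 for x, y in batch) / n
    grad = sum(2.0 * (w * x - y) * x for x, y in batch) / n
    return loss, grad


def train(steps=200, policy_lr=0.01, seed=0):
    rng = random.Random(seed)
    true_w = 3.0   # assumed target for the toy regression task
    w = 0.0        # "value network" weights (one scalar here)
    theta = 0.01   # "policy network" weights (one learned step size)
    losses = []
    for _ in range(steps):
        # 1. Measure the state: weights and gradient on a sampled minibatch.
        xs = [rng.uniform(-1.0, 1.0) for _ in range(16)]
        batch = [(x, true_w * x) for x in xs]
        loss, grad = loss_and_grad(w, batch)
        losses.append(loss)

        # 2. Evaluate the policy on the state to get delta_w.
        delta_w = -theta * grad

        # 3. Sample the reward (negative loss) at w + delta_w.
        new_loss, new_grad = loss_and_grad(w + delta_w, batch)
        reward = -new_loss

        # 4. Gradient ascent on the policy weights. By the chain rule,
        #    d(reward)/d(theta) = -new_grad * d(delta_w)/d(theta)
        #                       = new_grad * grad.
        theta += policy_lr * new_grad * grad

        # 5. Apply delta_w to the value network weights.
        w += delta_w
    return w, theta, losses


if __name__ == "__main__":
    w, theta, losses = train()
    print(f"w={w:.4f} theta={theta:.4f} final_loss={losses[-1]:.2e}")
```

In this sketch the policy learns a larger step size while consecutive gradients agree in sign and shrinks it when an update overshoots. A real policy network would take richer state (gradient histories, for example) and would typically be trained with automatic differentiation or a stochastic policy-gradient estimator.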
