This paper introduces Sobolev Training for neural networks, which is a method for incorporating these target derivatives in addition to the target values while training. By optimising neural networks to not only approximate the function's outputs but also the function's derivatives, we encode additional information about the target function within the parameters of the neural network. Thereby we can improve the quality of our predictors, as well as the data-efficiency and generalization capabilities of our learned function approximation. We provide theoretical justifications for such an approach as well as examples of empirical evidence on three distinct domains: regression on classical optimisation datasets, distilling policies of an agent playing Atari, and on large-scale applications of synthetic gradients. In all three domains the use of Sobolev Training, employing target derivatives in addition to target values, results in models with higher accuracy and stronger generalisation.
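
The idea described above amounts to adding a derivative-error term to the ordinary regression loss. The following is a minimal sketch of that objective in PyTorch; the toy target function, network size, and hyperparameters are illustrative assumptions, not details taken from the paper.

<code python>
import torch
import torch.nn as nn

# Hypothetical target function and its analytic derivative, standing in
# for the "target derivatives" the abstract refers to.
def f(x):
    return torch.sin(3.0 * x)

def f_prime(x):
    return 3.0 * torch.cos(3.0 * x)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(2000):
    x = torch.empty(128, 1).uniform_(-1.0, 1.0).requires_grad_(True)
    y_hat = model(x)
    # Model derivative dy_hat/dx via autograd; create_graph=True keeps it
    # differentiable so the derivative-matching term can itself be trained.
    dy_hat, = torch.autograd.grad(y_hat.sum(), x, create_graph=True)
    value_loss = ((y_hat - f(x).detach()) ** 2).mean()
    deriv_loss = ((dy_hat - f_prime(x).detach()) ** 2).mean()
    loss = value_loss + deriv_loss  # Sobolev objective: values + derivatives
    opt.zero_grad()
    loss.backward()
    opt.step()
</code>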

https://arxiv.org/pdf/1805.09801v1.pdf Meta-Gradient Reinforcement Learning

The particular choice of return is one of the chief components determining the nature of the algorithm: the rate at which future rewards are discounted; when and how values should be bootstrapped; or even the nature of the rewards themselves. It is well-known that these decisions are crucial to the overall success of RL algorithms. We discuss a gradient-based meta-learning algorithm that is able to adapt the nature of the return, online, whilst interacting and learning from the environment. When applied to 57 games on the Atari 2600 environment over 200 million frames, our algorithm achieved a new state-of-the-art performance.
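
The adaptation the abstract describes can be pictured as treating return parameters such as the discount factor as trainable: an inner update of the value function is kept differentiable, and a meta loss on held-out experience is backpropagated through that update into the return parameters. Below is a heavily simplified sketch under that reading, in PyTorch, using random placeholder transitions and a linear value function; none of the names, constants, or objectives come from the paper.

<code python>
import torch

torch.manual_seed(0)
w = torch.randn(4, 1) * 0.1                      # linear value function v(s) = s @ w
gamma_logit = torch.tensor(2.0, requires_grad=True)  # sigmoid(2.0) ~ 0.88
meta_opt = torch.optim.SGD([gamma_logit], lr=1e-3)
alpha = 0.05                                     # inner-loop learning rate

for step in range(100):
    gamma = torch.sigmoid(gamma_logit)
    # Placeholder transitions (s, r, s') for the inner and meta batches.
    s, r, sn = torch.randn(32, 4), torch.randn(32, 1), torch.randn(32, 4)
    s2, r2, sn2 = torch.randn(32, 4), torch.randn(32, 1), torch.randn(32, 4)

    # Inner TD update, kept differentiable in gamma via create_graph=True.
    w_req = w.detach().requires_grad_(True)
    target = r + gamma * (sn @ w_req).detach()   # gamma stays in the graph
    inner_loss = ((s @ w_req - target) ** 2).mean()
    g, = torch.autograd.grad(inner_loss, w_req, create_graph=True)
    w_new = w_req - alpha * g                    # differentiable SGD step

    # Meta objective: TD error of the *updated* weights under a fixed
    # reference discount; its gradient flows into gamma through w_new.
    meta_target = r2 + 0.99 * (sn2 @ w_new).detach()
    meta_loss = ((s2 @ w_new - meta_target) ** 2).mean()
    meta_opt.zero_grad()
    meta_loss.backward()
    meta_opt.step()
    w = w_new.detach()
</code>

Parameterising the discount through a sigmoid keeps the learned value inside (0, 1); the same pattern could in principle be applied to other return parameters, such as a bootstrapping coefficient.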