https://arxiv.org/pdf/1703.04529.pdf Task-based End-to-end Model Learning

As machine learning techniques have become more ubiquitous, it has become common to see machine learning prediction algorithms operating within some larger process. However, the criteria by which we train machine learning algorithms often differ from the ultimate criteria on which we evaluate them. This paper proposes an end-to-end approach for learning probabilistic machine learning models within the context of stochastic programming, in a manner that directly captures the ultimate task-based objective for which they will be used. We then present two experimental evaluations of the proposed approach, one as applied to a generic inventory stock problem and the second to a real-world electrical grid scheduling task. In both cases, we show that the proposed approach can outperform both a traditional modeling approach and a purely black-box policy optimization approach.
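
The central idea is to train the predictive model on the cost of the decisions made from its predictions, rather than on a likelihood objective. Below is a minimal sketch (not the authors' implementation), assuming a newsvendor-style inventory problem with a Gaussian demand model and its closed-form optimal order quantity; the costs, network sizes, and synthetic data are illustrative placeholders.

```python
# Sketch of task-based end-to-end learning on a newsvendor-style inventory
# problem. The decision (order quantity) is computed in closed form from the
# predicted demand distribution, so the realized task cost can be
# backpropagated into the model. All constants here are illustrative.
import torch
import torch.nn as nn

c_under, c_over = 5.0, 1.0                      # cost per unit of unmet / excess demand
critical_ratio = c_under / (c_under + c_over)

# Predictive model: features x -> mean and log-std of demand.
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
z_star = torch.distributions.Normal(0.0, 1.0).icdf(torch.tensor(critical_ratio))

def task_loss(order, demand):
    # Newsvendor cost actually incurred once the true demand is revealed.
    return c_under * torch.clamp(demand - order, min=0) + \
           c_over * torch.clamp(order - demand, min=0)

for step in range(1000):
    x = torch.randn(64, 4)                       # synthetic features
    demand = 10 + 2 * x[:, 0] + torch.randn(64)  # synthetic true demand
    mu, log_sigma = model(x).unbind(dim=1)
    sigma = log_sigma.exp()
    # Decision derived from the predicted distribution (closed-form newsvendor
    # solution), so gradients of the *task* cost flow back into the model.
    order = mu + sigma * z_star
    loss = task_loss(order, demand).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the paper the decision step is a more general stochastic program solved (and differentiated through) numerically; the closed-form quantile above is only the simplest case where that differentiation is explicit.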
https://arxiv.org/abs/1611.03824 Learning to Learn for Global Optimization of Black Box Functions

We present a learning-to-learn approach for training recurrent neural networks to perform black-box global optimization. In the meta-learning phase we use a large set of smooth target functions to learn a recurrent neural network (RNN) optimizer, which is either a long short-term memory network or a differentiable neural computer. After learning, the RNN can be applied to learn policies in reinforcement learning, as well as other black-box learning tasks, including continuous correlated bandits and experimental design. We compare this approach to Bayesian optimization, with emphasis on the issues of computation speed, horizon length, and exploration-exploitation trade-offs.
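
A minimal sketch of the learning-to-learn idea, assuming we meta-train an LSTM to propose query points and score it by the sum of observed function values over a short horizon. Random quadratics stand in for the paper's distribution of smooth target functions, the DNC variant is omitted, and all sizes are illustrative.

```python
# Meta-train an LSTM "optimizer" that maps (last query, observed value) to the
# next query point. Because the training functions are known and
# differentiable, the meta-loss can be backpropagated through the unrolled
# optimization trajectory.
import torch
import torch.nn as nn

dim, horizon = 2, 20

class RNNOptimizer(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.hidden = hidden
        self.cell = nn.LSTMCell(dim + 1, hidden)   # input: last query + observed value
        self.head = nn.Linear(hidden, dim)         # output: next query point

    def forward(self, f, steps=horizon):
        h = torch.zeros(1, self.hidden)
        c = torch.zeros(1, self.hidden)
        x = torch.zeros(1, dim)                    # initial query
        total = 0.0
        for _ in range(steps):
            y = f(x)                               # observe the black-box value at x
            h, c = self.cell(torch.cat([x, y.view(1, 1)], dim=1), (h, c))
            x = self.head(h)                       # propose the next query
            total = total + y.sum()                # meta-loss: sum of observed values
        return total

def random_quadratic():
    center = torch.randn(dim)
    return lambda x: ((x - center) ** 2).sum(dim=1)

opt_net = RNNOptimizer()
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
for it in range(2000):                             # meta-training loop
    loss = opt_net(random_quadratic())
    meta_opt.zero_grad()
    loss.backward()
    meta_opt.step()
```

At test time the trained `opt_net` is simply unrolled on a new black-box function; no gradients of that function are needed, which is what allows the comparison with Bayesian optimization.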
https://arxiv.org/abs/1606.03152 Policy Networks with Two-Stage Training for Dialogue Systems

First, we show that, on summary state and action spaces, deep Reinforcement Learning (RL) outperforms Gaussian Process methods.

We show that a deep RL method based on an actor-critic architecture can exploit a small amount of data very efficiently. Indeed, with only a few hundred dialogues collected with a handcrafted policy, the actor-critic deep learner can be effectively bootstrapped by combining supervised learning and batch RL.
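
A minimal sketch of the two-stage scheme, assuming a small discrete summary action space: stage one imitates the handcrafted policy from logged dialogues, stage two fine-tunes with a simple actor-critic update on batched transitions. The sizes, placeholder data, and reward signal below are illustrative, not taken from the paper.

```python
# Two-stage training: supervised bootstrap of the actor, then actor-critic RL.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_features, n_actions = 20, 8
actor = nn.Sequential(nn.Linear(n_features, 64), nn.Tanh(), nn.Linear(64, n_actions))
critic = nn.Sequential(nn.Linear(n_features, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(list(actor.parameters()) + list(critic.parameters()), lr=1e-3)

# Stage 1: supervised bootstrapping from a few hundred handcrafted-policy dialogues.
states = torch.randn(500, n_features)            # logged summary states (placeholder)
actions = torch.randint(0, n_actions, (500,))    # actions the handcrafted policy took
for epoch in range(20):
    loss = F.cross_entropy(actor(states), actions)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: actor-critic fine-tuning on batches of (s, a, r, s', done) transitions,
# e.g. drawn from further logged or simulated dialogues.
def actor_critic_step(s, a, r, s_next, done, gamma=0.99):
    value = critic(s).squeeze(1)
    target = r + gamma * critic(s_next).squeeze(1).detach() * (1 - done)
    advantage = (target - value).detach()
    log_prob = F.log_softmax(actor(s), dim=1).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = -(log_prob * advantage).mean() + F.mse_loss(value, target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```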