ensemble_reinforcement_learning [2017/09/08 19:30] (current)
https://arxiv.org/pdf/1704.00756.pdf Multi-Advisor Reinforcement Learning

https://arxiv.org/pdf/1709.00503v1.pdf Mean Actor Critic

We propose a new algorithm, Mean Actor-Critic (MAC), for discrete-action
continuous-state reinforcement learning. MAC is a policy gradient algorithm that
uses the agent’s explicit representation of all action values to estimate the gradient
of the policy, rather than using only the actions that were actually executed. This
significantly reduces variance in the gradient updates and removes the need for a
variance reduction baseline. We show empirical results on two control domains
where MAC performs as well as or better than other policy gradient approaches,
and on five Atari games, where MAC is competitive with state-of-the-art policy
search algorithms.
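
The idea in the abstract, summing over all action values instead of using only the executed action, can be illustrated for a single state. The sketch below is not the paper's implementation. It assumes a softmax policy parameterised directly by per-action logits and treats the Q estimates as fixed numbers. The function names `mac_gradient` and `sampled_gradient` are hypothetical and chosen only for this example.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def mac_gradient(logits, q_values):
    """Gradient of sum_a pi(a|s) * Q(s, a) with respect to the logits.

    Uses every action's value, so no action needs to be sampled.
    Assumption: Q is held fixed (no gradient flows through it).
    """
    pi = softmax(logits)
    # For a softmax, d pi_a / d logit_k = pi_a * (1[a == k] - pi_k), so
    # sum_a Q_a * d pi_a / d logit_k = pi_k * (Q_k - sum_a pi_a * Q_a).
    return pi * (q_values - np.dot(pi, q_values))


def sampled_gradient(logits, q_values, action):
    """Single-sample estimate: grad log pi(action|s) * Q(s, action)."""
    pi = softmax(logits)
    grad_log_pi = -pi            # new array; d log pi_a / d logit_k = 1[a == k] - pi_k
    grad_log_pi[action] += 1.0
    return grad_log_pi * q_values[action]


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the paper.
    logits = np.array([0.5, -0.2, 1.0])
    q_values = np.array([1.0, 3.0, -2.0])
    pi = softmax(logits)

    summed = mac_gradient(logits, q_values)
    per_action = np.array(
        [sampled_gradient(logits, q_values, a) for a in range(len(logits))]
    )
    print("all-action gradient:        ", summed)
    print("mean of sampled estimates:  ", pi @ per_action)
    print("spread of sampled estimates:", per_action.std(axis=0))
```

Under these assumptions the all-action gradient equals the expectation of the single-sample estimate under the policy, but it is computed without drawing an action. That removes the sampling noise over actions for a given state, which is the variance reduction the abstract refers to. Noise from state sampling and from errors in the Q estimates remains.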