We introduce a new paradigm for reinforcement learning where agents use recall of specific memories to credit actions from the past, allowing them to solve problems that are intractable for existing algorithms. This paradigm broadens the scope of problems that can be investigated in AI and offers a mechanistic account of behaviors that may inspire computational models in neuroscience, psychology, and behavioral economics.
  
Temporal Value Transport is a heuristic algorithm but one that expresses coherent principles we believe will endure: past events are encoded, stored, retrieved, and revaluated. TVT fundamentally intertwines memory systems and reinforcement learning: the attention weights on memories specifically modulate the reward credited to past events.
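
The quoted passage describes the mechanism only in prose. As a rough illustration (not the paper's implementation), the sketch below shows one way attention weights could gate how much value is transported back to remembered events; the function name, the `alpha` scaling, and the `threshold` gating are all assumed for illustration.

<code python>
import numpy as np

def temporal_value_transport(rewards, values, read_weights,
                             alpha=0.9, threshold=0.1):
    """Augment past rewards with value transported via memory reads.

    rewards:      shape [T], reward received at each step t
    values:       shape [T], value estimate V(s_t) at each step t
    read_weights: shape [T, T]; read_weights[t, t_past] is the attention
                  weight the agent at step t places on the memory of t_past
    alpha, threshold: assumed hyperparameters scaling the transported
                  value and ignoring weak, incidental reads
    """
    T = len(rewards)
    augmented = np.array(rewards, dtype=float)
    for t in range(T):
        for t_past in range(t):  # credit only strictly earlier events
            w = read_weights[t, t_past]
            if w > threshold:
                # The attention weight modulates how much of the value
                # at read time is credited back to the remembered event.
                augmented[t_past] += alpha * w * values[t]
    return augmented

# Toy example: at step 4 the agent strongly recalls step 1,
# so step 1 is credited with alpha * 0.9 * V(s_4).
rewards = np.array([0., 0., 0., 0., 1.])
values = np.array([0.2, 0.4, 0.6, 0.8, 1.0])
read_weights = np.zeros((5, 5))
read_weights[4, 1] = 0.9
print(temporal_value_transport(rewards, values, read_weights))
</code>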
https://arxiv.org/pdf/1810.05017.pdf ONE-SHOT HIGH-FIDELITY IMITATION: TRAINING LARGE-SCALE DEEP NETS WITH RL

In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL.
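
As a minimal sketch of that store-everything-and-replay principle (not MetaMimic's actual data structures; the class name and method signatures are assumptions), an unbounded replay memory might look like this:

<code python>
import random

class ReplayMemory:
    """Unbounded experience store: every transition is kept and replayed."""

    def __init__(self):
        self.transitions = []  # never evicted, per "storing all experiences"

    def add(self, state, action, reward, next_state, done):
        self.transitions.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform replay decouples the learner from the policy that
        # generated the data, which is what makes the updates off-policy.
        return random.sample(self.transitions,
                             min(batch_size, len(self.transitions)))
</code>

Keeping the buffer unbounded trades memory for data diversity: old demonstrations and early exploratory behavior remain available for replay throughout training.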