Curiosity

https://arxiv.org/abs/1611.04717v1 #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

https://arxiv.org/abs/1606.01868v2 Unifying Count-Based Exploration and Intrinsic Motivation https://deepmind.com/blog/deepmind-papers-nips-part-3/

https://arxiv.org/abs/1605.09674 VIME: Variational Information Maximizing Exploration (Houthooft et al.)

https://www.technologyreview.com/s/603366/mathematical-model-reveals-the-patterns-of-how-innovations-arise/

https://arxiv.org/abs/1611.09321v2 Improving Policy Gradient by Exploring Under-appreciated Rewards

This paper presents a novel form of policy gradient for model-free reinforcement learning (RL) with improved exploration properties. Current policy-based methods use entropy regularization to encourage undirected exploration of the reward landscape, which is ineffective in high dimensional spaces with sparse rewards. We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions. An action sequence is considered under-appreciated if its log-probability under the current policy under-estimates its resulting reward. The proposed exploration strategy is easy to implement, requiring small modifications to an implementation of the REINFORCE algorithm. We evaluate the approach on a set of algorithmic tasks that have long challenged RL methods. Our approach reduces hyper-parameter sensitivity and demonstrates significant improvements over baseline methods. Our algorithm successfully solves a benchmark multi-digit addition task and generalizes to long sequences. This is, to our knowledge, the first time that a pure RL method has solved addition using only reward feedback.
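
A minimal sketch of the under-appreciated-reward idea as stated in the abstract: sampled action sequences whose (temperature-scaled) reward exceeds their log-probability under the current policy get up-weighted in the policy-gradient update. This is illustrative, not the authors' reference implementation; `tau` and the toy numbers are assumptions.

```python
import numpy as np

def urex_weights(log_probs, rewards, tau=0.1):
    """Self-normalized weights for the exploration term: a sampled action
    sequence is 'under-appreciated' when rewards / tau - log_probs is large,
    i.e. the policy's log-probability under-estimates the reward it earned."""
    scores = rewards / tau - log_probs
    scores -= scores.max()              # numerical stability before exponentiating
    w = np.exp(scores)
    return w / w.sum()

# Hypothetical usage: the first sequence earned a high reward but was unlikely
# under the policy, so it receives the largest exploration weight.
log_probs = np.array([-4.0, -1.5, -3.0])   # log pi(a | s) of sampled sequences
rewards   = np.array([1.0, 0.2, 0.9])
print(urex_weights(log_probs, rewards))
```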

https://arxiv.org/pdf/1612.02605.pdf TOWARDS INFORMATION-SEEKING AGENTS

https://arxiv.org/abs/1706.10295 Noisy Networks for Exploration

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and ϵ-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
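
Since the abstract describes the mechanism concretely (learned parametric noise on the weights, trained jointly with the other parameters), here is a minimal PyTorch sketch of a noisy linear layer with factorised Gaussian noise. The initialisation constants and the choice to fall back to the mean weights at evaluation time are assumptions of this sketch, not details taken from the paper's implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with learned, factorised Gaussian noise on weights and biases
    (a minimal sketch of the NoisyNet idea)."""

    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.w_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.w_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.b_mu = nn.Parameter(torch.empty(out_features))
        self.b_sigma = nn.Parameter(torch.empty(out_features))
        # Noise buffers, resampled on every training forward pass.
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.w_mu, -bound, bound)
        nn.init.uniform_(self.b_mu, -bound, bound)
        nn.init.constant_(self.w_sigma, sigma0 * bound)
        nn.init.constant_(self.b_sigma, sigma0 * bound)

    @staticmethod
    def _f(x):
        # Factorised-noise transform: f(x) = sign(x) * sqrt(|x|).
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        if self.training:
            self.eps_in.normal_()
            self.eps_out.normal_()
            w_noise = torch.outer(self._f(self.eps_out), self._f(self.eps_in))
            weight = self.w_mu + self.w_sigma * w_noise
            bias = self.b_mu + self.b_sigma * self._f(self.eps_out)
        else:                      # exploit: use the learned means only
            weight, bias = self.w_mu, self.b_mu
        return F.linear(x, weight, bias)

# Hypothetical usage: drop-in replacement for nn.Linear in a Q-network head.
layer = NoisyLinear(4, 2)
print(layer(torch.randn(1, 4)))
```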

http://antoniosliapis.com/papers/coupling_novelty_and_surprise_for_evolutionary_divergence.pdf Coupling Novelty and Surprise for Evolutionary Divergence

As novelty and surprise search have already shown much promise individually, the hypothesis is that an evolutionary process that rewards both novel and surprising solutions will be able to handle deception in a better fashion and lead to more successful solutions faster. In this paper we introduce an algorithm that realises both novelty and surprise search and we compare it against the two algorithms that compose it in a number of robot navigation tasks.
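
As a rough, hypothetical illustration of what such a coupling could look like, the sketch below scores a behaviour by a weighted sum of a k-nearest-neighbour novelty term and a surprise term (deviation from a prediction of where behaviours were headed). The weighting scheme, `lam`, and the distance choices are assumptions, not the paper's formulation.

```python
import numpy as np

def novelty(b, archive, k=5):
    """Novelty: mean distance from behaviour b to its k nearest behaviours in the archive."""
    d = np.linalg.norm(archive - b, axis=1)
    return np.sort(d)[:k].mean()

def surprise(b, predicted_b):
    """Surprise: deviation of b from where a predictive model expected behaviours to go."""
    return np.linalg.norm(b - predicted_b)

def divergence_fitness(b, archive, predicted_b, lam=0.5):
    """Hypothetical coupling: reward solutions that are both novel and surprising."""
    return lam * novelty(b, archive) + (1.0 - lam) * surprise(b, predicted_b)
```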

http://bica2017.bicasociety.org/wp-content/uploads/2017/08/BICA_2017_paper_89.pdf A Robust Cognitive Architecture for Learning from Surprises

https://arxiv.org/abs/1710.11089 Eigenoption Discovery through the Deep Successor Representation

http://www.marcgbellemare.info/static/publications/ostrovski17countbased.pdf Count-Based Exploration with Neural Density Models

https://arxiv.org/abs/1705.05363 Curiosity-driven Exploration by Self-supervised Prediction

We formulate curiosity as the error in an agent's ability to predict the consequence of its own actions in a visual feature space learned by a self-supervised inverse dynamics model. Our formulation scales to high-dimensional continuous state spaces like images, bypasses the difficulties of directly predicting pixels, and, critically, ignores the aspects of the environment that cannot affect the agent. https://github.com/pathak22/noreward-rl
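
A condensed PyTorch sketch of that formulation: an inverse-dynamics head shapes the feature space phi, and the forward model's prediction error in that space serves as the intrinsic reward. Network sizes and the MLP encoder are placeholders; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICM(nn.Module):
    """Sketch of an intrinsic curiosity module: the inverse-dynamics head trains
    the encoder, and the forward model's error in feature space is the bonus."""

    def __init__(self, obs_dim, n_actions, feat_dim=64):
        super().__init__()
        self.n_actions = n_actions
        self.encoder = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                     nn.Linear(128, feat_dim))
        self.inverse_head = nn.Sequential(nn.Linear(2 * feat_dim, 128), nn.ReLU(),
                                          nn.Linear(128, n_actions))
        self.forward_head = nn.Sequential(nn.Linear(feat_dim + n_actions, 128), nn.ReLU(),
                                          nn.Linear(128, feat_dim))

    def forward(self, obs, next_obs, action):
        phi, phi_next = self.encoder(obs), self.encoder(next_obs)
        a = F.one_hot(action, self.n_actions).float()
        # Inverse model: predict the action from (phi_t, phi_{t+1}); this loss trains the encoder.
        inv_logits = self.inverse_head(torch.cat([phi, phi_next], dim=-1))
        inverse_loss = F.cross_entropy(inv_logits, action)
        # Forward model: predict phi_{t+1}; its per-sample error is the intrinsic reward.
        phi_pred = self.forward_head(torch.cat([phi, a], dim=-1))
        forward_error = F.mse_loss(phi_pred, phi_next.detach(), reduction="none").mean(-1)
        return forward_error.detach(), inverse_loss, forward_error.mean()

# Hypothetical usage with a toy batch of 4-dimensional observations and 3 actions.
icm = ICM(obs_dim=4, n_actions=3)
r_i, inv_loss, fwd_loss = icm(torch.randn(8, 4), torch.randn(8, 4), torch.randint(0, 3, (8,)))
print(r_i.shape)   # per-sample curiosity bonus
```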

https://arxiv.org/pdf/1802.10546.pdf Computational Theories of Curiosity-Driven Learning

https://arxiv.org/abs/1806.06505v1 A unified strategy for implementing curiosity and empowerment driven reinforcement learning

https://arxiv.org/abs/1808.05492v1 Metric Learning for Novelty and Anomaly Detection

We show that metric learning provides a better output embedding space for detecting data outside the learned distribution than cross-entropy softmax based models. This opens an opportunity for further research on how this embedding space should be learned, with constraints that could further improve the field. The presented results suggest that out-of-distribution data might not all be seen as a single type of anomaly, but rather as a continuous spectrum between novelty and anomaly. In that spectrum, anomaly detection is the easier task, which puts more focus on the difficulty of novelty detection.
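
One simple way to act on that finding, sketched here under assumptions (nearest-class-centroid scoring, Euclidean distance): embed the known classes with the metric-learned encoder and score test points by their distance to the closest class centroid, with larger distances shading from novelty towards anomaly.

```python
import numpy as np

def class_centroids(embeddings, labels):
    """Mean embedding per known class, computed from a metric-learned encoder's outputs."""
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def ood_score(z, centroids):
    """Distance to the nearest class centroid: larger values suggest novelty,
    much larger values anomaly. Thresholds are application-specific."""
    return min(np.linalg.norm(z - c) for c in centroids.values())
```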

https://openreview.net/forum?id=SkeK3s0qKQ EPISODIC CURIOSITY THROUGH REACHABILITY

One solution to this problem is to allow the agent to create rewards for itself, making rewards dense and more suitable for learning. In particular, inspired by curious behaviour in animals, observing something novel could be rewarded with a bonus. The bonus is summed with the real task reward, making it possible for RL algorithms to learn from the combined reward. We propose a new curiosity method which uses episodic memory to form the novelty bonus.
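
A hypothetical sketch of that loop, assuming a trained comparator `reachability_prob(a, b)` that estimates how likely observation b is to be reachable from observation a within a few steps; the max aggregation, the threshold and `alpha` are illustrative choices, not values from the paper.

```python
def episodic_novelty_bonus(obs_embedding, memory, reachability_prob,
                           threshold=0.5, alpha=1.0):
    """Compare the current observation against episodic memory and pay a bonus
    when nothing in memory is predicted to reach it within a few steps."""
    if not memory:
        memory.append(obs_embedding)
        return alpha
    similarity = max(reachability_prob(m, obs_embedding) for m in memory)
    bonus = alpha * (1.0 - similarity)   # more reachable -> less novel -> smaller bonus
    if similarity < threshold:           # sufficiently novel: remember it
        memory.append(obs_embedding)
    return bonus
```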

https://arxiv.org/abs/1810.06284 CURIOUS: Intrinsically Motivated Multi-Task, Multi-Goal Reinforcement Learning

This paper proposes CURIOUS, an extension of Universal Value Function Approximators that enables intrinsically motivated agents to learn to achieve both multiple tasks and multiple goals within a single policy, leveraging hindsight learning. Agents focus on achievable tasks first, using an automated curriculum learning mechanism that biases their attention towards tasks maximizing the absolute learning progress. This mechanism provides robustness to catastrophic forgetting (by refocusing on tasks where performance decreases) and distracting tasks (by avoiding tasks with no absolute learning progress). Furthermore, we show that having two levels of parameterization (tasks and goals within tasks) enables more efficient learning of skills in an environment with a modular physical structure (e.g. multiple objects) as compared to flat, goal-parameterized RL with hindsight experience replay.
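
The curriculum mechanism lends itself to a short sketch: sample the next task in proportion to its absolute learning progress (recent change in success rate), mixed with a little uniform sampling so that stalled or forgotten tasks are still revisited. The epsilon mixing and the toy numbers are assumptions of this illustration.

```python
import numpy as np

def task_sampling_probs(learning_progress, eps=0.1):
    """learning_progress maps task id -> recent change in success rate.
    Tasks with large |LP| (being learned or being forgotten) are sampled more often."""
    tasks = list(learning_progress)
    alp = np.abs(np.array([learning_progress[t] for t in tasks]))
    if alp.sum() == 0:
        base = np.full(len(tasks), 1.0 / len(tasks))
    else:
        base = alp / alp.sum()
    probs = (1.0 - eps) * base + eps / len(tasks)
    return dict(zip(tasks, probs))

# Hypothetical usage: task 2 regressed (negative LP), so it keeps getting attention.
print(task_sampling_probs({0: 0.05, 1: 0.0, 2: -0.12}))
```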

https://arxiv.org/abs/1810.12162 Model-Based Active Exploration

We introduce Model-Based Active eXploration (MAX), an algorithm that actively explores the environment. It minimizes data required to comprehensively model the environment by planning to observe novel events, instead of merely reacting to novelty encountered by chance. Non-stationarity induced by traditional exploration bonus techniques is avoided by constructing fresh exploration policies only at time of action. In semi-random toy environments where directed exploration is critical to make progress, our algorithm is at least an order of magnitude more efficient than strong baselines.
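
As a rough illustration of planning towards novel events, the sketch below scores a candidate transition by the disagreement among an ensemble of learned dynamics models, here measured as the mean per-dimension variance of their next-state predictions. This is a simple proxy; the paper itself uses a divergence between the models' predictive distributions.

```python
import numpy as np

def ensemble_disagreement(next_state_preds):
    """Disagreement among ensemble members about the next state: high values
    mark transitions the models have not yet learned, i.e. novel events worth
    planning towards."""
    preds = np.stack(next_state_preds)      # shape: (n_models, state_dim)
    return preds.var(axis=0).mean()

# Hypothetical usage: the second candidate transition is the more novel one.
candidates = [
    [np.array([0.1, 0.2]), np.array([0.1, 0.2])],    # models agree   -> low utility
    [np.array([0.0, 1.0]), np.array([0.9, -0.5])],   # models disagree -> high utility
]
print([ensemble_disagreement(c) for c in candidates])
```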