Hierarchical Neural Controller

https://arxiv.org/pdf/1610.05182.pdf Learning and Transfer of Modulated Locomotor Controllers

A high-frequency, low-level “spinal” network with access to proprioceptive sensors learns sensorimotor primitives by training on simple tasks. This pre-trained module is fixed and connected to a low-frequency, high-level “cortical” network, with access to all sensors, which drives behavior by modulating the inputs to the spinal network.

Our design encourages the low-level controller to focus on the specifics of reactive motor control, while a high-level controller directs behavior towards the task goal by communicating a modulatory signal.
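
The two-timescale arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the update interval K, the random linear-plus-tanh "networks", and the random placeholder observations are all assumptions made for the example. What it shows is the information hiding: the low-level function only ever receives the proprioceptive slice and the modulatory signal, while the high-level function sees the full observation but runs once every K steps.

```python
import numpy as np

rng = np.random.default_rng(0)

PROPRIO_DIM = 4      # hypothetical proprioceptive feature size
FULL_OBS_DIM = 10    # hypothetical size of the full observation (all sensors)
MOD_DIM = 3          # hypothetical size of the modulatory signal
ACT_DIM = 2          # hypothetical action size
K = 5                # assumed: high level updates once every K low-level steps

# Fixed weights standing in for the pre-trained, frozen "spinal" network.
W_low = rng.normal(size=(ACT_DIM, PROPRIO_DIM + MOD_DIM))
# Weights standing in for the "cortical" network (trained on the task in practice).
W_high = rng.normal(size=(MOD_DIM, FULL_OBS_DIM))


def low_level(proprio, modulation):
    """Map proprioception plus the modulatory signal to an action."""
    return np.tanh(W_low @ np.concatenate([proprio, modulation]))


def high_level(full_obs):
    """Map the full observation to a modulatory signal."""
    return np.tanh(W_high @ full_obs)


def rollout(num_steps):
    """Run the two-timescale loop; return the actions and the high-level update count."""
    actions, high_level_updates = [], 0
    modulation = np.zeros(MOD_DIM)
    for t in range(num_steps):
        full_obs = rng.normal(size=FULL_OBS_DIM)  # placeholder for a real environment
        proprio = full_obs[:PROPRIO_DIM]          # the low level sees only this slice
        if t % K == 0:                            # low-frequency high-level update
            modulation = high_level(full_obs)
            high_level_updates += 1
        actions.append(low_level(proprio, modulation))  # high-frequency action
    return np.array(actions), high_level_updates


actions, updates = rollout(20)
print(actions.shape, updates)
```

Because the low-level weights never depend on the task, the same W_low could be paired with a different W_high for a new task, which is the reuse the notes above point to.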

We believe that the general idea of reusing learned behavioral primitives is important, and the design principles we have followed represent possible steps towards this goal. Our hierarchical design with information hiding has enabled the construction of low-level motor behaviors that are sheltered from task-specific information, enabling their reuse.


The ability to generalize from past experience to solve previously unseen tasks is a key research challenge in reinforcement learning (RL). In this paper, we consider RL tasks defined as a sequence of high-level instructions and study two types of generalization: to unseen and longer sequences of previously seen instructions, and to sequences where the instructions themselves were not previously seen. We present a novel hierarchical deep RL architecture that consists of two interacting neural controllers: a meta controller that reads instructions and repeatedly communicates subtasks to a subtask controller, which in turn learns to perform those subtasks. To generalize better to unseen instructions, we propose a regularizer that encourages learning subtask embeddings that capture correspondences between similar subtasks. We also propose a new differentiable neural network architecture in the meta controller that learns temporal abstractions, which makes learning more stable under delayed reward. Our architecture is evaluated on a non-deterministic 2D grid world where the agent should execute a list of instructions described in natural language. We demonstrate that the proposed architecture is able to generalize well over unseen instructions as well as longer lists of instructions.

http://www.cs.ubc.ca/~van/papers/2017-TOG-deepLoco/index.html DeepLoco: Dynamic Locomotion Skills Using Hierarchical Deep Reinforcement Learning

We adopt a two-level hierarchical control framework. First, low-level controllers are learned that operate at a fine timescale and achieve robust walking gaits that satisfy stepping-target and style objectives. Second, high-level controllers are learned that plan at the timescale of steps by invoking desired step targets for the low-level controller. The high-level controller makes decisions directly based on high-dimensional inputs, including terrain maps or other suitable representations of the surroundings. Both levels of the control policy are trained using deep reinforcement learning.

https://arxiv.org/abs/1706.04208 Hybrid Reward Architecture for Reinforcement Learning

This paper contributes towards tackling challenging domains in which the optimal value function cannot easily be reduced to a low-dimensional representation, by proposing a new method called Hybrid Reward Architecture (HRA). HRA takes as input a decomposed reward function and learns a separate value function for each component reward function. Because each component typically depends on only a subset of all features, the overall value function is much smoother and can be more easily approximated by a low-dimensional representation, enabling more effective learning.
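
The decomposition described above can be illustrated with a small tabular sketch. This is not the paper's implementation (which uses a deep network with one head per component); it assumes a hypothetical 5-state chain, a hand-made two-component reward, tabular Q-learning with each head bootstrapping from its own maximum, and aggregation by summing the heads. All names and constants here are made up for the example.

```python
import numpy as np

N_STATES, N_ACTIONS, N_COMPONENTS = 5, 2, 2   # hypothetical chain: states 0..4
GAMMA, ALPHA = 0.9, 0.5                       # assumed discount and learning rate

# One Q-table ("head") per reward component.
q_heads = np.zeros((N_COMPONENTS, N_STATES, N_ACTIONS))


def step(state, action):
    """Deterministic chain: action 0 moves left, action 1 moves right."""
    next_state = state - 1 if action == 0 else state + 1
    # Decomposed reward: component 0 pays at the right end, component 1 at the left end.
    rewards = np.array([1.0 if next_state == 4 else 0.0,
                        0.5 if next_state == 0 else 0.0])
    done = next_state in (0, 4)
    return next_state, rewards, done


def update(state, action, rewards, next_state, done):
    """One Q-learning step per head, each head using only its own reward component."""
    for k in range(N_COMPONENTS):
        bootstrap = 0.0 if done else GAMMA * q_heads[k, next_state].max()
        td_error = rewards[k] + bootstrap - q_heads[k, state, action]
        q_heads[k, state, action] += ALPHA * td_error


def aggregate_q(state):
    """Overall action values: the sum of the per-component heads."""
    return q_heads[:, state, :].sum(axis=0)


# Sweep every non-terminal state-action pair instead of sampling episodes.
for _ in range(200):
    for s in (1, 2, 3):
        for a in range(N_ACTIONS):
            s_next, r, done = step(s, a)
            update(s, a, r, s_next, done)

print("aggregate Q at state 2:", aggregate_q(2))
print("greedy action at state 2:", int(np.argmax(aggregate_q(2))))
```

Each head only ever sees its own reward signal, so its table stays simple; the agent acts greedily on the summed values.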

https://arxiv.org/pdf/1709.03480v1.pdf Combining Strategic Learning and Tactical Search in Real-Time Strategy Games

We propose to use a deep convolutional neural network (CNN) to select among a limited set of abstract action choices, and to utilize the remaining computation time for game tree search to improve low-level tactics. The CNN is trained by supervised learning on game states labelled by Puppet Search, a strategic search algorithm that uses action abstractions. The network is then used to select a script (an abstract action) to produce low-level actions for all units. Subsequently, the game tree search algorithm improves the tactical actions of a subset of units, using a limited view of the game state that considers only units close to opponent units.
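
The two-stage decision described above might be organized along these lines. This is a rough sketch under stated assumptions, not the authors' code: the script scores stand in for the CNN's output, tactical_search is a placeholder for the game tree search, and the Unit type, the Manhattan-distance radius, and the string-valued actions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    x: int
    y: int


def select_script(scores):
    """Pick the highest-scoring script (scores stand in for the CNN output)."""
    return max(scores, key=scores.get)


def script_action(script, unit):
    """Low-level action the chosen script would issue for one unit (placeholder)."""
    return f"{script}:{unit.name}"


def near_enemy(unit, enemies, radius):
    """True if any enemy is within the given Manhattan distance of the unit."""
    return any(abs(unit.x - e.x) + abs(unit.y - e.y) <= radius for e in enemies)


def tactical_search(units, enemies):
    """Placeholder for game tree search over the contested units only."""
    return {u.name: f"search:{u.name}" for u in units}


def plan(scores, own_units, enemies, radius=2):
    """Script actions for all units, then search overrides for units near enemies."""
    script = select_script(scores)
    actions = {u.name: script_action(script, u) for u in own_units}
    contested = [u for u in own_units if near_enemy(u, enemies, radius)]
    actions.update(tactical_search(contested, enemies))
    return actions


own = [Unit("worker", 0, 0), Unit("soldier", 5, 5)]
foes = [Unit("enemy", 6, 5)]
result = plan({"rush": 0.7, "harvest": 0.3}, own, foes)
print(result)
```

Restricting the search to contested units keeps the searched state small, which is what lets it fit into the computation time left over after the script is chosen.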