Sequence to Sequence
Aliases
Intent
Train a network to predict an output sequence from an input sequence.
Motivation
How can we build networks that can translate sequences?
Sketch
This section provides alternative descriptions of the pattern in the form of an illustration or alternative formal expression. By looking at the sketch a reader may quickly understand the essence of the pattern.
Discussion
This is the main section of the pattern, which explains it in greater detail. We use the vocabulary described in the theory section of this book. We do not provide detailed proofs but rather reference their sources. This section expounds on how the motivation is addressed. We also include additional questions that may be interesting topics for future research.
Known Uses
https://devblogs.nvidia.com/parallelforall/introduction-neural-machine-translation-with-gpus
Related Patterns
In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of their nature. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.
Relationship to Canonical Patterns
Relationship to Other Patterns
Further Reading
We provide here some additional external material that will help in exploring this pattern in more detail.
References
To aid in reading, we list the sources that are referenced in the text of this pattern.
http://arxiv.org/abs/1511.06391v4 Order Matters: Sequence to sequence for sets
http://arxiv.org/abs/1409.3215v3 Sequence to Sequence Learning with Neural Networks
Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
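As a rough sketch of that encoder-decoder setup (PyTorch is assumed here, and the vocabulary sizes and layer widths are illustrative choices, not the paper's):

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: one LSTM encodes the source sequence into a
    fixed-size state, and a second LSTM decodes the target sequence from it."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden_dim=512, layers=2):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, layers, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, tgt_vocab)

    def forward(self, src_tokens, tgt_tokens):
        # Encode the whole source sequence; keep only the final (h, c) state.
        _, state = self.encoder(self.src_emb(src_tokens))
        # Decode conditioned on that state (teacher forcing with the gold target).
        dec_out, _ = self.decoder(self.tgt_emb(tgt_tokens), state)
        return self.proj(dec_out)  # logits over the target vocabulary

# Toy usage: a batch of 4 source sequences of length 7, targets of length 5.
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
src = torch.randint(0, 1000, (4, 7))
tgt = torch.randint(0, 1200, (4, 5))
logits = model(src, tgt)  # shape: (4, 5, 1200)

The sketch shows teacher forcing during training; at inference time the decoder is instead run one token at a time, feeding its own predictions back in.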
https://arxiv.org/abs/1409.0473 Neural Machine Translation by Jointly Learning to Align and Translate
https://arxiv.org/abs/1604.00788 Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
https://arxiv.org/abs/1611.00020 Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision
In this work, we propose the Manager-Programmer-Computer framework, which integrates neural networks with non-differentiable memory to support abstract, scalable and precise operations through a friendly neural computer interface. Specifically, we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence neural “programmer”, and a non-differentiable “computer” that is a Lisp interpreter with code assist. To successfully apply REINFORCE for training, we augment it with approximate gold programs found by an iterative maximum likelihood training process. NSM is able to learn a semantic parser from weak supervision over a large knowledge base. It achieves new state-of-the-art performance on WebQuestionsSP, a challenging semantic parsing dataset, with weak supervision.
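Because the interpreter is non-differentiable, the programmer is trained from a reward rather than a per-token loss. Below is a minimal sketch of a single REINFORCE update under that framing; the programmer.sample and interpreter.execute interfaces are hypothetical stand-ins rather than the NSM code, and the paper's maximum-likelihood augmentation with approximate gold programs is omitted.

def reinforce_step(programmer, interpreter, question, gold_answer, optimizer):
    # Sample a program from the seq2seq "programmer" policy.
    # programmer.sample is assumed to return the sampled token ids and the
    # summed log-probability of those tokens as a scalar PyTorch tensor.
    program_tokens, log_prob = programmer.sample(question)
    # The "computer" (a Lisp interpreter in NSM) is a black box: no gradients
    # flow through program execution, only a scalar reward comes back.
    predicted_answer = interpreter.execute(program_tokens)
    reward = 1.0 if predicted_answer == gold_answer else 0.0
    # REINFORCE: scale the log-likelihood of the sampled program by its reward
    # (no baseline or entropy bonus here, for brevity).
    loss = -reward * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward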
https://arxiv.org/abs/1605.07912v4 Review Networks for Caption Generation
The review network performs a number of review steps with attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework.
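Read concretely, the reviewer is an attentive recurrent cell that runs a fixed number of steps over the encoder states and emits one thought vector per step. The sketch below is a hedged reading of that idea; the LSTM cell and dot-product attention are assumptions, not necessarily the paper's exact parameterization.

import torch
import torch.nn as nn

class Reviewer(nn.Module):
    """Runs a fixed number of review steps over the encoder hidden states and
    returns one thought vector per step for the decoder to attend over."""
    def __init__(self, hidden_dim=512, review_steps=8):
        super().__init__()
        self.cell = nn.LSTMCell(hidden_dim, hidden_dim)
        self.review_steps = review_steps

    def forward(self, enc_states):  # enc_states: (batch, src_len, hidden)
        batch, _, hidden = enc_states.shape
        h = enc_states.new_zeros(batch, hidden)
        c = enc_states.new_zeros(batch, hidden)
        thoughts = []
        for _ in range(self.review_steps):
            # Dot-product attention of the current reviewer state over the
            # encoder hidden states.
            scores = torch.bmm(enc_states, h.unsqueeze(2)).squeeze(2)  # (batch, src_len)
            weights = scores.softmax(dim=1).unsqueeze(1)               # (batch, 1, src_len)
            context = torch.bmm(weights, enc_states).squeeze(1)        # (batch, hidden)
            h, c = self.cell(context, (h, c))
            thoughts.append(h)
        # (batch, review_steps, hidden): the decoder's attention operates over
        # these thought vectors rather than directly over the encoder states.
        return torch.stack(thoughts, dim=1)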
https://arxiv.org/pdf/1611.02683v1.pdf Unsupervised Pretraining for Sequence to Sequence Learning
Our method initializes the encoder and decoder of the seq2seq model with the trained weights of two language models, and then all weights are jointly fine-tuned with labeled data. An additional language modeling loss can be used to regularize the model during fine-tuning. We apply this method to low-resource tasks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.
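In code, the recipe amounts to copying pretrained language-model weights into the seq2seq model before fine-tuning and adding a language-modeling term to the supervised loss. The hedged sketch below reuses the Seq2Seq module from the earlier sketch, shows only the target-side (decoder) initialization (the encoder is initialized analogously from a source-side language model), and uses an illustrative 0.1 weight for the regularizing loss.

import torch
import torch.nn as nn

class LanguageModel(nn.Module):
    # A plain LSTM language model; in practice it would be pretrained on
    # unlabeled target-side text before its weights are copied over.
    def __init__(self, vocab=1200, emb_dim=256, hidden_dim=512, layers=2):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab)

    def forward(self, tokens):
        out, _ = self.lstm(self.emb(tokens))
        return self.proj(out)

tgt_lm = LanguageModel()                         # assume: already pretrained
model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)  # Seq2Seq from the sketch above

# Initialize the decoder, target embedding, and output projection from the LM.
model.decoder.load_state_dict(tgt_lm.lstm.state_dict())
model.tgt_emb.load_state_dict(tgt_lm.emb.state_dict())
model.proj.load_state_dict(tgt_lm.proj.state_dict())

# Joint fine-tuning loss: supervised translation loss plus a language-modeling
# loss computed with the seq2seq decoder run as an unconditional LM.
ce = nn.CrossEntropyLoss()
src = torch.randint(0, 1000, (4, 7))
tgt = torch.randint(0, 1200, (4, 6))
logits = model(src, tgt[:, :-1])                 # predict the next target token
mt_loss = ce(logits.reshape(-1, 1200), tgt[:, 1:].reshape(-1))
dec_out, _ = model.decoder(model.tgt_emb(tgt[:, :-1]))   # decoder as LM (zero state)
lm_loss = ce(model.proj(dec_out).reshape(-1, 1200), tgt[:, 1:].reshape(-1))
loss = mt_loss + 0.1 * lm_loss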
https://arxiv.org/abs/1609.08144 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
https://arxiv.org/pdf/1703.04474v1.pdf DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks
Our basic module is a new generic unit, the Transition Based Recurrent Unit (TBRU). In addition to hidden layer activations, TBRUs have discrete state dynamics that allow network connections to be built dynamically as a function of intermediate activations. By connecting multiple TBRUs, we can extend and combine commonly used architectures such as sequence-to-sequence, attention mechanisms, and recursive tree-structured models. A TBRU can also serve as both an encoder for downstream tasks and as a decoder for its own task simultaneously, resulting in more accurate multi-task learning. We call our approach Dynamic Recurrent Acyclic Graphical Neural Networks, or DRAGNN.
https://github.com/google/seq2seq A general-purpose encoder-decoder framework for Tensorflow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more.
https://arxiv.org/pdf/1703.03906v1.pdf Massive Exploration of Neural Machine Translation Architectures
https://arxiv.org/pdf/1704.02312v1.pdf A Constrained Sequence-to-Sequence Neural Model for Sentence Simplification
https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning
https://github.com/locuslab/trellisnet Trellis Networks for Sequence Modeling
This allows trellis networks to serve as a bridge between recurrent and convolutional architectures, benefitting from algorithmic and architectural techniques developed in either context. We leverage these relationships to design high-performing trellis networks that absorb ideas from both architectural families.