Sequence to Sequence



Train a network to predict an output sequence from an input sequence.


How can we build networks that can translate sequences?


This section provides alternative descriptions of the pattern in the form of an illustration or alternative formal expression. By looking at the sketch a reader may quickly understand the essence of the pattern.


This is the main section of the pattern, which explains the pattern in greater detail. We use the vocabulary described in the theory section of this book. We do not give detailed proofs but instead reference their sources. This section also expounds on how the motivation is addressed and includes additional questions that may be interesting topics for future research.

Known Uses


Related Patterns

In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of the nature of each relationship. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.

Relationship to Canonical Patterns

Relationship to other Patterns

Further Reading

We provide here some additional external material that will help in exploring this pattern in more detail.


To aid reading, we also include the sources that are referenced in the text of the pattern.



Order Matters: Sequence to sequence for sets

http://arxiv.org/abs/1409.3215v3 Sequence to Sequence Learning with Neural Networks

Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
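
The encoder-decoder idea in the quote above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a single LSTM layer instead of a multilayered one, omits biases and training, and the names `LSTMCell`, `encode`, `decode`, `W_out`, and the toy vocabulary are hypothetical. It only shows that the encoder compresses an input of any length into a fixed-size state, from which the decoder emits tokens one at a time.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Single-layer LSTM cell without biases (a simplification for brevity)."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, each acting on the concatenation [x; h].
        self.W = {g: init_matrix(hidden_size, input_size + hidden_size, rng)
                  for g in "ifog"}

    def step(self, x, h, c):
        z = x + h  # list concatenation of input and previous hidden state
        i = [sigmoid(a) for a in matvec(self.W["i"], z)]    # input gate
        f = [sigmoid(a) for a in matvec(self.W["f"], z)]    # forget gate
        o = [sigmoid(a) for a in matvec(self.W["o"], z)]    # output gate
        g = [math.tanh(a) for a in matvec(self.W["g"], z)]  # candidate values
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

def one_hot(index, size):
    return [1.0 if k == index else 0.0 for k in range(size)]

def encode(cell, tokens, vocab_size):
    """Read the whole input and return the final (h, c) as the fixed-size vector."""
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for t in tokens:
        h, c = cell.step(one_hot(t, vocab_size), h, c)
    return h, c

def decode(cell, W_out, state, vocab_size, bos, eos, max_len):
    """Greedy decoding: feed each predicted token back in until EOS or max_len."""
    h, c = state
    token, output = bos, []
    for _ in range(max_len):
        h, c = cell.step(one_hot(token, vocab_size), h, c)
        logits = matvec(W_out, h)
        token = max(range(vocab_size), key=lambda k: logits[k])
        if token == eos:
            break
        output.append(token)
    return output

rng = random.Random(0)
VOCAB, HIDDEN, BOS, EOS = 6, 8, 0, 1  # toy sizes, chosen arbitrarily
encoder = LSTMCell(VOCAB, HIDDEN, rng)
decoder = LSTMCell(VOCAB, HIDDEN, rng)
W_out = init_matrix(VOCAB, HIDDEN, rng)

state = encode(encoder, [2, 3, 4, 5], VOCAB)
prediction = decode(decoder, W_out, state, VOCAB, BOS, EOS, max_len=10)
```

Because the weights are random and untrained, `prediction` is meaningless as a translation; the point is the data flow, in which the decoder sees the input only through `state`.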

https://arxiv.org/abs/1409.0473 Neural Machine Translation by Jointly Learning to Align and Translate

https://arxiv.org/abs/1604.00788 Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models

https://arxiv.org/abs/1611.00020 Neural Symbolic Machines: Learning Semantic Parsers on Freebase with Weak Supervision

In this work, we propose the Manager-Programmer-Computer framework, which integrates neural networks with non-differentiable memory to support abstract, scalable and precise operations through a friendly neural computer interface. Specifically, we introduce a Neural Symbolic Machine, which contains a sequence-to-sequence neural “programmer”, and a non-differentiable “computer” that is a Lisp interpreter with code assist. To successfully apply REINFORCE for training, we augment it with approximate gold programs found by an iterative maximum likelihood training process. NSM is able to learn a semantic parser from weak supervision over a large knowledge base. It achieves new state-of-the-art performance on WebQuestionsSP, a challenging semantic parsing dataset, with weak supervision.

https://arxiv.org/abs/1605.07912v4 Review Networks for Caption Generation

The review network performs a number of review steps with attention mechanism on the encoder hidden states, and outputs a thought vector after each review step; the thought vectors are used as the input of the attention mechanism in the decoder. We show that conventional encoder-decoders are a special case of our framework.
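
The review steps described above can be sketched as follows. This is a rough illustration under assumptions that are not taken from the paper: it uses dot-product attention and a single tanh layer per review step, and the names `attend`, `review`, `W`, and `decoder_query` are hypothetical. Each step attends over the encoder hidden states and emits one thought vector; the decoder's attention then runs over the thought vectors rather than the encoder states.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, states):
    """Dot-product attention: return the weights and the weighted average of states."""
    weights = softmax([sum(q * s for q, s in zip(query, st)) for st in states])
    dim = len(states[0])
    context = [sum(w * st[k] for w, st in zip(weights, states)) for k in range(dim)]
    return weights, context

def review(encoder_states, W, num_steps):
    """Run num_steps review steps and collect one thought vector per step."""
    dim = len(encoder_states[0])
    thought = [0.0] * dim
    thoughts = []
    for _ in range(num_steps):
        # Attend over the encoder states, using the previous thought as the query.
        _, context = attend(thought, encoder_states)
        z = context + thought  # concatenate context and previous thought
        thought = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]
        thoughts.append(thought)
    return thoughts

rng = random.Random(0)
DIM = 4  # toy hidden size, chosen arbitrarily
encoder_states = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(5)]
W = [[rng.uniform(-0.5, 0.5) for _ in range(2 * DIM)] for _ in range(DIM)]

thoughts = review(encoder_states, W, num_steps=3)

# The decoder attends over the thought vectors, not the encoder states.
decoder_query = [0.1] * DIM
weights, context = attend(decoder_query, thoughts)
```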


Our method initializes the encoder and decoder of the seq2seq model with the trained weights of two language models, and then all weights are jointly fine-tuned with labeled data. An additional language modeling loss can be used to regularize the model during fine-tuning. We apply this method to low-resource tasks in machine translation and abstractive summarization and find that it significantly improves the subsequent supervised models.
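
The initialization and the regularized objective described above might look like the sketch below. Everything here is an assumption made for illustration: the parameter dictionaries, the function names `init_seq2seq_from_lms` and `fine_tuning_loss`, and the `lm_weight` coefficient are hypothetical, and the quoted text does not say how the two losses are weighted.

```python
import copy

def init_seq2seq_from_lms(source_lm, target_lm):
    """Copy pretrained language-model weights into a fresh encoder and decoder."""
    return {
        "encoder": copy.deepcopy(source_lm),  # source-side LM initializes the encoder
        "decoder": copy.deepcopy(target_lm),  # target-side LM initializes the decoder
    }

def fine_tuning_loss(seq2seq_loss, lm_loss, lm_weight=1.0):
    """Supervised loss plus a language-modeling term used as a regularizer."""
    return seq2seq_loss + lm_weight * lm_loss

# Toy "pretrained" parameters; real models would hold full weight tensors.
source_lm = {"embedding": [[0.1, 0.2], [0.3, 0.4]], "rnn": [[0.5, 0.6], [0.7, 0.8]]}
target_lm = {"embedding": [[0.9, 0.8], [0.7, 0.6]], "rnn": [[0.5, 0.4], [0.3, 0.2]]}

model = init_seq2seq_from_lms(source_lm, target_lm)

# Stand-in for a fine-tuning update: it changes the model, not the original LM.
model["encoder"]["rnn"][0][0] += 0.01

loss = fine_tuning_loss(seq2seq_loss=2.0, lm_loss=3.0, lm_weight=0.5)
```

Deep copies keep the pretrained language models unchanged, so they can be reused while all seq2seq weights are fine-tuned jointly.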

https://arxiv.org/abs/1609.08144 Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

https://arxiv.org/pdf/1703.04474v1.pdf DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks

Our basic module is a new generic unit, the Transition Based Recurrent Unit (TBRU). In addition to hidden layer activations, TBRUs have discrete state dynamics that allow network connections to be built dynamically as a function of intermediate activations. By connecting multiple TBRUs, we can extend and combine commonly used architectures such as sequence-to-sequence, attention mechanisms, and recursive tree-structured models. A TBRU can also serve as both an encoder for downstream tasks and as a decoder for its own task simultaneously, resulting in more accurate multi-task learning. We call our approach Dynamic Recurrent Acyclic Graphical Neural Networks, or DRAGNN.

https://github.com/google/seq2seq A general-purpose encoder-decoder framework for TensorFlow that can be used for Machine Translation, Text Summarization, Conversational Modeling, Image Captioning, and more.

https://arxiv.org/pdf/1703.03906v1.pdf Massive Exploration of Neural Machine Translation Architectures

https://arxiv.org/pdf/1704.02312v1.pdf A Constrained Sequence-to-Sequence Neural Model for Sentence Simplification

https://arxiv.org/pdf/1810.01218v1.pdf AlphaSeq: Sequence Discovery with Deep Reinforcement Learning


This allows trellis networks to serve as bridge between recurrent and convolutional architectures, benefitting from algorithmic and architectural techniques developed in either context. We leverage these relationships to design high-performing trellis networks that absorb ideas from both architectural families.