https://arxiv.org/abs/1605.02226v3 Neural Autoregressive Distribution Estimation

We present Neural Autoregressive Distribution Estimation (NADE) models, neural network architectures applied to the problem of unsupervised distribution and density estimation. They leverage the probability product rule and a weight-sharing scheme inspired by restricted Boltzmann machines to yield an estimator that is tractable and generalizes well. We discuss how they achieve competitive performance in modeling both binary and real-valued observations. We also show how deep NADE models can be trained to be agnostic to the ordering of input dimensions used by the autoregressive product-rule decomposition. Finally, we show how to exploit the topological structure of pixels in images using a deep convolutional architecture for NADE.

http://videolectures.net/deeplearning2015_larochelle_deep_learning/

https://arxiv.org/abs/1502.03509 MADE: Masked Autoencoder for Distribution Estimation

https://arxiv.org/abs/1410.8516 NICE: Non-linear Independent Components Estimation

https://arxiv.org/abs/1611.05013v1 PixelVAE: A Latent Variable Model for Natural Images

https://arxiv.org/abs/1606.04934v1 Improving Variational Inference with Inverse Autoregressive Flow

https://arxiv.org/abs/1606.05328 Conditional Image Generation with PixelCNN Decoders

This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks.
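
The conditioning mechanism is a gated activation: the conditioning vector h is projected into the feature space and added inside both the tanh and sigmoid branches. A sketch with NumPy vectors standing in for feature maps; the names x_f, x_g, V_f, V_g are our notation, and the projections are plain matrix products rather than masked convolutions.

```python
import numpy as np

def gated_conditional_unit(x_f, x_g, h, V_f, V_g):
    """Gated activation of the conditional PixelCNN:
    y = tanh(x_f + V_f h) * sigmoid(x_g + V_g h),
    where x_f, x_g are the masked-convolution outputs of a layer and
    h is the conditioning vector (class label, tag embedding, ...)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    return np.tanh(x_f + V_f @ h) * sigmoid(x_g + V_g @ h)
```

Because h enters every layer additively, the same trained network can be conditioned on any vector of the right size.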

https://arxiv.org/abs/1601.06759 Pixel Recurrent Neural Networks

Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
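
"Sequentially predicts the pixels along the two spatial dimensions" means raster-scan ancestral sampling: each pixel is drawn from a discrete softmax over its 256 raw values, conditioned on everything above and to the left. A sketch, with `logits_fn` as a hypothetical stand-in for the trained network:

```python
import numpy as np

def sample_image(logits_fn, height, width, rng):
    """Raster-scan ancestral sampling in the PixelRNN/PixelCNN style.

    logits_fn(img, r, c) stands in for the network: given the pixels
    generated so far, it returns 256 logits for pixel (r, c). Each
    pixel is drawn from a discrete softmax, then fed back in."""
    img = np.zeros((height, width), dtype=np.int64)
    for r in range(height):
        for c in range(width):
            logits = logits_fn(img, r, c)
            p = np.exp(logits - logits.max())   # numerically stable softmax
            p /= p.sum()
            img[r, c] = rng.choice(256, p=p)
    return img
```

The double loop is why naive sampling costs one network evaluation per pixel, the bottleneck that several later papers in this list attack.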

https://arxiv.org/abs/1610.10099v1 Neural Machine Translation in Linear Time

We present a neural translation model, the ByteNet, and a neural language model, the ByteNet Decoder, that aim to address the runtime drawbacks of recurrent architectures. The ByteNet uses dilated convolutional neural networks for both the source network and the target network. It connects the source and target networks via stacking and unfolds the target network dynamically to generate variable-length output sequences. We view the ByteNet as an instance of a wider family of sequence-mapping architectures that stack sub-networks and use dynamic unfolding; the sub-networks themselves may be convolutional or recurrent.
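
Dilation is what buys the linear time: with dilations doubling per layer (1, 2, 4, ...), the receptive field grows exponentially with depth while each layer stays causal. A minimal sketch of one dilated causal convolution and the receptive-field arithmetic; function names are ours.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution: output t sees only inputs at
    t, t - dilation, t - 2*dilation, ... (kernel w of length K)."""
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            j = t - k * dilation
            if j >= 0:
                y[t] += w[k] * x[j]
    return y

def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal convolutions;
    doubling dilations make it exponential in depth."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

With kernel size 2 and dilations 1, 2, 4, 8, four layers already see 16 timesteps, versus 5 for an undilated stack of the same depth.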

http://www.scottreed.info/files/iclr2017.pdf Generating Interpretable Images with Controllable Structure

We demonstrate improved text-to-image synthesis with controllable object locations using an extension of Pixel Convolutional Neural Networks (PixelCNN). In addition to conditioning on text, we show how the model can generate images conditioned on part keypoints and segmentation masks. The character-level text encoder and image generation network are jointly trained end-to-end via maximum likelihood.

https://arxiv.org/abs/1612.07837 SampleRNN: An Unconditional End-to-End Neural Audio Generation Model

In this paper we propose a novel model for unconditional audio generation that produces one audio sample at a time. We show that our model, which combines memory-less modules (autoregressive multilayer perceptrons) with stateful recurrent neural networks in a hierarchical structure, is able to capture underlying sources of variation in temporal sequences over very long time spans, on three datasets of different nature. Human evaluation of the generated samples indicates that our model is preferred over competing models. We also show how each component of the model contributes to the exhibited performance.

https://arxiv.org/pdf/1612.08185v1.pdf Deep Probabilistic Modeling of Natural Images using a Pyramid Decomposition

We introduce a new technique for probabilistic modeling of natural images that combines the advantages of classic multi-scale and modern deep learning models. By explicitly representing natural images at different scales, we derive a model that can capture high-level image structure in a computationally efficient way. We show experimentally that our model achieves new state-of-the-art image-modeling performance on the CIFAR-10 dataset while being much faster than competing models. We also evaluate the proposed technique on a human faces dataset and demonstrate the potential of our model to generate nearly photorealistic face samples.

https://github.com/tensorflow/magenta/blob/master/magenta/reviews/pixelrnn.md

https://arxiv.org/abs/1702.04649 Generative Temporal Models with Memory

We consider the general problem of modeling temporal data with long-range dependencies, wherein new observations are fully or partially predictable based on temporally-distant, past observations. A sufficiently powerful temporal model should separate predictable elements of the sequence from unpredictable elements, express uncertainty about those unpredictable elements, and rapidly identify novel elements that may help to predict the future. To create such models, we introduce Generative Temporal Models augmented with external memory systems. They are developed within the variational inference framework, which provides both a practical training methodology and methods to gain insight into the models' operation. We show, on a range of problems with sparse, long-term temporal dependencies, that these models store information from early in a sequence, and reuse this stored information efficiently. This allows them to perform substantially better than existing models based on well-known recurrent neural networks, like LSTMs.

https://drive.google.com/file/d/0B7jhGCaUwDJeZWZWUXJ4cktxVU0/view Generative Model as Sequence Learning

http://ruotianluo.github.io/2017/01/11/pixelcnn-wavenet/

http://www.slideshare.net/suga93/conditional-image-generation-with-pixelcnn-decoders

http://www.dtic.upf.edu/~mblaauw/MdM_NIPS_seminar/

https://arxiv.org/pdf/1612.08083v1.pdf Language Modeling with Gated Convolutional Networks

https://github.com/PrajitR/fast-pixel-cnn

https://openreview.net/pdf?id=rkdF0ZNKl Fast Generation for Convolutional Autoregressive Models

https://arxiv.org/abs/1702.08575v1 Learning Latent Networks in Vector Auto Regressive Models

We show that the dependencies among the observed processes can be identified successfully under some conditions on the VAR model. Moreover, we can recover the lengths of all directed paths between any two observed processes that pass through the latent part. By utilizing this information, we uniquely reconstruct the latent subgraph with the minimum number of nodes if its topology is a directed tree. Furthermore, we propose an algorithm that finds all possible minimal latent networks if there exists at most one directed path of each length between any two observed nodes through the latent part.
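
For reference, the model class in question is a vector autoregression, where the network of dependencies is the sparsity pattern of the transition matrix. A minimal first-order simulator (our own sketch; the paper's contribution is recovering latent structure from such data, not this generative step):

```python
import numpy as np

def simulate_var1(A, T, rng=None, x0=None, noise=0.0):
    """Simulate a first-order VAR process x_t = A x_{t-1} + eps_t.
    Edges of the (possibly partially latent) network correspond to
    nonzero entries of the transition matrix A."""
    n = A.shape[0]
    x = np.zeros((T, n))
    if x0 is not None:
        x[0] = x0
    for t in range(1, T):
        eps = noise * rng.normal(size=n) if (rng is not None and noise > 0) else 0.0
        x[t] = A @ x[t - 1] + eps
    return x
```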

https://arxiv.org/abs/1703.03664v1 Parallel Multiscale Autoregressive Density Estimation

PixelCNN achieves state-of-the-art results in density estimation for natural images. Although training is fast, inference is costly, requiring one network evaluation per pixel; O(N) for N pixels. This can be sped up by caching activations, but still involves generating each pixel sequentially. In this work, we propose a parallelized PixelCNN that allows more efficient inference by modeling certain pixel groups as conditionally independent. Our new PixelCNN model achieves competitive density estimation and an orders-of-magnitude speedup (O(log N) sampling instead of O(N)), enabling the practical generation of 512×512 images. We evaluate the model on class-conditional image generation, text-to-image synthesis, and action-conditional video generation, showing that our model achieves the best results among non-pixel-autoregressive density models that allow efficient sampling.
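
The O(log N) claim comes from the multiscale schedule: each sampling pass fills in a whole conditionally-independent pixel group in parallel and doubles the resolution. A toy sketch of that arithmetic (a simplification; the actual model uses a small constant number of group passes per doubling):

```python
def upsampling_passes(base, target):
    """Count resolution-doubling passes needed to grow a base x base
    image to target x target when each pass samples one pixel group
    in parallel, i.e. the O(log N) schedule."""
    passes, size = 0, base
    while size < target:
        size *= 2
        passes += 1
    return passes
```

Growing a 4×4 base sample to 512×512 takes 7 doublings instead of 512 × 512 sequential pixel draws.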

https://arxiv.org/pdf/1703.07684.pdf Predicting Deeper into the Future of Semantic Segmentation

We develop an autoregressive convolutional neural network that learns to iteratively generate multiple frames.

We explored five different models for this task relying on RGB and/or segmentations from previous frames. For prediction beyond a single future frame, we considered batch models that predict all future frames at once, and autoregressive models that sequentially predict the future frames. We found that autoregressive training produces the best results for our problem, and that models predicting in the segmentation space work better than those relying on the RGB frames.
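
The batch-versus-autoregressive distinction above is just whether the predictor is called once for all future frames or repeatedly with its own outputs fed back. A sketch of the autoregressive rollout, with `step_fn` as a hypothetical stand-in for the trained frame predictor:

```python
def autoregressive_rollout(step_fn, context, n_future):
    """Sequential multi-frame prediction: predict one frame from the
    sliding context window, append the prediction, and repeat. A batch
    model would instead emit all n_future frames in a single call."""
    frames = list(context)
    k = len(context)
    for _ in range(n_future):
        frames.append(step_fn(frames[-k:]))
    return frames[k:]
```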

https://www.cs.toronto.edu/~amnih/papers/darn.pdf Deep AutoRegressive Networks

We introduce a deep, generative autoencoder capable of learning hierarchies of distributed representations from data. Successive deep stochastic hidden layers are equipped with autoregressive connections, which enable the model to be sampled from quickly and exactly via ancestral sampling. We derive an efficient approximate parameter estimation method based on the minimum description length (MDL) principle, which can be seen as maximising a variational lower bound on the log-likelihood, with a feedforward neural network implementing approximate inference. We demonstrate state-of-the-art generative performance on a number of classic data sets: several UCI data sets, MNIST and Atari 2600 games.
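
"Sampled from quickly and exactly via ancestral sampling" works because each stochastic unit conditions only on earlier units in the same layer, so one left-to-right pass draws an exact sample with no MCMC. A sketch of one DARN-style binary layer (our notation; b is the bias vector, U the strictly lower-triangular autoregressive weights):

```python
import numpy as np

def sample_autoregressive_binary(b, U, rng):
    """Ancestral sampling of one autoregressive stochastic layer:
    unit i is Bernoulli with logit b_i + U[i, :i] @ h_{<i}."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    D = len(b)
    h = np.zeros(D)
    for i in range(D):
        p = sigmoid(b[i] + U[i, :i] @ h[:i])
        h[i] = rng.random() < p
    return h
```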

https://arxiv.org/pdf/1703.06846.pdf Boosting Dilated Convolutional Networks with Mixed Tensor Decompositions

https://arxiv.org/abs/1706.00531v1 PixelGAN Autoencoders

By imposing a Gaussian prior, we were able to disentangle the low-frequency and high-frequency statistics of the images, and by imposing a categorical prior we were able to disentangle the style and content of images and learn representations that are specifically useful for clustering and semi-supervised learning tasks.

https://arxiv.org/pdf/1710.10304v1.pdf Few-Shot Autoregressive Density Estimation: Towards Learning to Learn Distributions

In this paper, we show how 1) neural attention and 2) meta learning techniques can be used in combination with autoregressive models to enable effective few-shot density estimation. Our proposed modifications to PixelCNN result in state-of-the art few-shot density estimation on the Omniglot dataset.

https://arxiv.org/pdf/1711.02741.pdf Recurrent Autoregressive Networks for Online Multi-Object Tracking

https://deepmind.com/blog/high-fidelity-speech-synthesis-wavenet/

https://arxiv.org/abs/1712.09763 PixelSNAIL: An Improved Autoregressive Generative Model

https://arxiv.org/abs/1712.01897 Online Learning with Gated Linear Networks

Rather than relying on non-linear transfer functions, our method gains representational power through data conditioning. We state, under general conditions, a learnable capacity theorem showing that this approach can in principle learn any bounded Borel-measurable function on a compact subset of Euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed.

https://arxiv.org/abs/1801.09819v2 Transformation Autoregressive Networks

The fundamental task of general density estimation has been of keen interest to machine learning. Recent advances in density estimation have either: a) proposed a flexible model to estimate the conditional factors of the chain rule, p(x_i | x_{i-1}, ...); or b) used flexible, non-linear transformations of variables of a simple base distribution. Instead, this work jointly leverages transformations of variables and autoregressive conditional models, and proposes novel methods for both. We provide a deeper understanding of our methods, showing a considerable improvement through a comprehensive study over both real-world and synthetic data. Moreover, we illustrate the use of our models in outlier detection and image modeling tasks.

In conclusion, this work jointly leverages transformations of variables and autoregressive models, and proposes novel methods for both. We show a considerable improvement with our methods through a comprehensive study over both real world and synthetic data. Also, we illustrate the utility of our models in outlier detection and digit modeling tasks.
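
The "transformations of variables" half of the approach rests on the change-of-variables formula: log p(x) = log p_base(z) + log |det dz/dx|. A sketch for an elementwise invertible map, where the diagonal Jacobian is estimated by finite differences purely for illustration (a real flow would compute it analytically):

```python
import numpy as np

def log_density_transformed(x, forward, base_logpdf):
    """Change-of-variables density for an elementwise invertible map:
    log p(x) = log p_base(z) + sum_i log |dz_i/dx_i|.
    An elementwise map has a diagonal Jacobian, so the log-determinant
    is just a sum of scalar log-derivatives."""
    eps = 1e-6
    z = forward(x)
    jac_diag = (forward(x + eps) - z) / eps   # finite-difference dz/dx
    return base_logpdf(z) + np.sum(np.log(np.abs(jac_diag)))
```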

https://arxiv.org/abs/1804.00779 Neural Autoregressive Flows

Normalizing flows and autoregressive models have been successfully combined to produce state-of-the-art results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate state-of-the-art WaveNet-based speech synthesis to 20x faster than real-time, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields state-of-the-art performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. https://github.com/CW-Huang/NAF
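
For context, the affine MAF/IAF transformation that NAF generalizes looks like this in the density direction: z_i = (x_i - mu_i(x_<i)) / sigma_i(x_<i), with log |det dz/dx| = -sum_i log sigma_i. A sketch where `mu_fn` and `log_sigma_fn` are stand-ins for the masked conditioner networks (NAF replaces this affine map with a monotonic neural network):

```python
import numpy as np

def maf_forward(x, mu_fn, log_sigma_fn):
    """One affine masked-autoregressive-flow step (density direction):
    z_i = (x_i - mu_i(x_{<i})) * exp(-log_sigma_i(x_{<i})).
    Returns z and the log |det dz/dx| term for the likelihood."""
    D = len(x)
    z = np.zeros(D)
    log_det = 0.0
    for i in range(D):
        mu = mu_fn(x[:i])
        ls = log_sigma_fn(x[:i])
        z[i] = (x[i] - mu) * np.exp(-ls)
        log_det += -ls
    return z, log_det
```

Because z_i depends only on x_{<=i}, the whole density evaluation can run in one parallel pass, while inversion (sampling) is sequential; IAF makes the opposite trade.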

https://arxiv.org/abs/1806.05575 Autoregressive Quantile Networks for Generative Modeling

We introduce autoregressive implicit quantile networks (AIQN), a fundamentally different approach to generative modeling than those commonly used, that implicitly captures the distribution using quantile regression. AIQN is able to achieve superior perceptual quality and improvements in evaluation metrics, without incurring a loss of sample diversity.
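
Quantile regression replaces the likelihood with the pinball loss: under-predictions are weighted tau and over-predictions (1 - tau), so the minimizer is the tau-quantile of the target distribution. A minimal sketch of that loss (the AIQN training objective is built on this idea, though this is not the paper's code):

```python
import numpy as np

def quantile_loss(pred, target, tau):
    """Pinball loss for quantile level tau in (0, 1): asymmetric
    absolute error whose minimizer over data is the tau-quantile."""
    err = target - pred
    return np.mean(np.maximum(tau * err, (tau - 1.0) * err))
```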

https://arxiv.org/pdf/1802.06901.pdf Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement

https://arxiv.org/abs/1811.00002v1 WaveGlow: A Flow-based Generative Network for Speech Synthesis

WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need for auto-regression. WaveGlow is implemented using only a single network, trained using only a single cost function: maximizing the likelihood of the training data, which makes the training procedure simple and stable. Our PyTorch implementation produces audio samples at a rate of more than 500 kHz on an NVIDIA V100 GPU. Mean Opinion Scores show that it delivers audio quality as good as the best publicly available WaveNet implementation. All code will be made publicly available online. https://nv-adlr.github.io/WaveGlow
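
The "single cost function" training is possible because Glow-style flows are built from affine coupling layers: half the variables pass through untouched and parameterize an invertible affine map of the other half, so the log-determinant is just a sum. A NumPy sketch (our simplification; `st_fn` stands in for WaveGlow's WaveNet-like conditioning network):

```python
import numpy as np

def affine_coupling_forward(x, st_fn):
    """Affine coupling step as in Glow/WaveGlow: split the input,
    predict (log_s, t) from the first half with any network st_fn,
    and transform the second half: y2 = x2 * exp(log_s) + t.
    Invertible however complex st_fn is, with log|det| = sum(log_s)."""
    x1, x2 = np.split(x, 2)
    log_s, t = st_fn(x1)
    y2 = x2 * np.exp(log_s) + t
    return np.concatenate([x1, y2]), np.sum(log_s)

def affine_coupling_inverse(y, st_fn):
    """Exact inverse: recompute (log_s, t) from the untouched half
    and undo the affine map."""
    y1, y2 = np.split(y, 2)
    log_s, t = st_fn(y1)
    x2 = (y2 - t) * np.exp(-log_s)
    return np.concatenate([y1, x2])
```

The inverse never needs to invert st_fn itself, which is why sampling is a single parallel pass rather than an autoregressive loop.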

https://github.com/ikostrikov/pytorch-flows PyTorch implementations of Masked Autoregressive Flow and other invertible transformations from Glow: Generative Flow with Invertible 1×1 Convolutions and Density estimation using Real NVP.