This is an old revision of the document! DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding

In particular, 3D context has been shown to be an extremely important cue for scene understanding - yet very little research has been done on integrating context information with deep models. This paper presents an approach to embed 3D context into the topology of a neural network trained to perform holistic scene understanding. Given a depth image depicting a 3D scene, our network aligns the observed scene with a predefined 3D scene template, and then reasons about the existence and location of each object within the scene template. In doing so, our model recognizes multiple objects in a single forward pass of a 3D convolutional neural network, capturing both global scene and local object information simultaneously. To create training data for this 3D network, we generate partly hallucinated depth images which are rendered by replacing real objects with a repository of CAD models of the same object category. Extensive experiments demonstrate the effectiveness of our algorithm compared to the state-ofthe-arts. Source code and data will be available.

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the R-CNN framework and provides a significant boost over a randomly-initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations. TAC-GAN – Text Conditioned Auxiliary Classifier Generative Adversarial Network

In this paper, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), which builds upon the AC-GAN by conditioning the generated images on a text description instead of on a class label. Multilevel Context Representation for Improving Object Recognition

This paper postulates that the use of context closer to the high-level layers provides the scale and translation invariance and works better than using the top layer only.

Also, it is shown that at almost no additional cost, the relative error rates of the original networks decrease by up to 2%. This fact makes the extended networks a very well suited choice for usage in production environments. The quantitative evaluation signifies that the new approach could be, at inference time, 144 times more efficient than the current approaches while maintaining comparable performance.

Unlike most CNNs, including AlexNet and GoogLeNet, the proposed networks feed the classification part of the network with information not only from the highest-level convolutional layer, but with information from the two highest-level convolutional layers. We call the enhanced versions of these networks AlexNet++ and GoogLeNet++. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on a few-shot image classification benchmark, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies. Improving Context Aware Language Models

We show that the most widely-used approach to adaptation (concatenating the context with the word embedding at the input to the recurrent layer) is outperformed by a model that has some low-cost improvements: adaptation of both the hidden and output layers. and a feature hashing bias term to capture context idiosyncrasies. Experiments on language modeling and classification tasks using three different corpora demonstrate the advantages of the proposed techniques. Topically Driven Neural Language Model

Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics. Words, Concepts, and the Geometry of Analogy Contextual Explanation Networks

Our approach offers two major advantages: (i) for each prediction, valid instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in low-resource settings.

In this paper, we have introduced contextual explanation networks (CENs)—models that learn to predict by constructing and applying simple context-specific hypotheses. We have formally defined CENs as a class of probabilistic models, considered a number of special cases (e.g., the mixture of experts), and derived learning and inference procedures within the encoder-decoder framework for simple and sequentially-structured outputs. Learning to predict and to explain jointly turned out to have a number of benefits, including strong regularization, consistency, and the ability to generate explanations with no computational overhead. Context encoders as a simple but powerful extension of word2vec

However, as only a single embedding is learned for every word in the vocabulary, the model fails to optimally represent words with multiple meanings. Additionally, it is not possible to create embeddings for new (out-of-vocabulary) words on the spot. Based on an intuitive interpretation of the continuous bag-of-words (CBOW) word2vec model's negative sampling training objective in terms of predicting context based similarities, we motivate an extension of the model we call context encoders (ConEc). By multiplying the matrix of trained word2vec embeddings with a word's average context vector, out-of-vocabulary (OOV) embeddings and representations for a word with multiple meanings can be created based on the word's local contexts. The benefits of this approach are illustrated by using these word embeddings as features in the CoNLL 2003 named entity recognition (NER) task. Reading Twice for Natural Language Understanding

This work approaches this problem by incorporating contextual information into word representations prior to processing the task at hand. Context-aware Captions from Context-agnostic Supervision

We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of “siamese cat” and “tiger cat”, we generate language that describes the “siamese cat” in a way that distinguishes it from “tiger cat”. Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination. A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning

In this report, a quasi-linguistic approach to knowledge representation is discussed, motivated by spacetime structure. Tokenized patterns from diverse sources are integrated to build a lightly constrained and approximately scale-free network. This is then be parsed with very simple recursive algorithms to generate `brainstorming' sets of reasoned knowledge. Learned in translation: contextualized word vectors

Our work proposes to use networks that have already learned how to contextualize words to give new neural networks an advantage in learning to understand other parts of natural language. Context-aware Single-Shot Detector

The experimental results show that the multi-scale context modeling significantly improves the detection accuracy. Sequential Attention: A Context-Aware Alignment Function for Machine Reading

In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by assigning weights to words in an input sequence in a way that takes into account not just how well that word matches a query, but how well surrounding words match. We evaluate this approach on the task of reading comprehension (on the Who did What and CNN datasets) and show that it dramatically improves a strong baseline–the Stanford Reader–and is competitive with the state of the art. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

We propose a seamless way to personalize RNN models with cross-session information transfer and devise a Hierarchical RNN model that relays end evolves latent hidden states of the RNNs across user sessions. Results on two industry datasets show large improvements over the session-only RNNs. Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context Dynamic Entity Representations in Neural Language Models

Understanding a long document requires tracking how entities are introduced and evolve over time. We present a new type of language model, ENTITYNLM, that can explicitly model entities, dynamically update their representations, and contextually generate their mentions. Our model is generative and flexible; it can model an arbitrary number of entities in context while generating each entity mention at an arbitrary length. In addition, it can be used for several different tasks such as language modeling, coreference resolution, and entity prediction. Experimental results with all these tasks demonstrate that our model consistently outperforms strong baselines and prior work. The Consciousness Prior