DeepContext: Context-Encoding Neural Pathways for 3D Holistic Scene Understanding

In particular, 3D context has been shown to be an extremely important cue for scene understanding - yet very little research has been done on integrating context information with deep models. This paper presents an approach to embed 3D context into the topology of a neural network trained to perform holistic scene understanding. Given a depth image depicting a 3D scene, our network aligns the observed scene with a predefined 3D scene template, and then reasons about the existence and location of each object within the scene template. In doing so, our model recognizes multiple objects in a single forward pass of a 3D convolutional neural network, capturing both global scene and local object information simultaneously. To create training data for this 3D network, we generate partly hallucinated depth images which are rendered by replacing real objects with a repository of CAD models of the same object category. Extensive experiments demonstrate the effectiveness of our algorithm compared to the state-ofthe-arts. Source code and data will be available.

This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the R-CNN framework and provides a significant boost over a randomly-initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations. TAC-GAN – Text Conditioned Auxiliary Classifier Generative Adversarial Network

In this paper, we present the Text Conditioned Auxiliary Classifier Generative Adversarial Network (TAC-GAN), which builds upon the AC-GAN by conditioning the generated images on a text description instead of on a class label. Multilevel Context Representation for Improving Object Recognition

This paper postulates that the use of context closer to the high-level layers provides the scale and translation invariance and works better than using the top layer only.

Also, it is shown that at almost no additional cost, the relative error rates of the original networks decrease by up to 2%. This fact makes the extended networks a very well suited choice for usage in production environments. The quantitative evaluation signifies that the new approach could be, at inference time, 144 times more efficient than the current approaches while maintaining comparable performance.

Unlike most CNNs, including AlexNet and GoogLeNet, the proposed networks feed the classification part of the network with information not only from the highest-level convolutional layer, but with information from the two highest-level convolutional layers. We call the enhanced versions of these networks AlexNet++ and GoogLeNet++. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

We propose an algorithm for meta-learning that is model-agnostic, in the sense that it is compatible with any model trained with gradient descent and applicable to a variety of different learning problems, including classification, regression, and reinforcement learning. The goal of meta-learning is to train a model on a variety of learning tasks, such that it can solve new learning tasks using only a small number of training samples. In our approach, the parameters of the model are explicitly trained such that a small number of gradient steps with a small amount of training data from a new task will produce good generalization performance on that task. In effect, our method trains the model to be easy to fine-tune. We demonstrate that this approach leads to state-of-the-art performance on a few-shot image classification benchmark, produces good results on few-shot regression, and accelerates fine-tuning for policy gradient reinforcement learning with neural network policies. Improving Context Aware Language Models

We show that the most widely-used approach to adaptation (concatenating the context with the word embedding at the input to the recurrent layer) is outperformed by a model that has some low-cost improvements: adaptation of both the hidden and output layers. and a feature hashing bias term to capture context idiosyncrasies. Experiments on language modeling and classification tasks using three different corpora demonstrate the advantages of the proposed techniques. Topically Driven Neural Language Model

Language models are typically applied at the sentence level, without access to the broader document context. We present a neural language model that incorporates document context in the form of a topic model-like architecture, thus providing a succinct representation of the broader document context outside of the current sentence. Experiments over a range of datasets demonstrate that our model outperforms a pure sentence-based model in terms of language model perplexity, and leads to topics that are potentially more coherent than those produced by a standard LDA topic model. Our model also has the ability to generate related sentences for a topic, providing another way to interpret topics. Words, Concepts, and the Geometry of Analogy Contextual Explanation Networks

Our approach offers two major advantages: (i) for each prediction, valid instance-specific explanations are generated with no computational overhead and (ii) prediction via explanation acts as a regularization and boosts performance in low-resource settings.

In this paper, we have introduced contextual explanation networks (CENs)—models that learn to predict by constructing and applying simple context-specific hypotheses. We have formally defined CENs as a class of probabilistic models, considered a number of special cases (e.g., the mixture of experts), and derived learning and inference procedures within the encoder-decoder framework for simple and sequentially-structured outputs. Learning to predict and to explain jointly turned out to have a number of benefits, including strong regularization, consistency, and the ability to generate explanations with no computational overhead. Context encoders as a simple but powerful extension of word2vec

However, as only a single embedding is learned for every word in the vocabulary, the model fails to optimally represent words with multiple meanings. Additionally, it is not possible to create embeddings for new (out-of-vocabulary) words on the spot. Based on an intuitive interpretation of the continuous bag-of-words (CBOW) word2vec model's negative sampling training objective in terms of predicting context based similarities, we motivate an extension of the model we call context encoders (ConEc). By multiplying the matrix of trained word2vec embeddings with a word's average context vector, out-of-vocabulary (OOV) embeddings and representations for a word with multiple meanings can be created based on the word's local contexts. The benefits of this approach are illustrated by using these word embeddings as features in the CoNLL 2003 named entity recognition (NER) task. Reading Twice for Natural Language Understanding

This work approaches this problem by incorporating contextual information into word representations prior to processing the task at hand. Context-aware Captions from Context-agnostic Supervision

We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of “siamese cat” and “tiger cat”, we generate language that describes the “siamese cat” in a way that distinguishes it from “tiger cat”. Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that uniquely refers to one of two semantically-similar images in the COCO dataset. Evaluations with discriminative ground truth for justification and human studies for discriminative image captioning reveal that our approach outperforms baseline generative and speaker-listener approaches for discrimination. A Spacetime Approach to Generalized Cognitive Reasoning in Multi-scale Learning

In this report, a quasi-linguistic approach to knowledge representation is discussed, motivated by spacetime structure. Tokenized patterns from diverse sources are integrated to build a lightly constrained and approximately scale-free network. This is then be parsed with very simple recursive algorithms to generate `brainstorming' sets of reasoned knowledge. Learned in translation: contextualized word vectors

Our work proposes to use networks that have already learned how to contextualize words to give new neural networks an advantage in learning to understand other parts of natural language. Context-aware Single-Shot Detector

The experimental results show that the multi-scale context modeling significantly improves the detection accuracy. Sequential Attention: A Context-Aware Alignment Function for Machine Reading

In this paper we propose a neural network model with a novel Sequential Attention layer that extends soft attention by assigning weights to words in an input sequence in a way that takes into account not just how well that word matches a query, but how well surrounding words match. We evaluate this approach on the task of reading comprehension (on the Who did What and CNN datasets) and show that it dramatically improves a strong baseline–the Stanford Reader–and is competitive with the state of the art. Personalizing Session-based Recommendations with Hierarchical Recurrent Neural Networks

We propose a seamless way to personalize RNN models with cross-session information transfer and devise a Hierarchical RNN model that relays end evolves latent hidden states of the RNNs across user sessions. Results on two industry datasets show large improvements over the session-only RNNs. Temporal Grounding Graphs for Language Understanding with Accrued Visual-Linguistic Context Dynamic Entity Representations in Neural Language Models

Understanding a long document requires tracking how entities are introduced and evolve over time. We present a new type of language model, ENTITYNLM, that can explicitly model entities, dynamically update their representations, and contextually generate their mentions. Our model is generative and flexible; it can model an arbitrary number of entities in context while generating each entity mention at an arbitrary length. In addition, it can be used for several different tasks such as language modeling, coreference resolution, and entity prediction. Experimental results with all these tasks demonstrate that our model consistently outperforms strong baselines and prior work. The Consciousness Prior An Artificial Neural Network Architecture Based on Context Transformations in Cortical Minicolumns

The Two-stream hypothesis is a well-known and accepted model of visual and auditory perception [4,6]. The ventral is called “what pathway”, the dorsal stream “where pathway”. Minicolumns in the first one are selective to object types, their shapes and colors. The second stream has minicolumns with position and orientation selectivity. Memory-based Parameter Adaptation

Our method, Memory-based Parameter Adaptation, stores examples in memory and then uses a context-based lookup to directly modify the weights of a neural network. Much higher learning rates can be used for this local adaptation, reneging the need for many iterations over similar data before good predictions can be made. As our method is memory-based, it alleviates several shortcomings of neural networks, such as catastrophic forgetting, fast, stable acquisition of new knowledge, learning with an imbalanced class labels, and fast learning during evaluation. We demonstrate this on a range of supervised tasks: large-scale image classification and language modelling. Deep contextualized word representations

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. We show that these representations can be easily added to existing models and significantly improve the state of the art across six challenging NLP problems, including question answering, textual entailment and sentiment analysis. We also present an analysis showing that exposing the deep internals of the pre-trained network is crucial, allowing downstream models to mix different types of semi-supervision signals. Contrasting information theoretic decompositions of modulatory and arithmetic interactions in neural information processing systems

The decompositions that we report here show that contextual modulation has information processing properties that contrast with those of all four simple arithmetic operators, that it can take various forms, and that the form used in our previous studies of artificial nets composed of local processors with both driving and contextual inputs is particularly well-suited to provide the distinctive capabilities of contextual modulation under a wide range of conditions. We argue that the decompositions reported here could be compared with those obtained from empirical neurobiological and psychophysical data under conditions thought to reflect contextual modulation. That would then shed new light on the underlying processes involved. Finally, we suggest that such decompositions could aid the design of context-sensitive machine learning algorithms. Image Generation from Scene Graphs

To truly understand the visual world our models should be able not only to recognize images but also generate them. To this end, there has been exciting recent progress on generating images from natural language descriptions. These methods give stunning results on limited domains such as descriptions of birds or flowers, but struggle to faithfully reproduce complex sentences with many objects and relationships. To overcome this limitation we propose a method for generating images from scene graphs, enabling explicitly reasoning about objects and their relationships. Our model uses graph convolution to process input graphs, computes a scene layout by predicting bounding boxes and segmentation masks for objects, and converts the layout to an image with a cascaded refinement network. The network is trained adversarially against a pair of discriminators to ensure realistic outputs. We validate our approach on Visual Genome and COCO-Stuff, where qualitative results, ablations, and user studies demonstrate our method's ability to generate complex images with multiple objects. Context is Everything: Finding Meaning Statistically in Semantic Spaces

This paper introduces Contextual Salience (CoSal), a simple and explicit measure of a word's importance in context which is a more theoretically natural, practically simpler, and more accurate replacement to tf-idf. CoSal supports very small contexts (20 or more sentences), out-of context words, and is easy to calculate. A word vector space generated with both bigram phrases and unigram tokens reveals that contextually significant words disproportionately define phrases. This relationship is applied to produce simple weighted bag-of-words sentence embeddings. This model outperforms SkipThought and the best models trained on unordered sentences in most tests in Facebook's SentEval, beats tf-idf on all available tests, and is generally comparable to the state of the art. This paper also applies CoSal to sentence and document summarization and an improved and context-aware cosine distance. Applying the premise that unexpected words are important, CoSal is presented as a replacement for tf-idf and an intuitive measure of contextual word importance. Context-Attentive Embeddings for Improved Sentence Representations

While one of the first steps in many NLP systems is selecting what embeddings to use, we argue that such a step is better left for neural networks to figure out by themselves. To that end, we introduce a novel, straightforward yet highly effective method for combining multiple types of word embeddings in a single model, leading to state-of-the-art performance within the same model class on a variety of tasks. We subsequently show how the technique can be used to shed new insight into the usage of word embeddings in NLP systems. Rapid Adaptation with Conditionally Shifted Neurons

We describe a mechanism by which artificial neural networks can learn rapid adaptation - the ability to adapt on the fly, with little data, to new tasks - that we call conditionally shifted neurons. We apply this mechanism in the framework of metalearning, where the aim is to replicate some of the flexibility of human learning in machines. Conditionally shifted neurons modify their activation values with task-specific shifts retrieved from a memory module, which is populated rapidly based on limited task experience. On metalearning benchmarks from the vision and language domains, models augmented with conditionally shifted neurons achieve state-of-the-art results. Language Modeling with Gated Convolutional Networks Neural Motifs: Scene Graph Parsing with Global Context

Our analysis motivates a new baseline: given object detections, predict the most frequent relation between object pairs with the given labels, as seen in the training set. This baseline improves on the previous state-of-the-art by an average of 3.6% relative improvement across evaluation settings. We then introduce Stacked Motif Networks, a new architecture designed to capture higher order motifs in scene graphs that further improves over our strong baseline by an average 7.1% relative gain. Our code is available at Contextual Parameter Generation for Universal Neural Machine Translation Dual Ask-Answer Network for Machine Reading Comprehension CAML: Fast Context Adaptation via Meta-Learning CAML: Fast Context Adaptation via Meta-Learning

CAML: Fast Context Adaptation via Meta-Learning Luisa M Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson (Submitted on 8 Oct 2018 (this version), latest version 12 Oct 2018 (v2)) We propose CAML, a meta-learning method for fast adaptation that partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks. Contextual Topic Modeling For Dialog Systems

Our work for detecting conversation topics and keywords can be used to guide chatbots towards coherent dialog. Reference-point centering and range-adaptation enhance human reinforcement learning at the cost of irrational preferences Context Aware Machine Learning

The embedding of an observation can also be decomposed into a weighted sum of two vectors, representing its context-free and context-sensitive parts

new architecture for modeling attention in deep neural networks. More surprisingly, our new principle provides a novel understanding of the gates and equations defined by the long short term memory model, which also leads to a new model that is able to converge significantly faster and achieve much lower prediction errors. Furthermore, our principle also inspires a new type of generic neural network layer that better resembles real biological neurons than the traditional linear mapping plus nonlinear activation based architecture. Its multi-layer extension provides a new principle for deep neural networks which subsumes residual network (ResNet) as its special case, and its extension to convolutional neutral network model accounts for irrelevant input (e.g., background in an image) in addition to filtering.