This shows you the differences between two versions of the page.

cognitive_synergy [2017/11/23 13:29]
cognitive_synergy [2018/05/15 09:23]
Line 148: Line 148:
 https://arxiv.org/pdf/1711.07611.pdf Event Representations with Tensor-based Compositions
 +https://arxiv.org/pdf/1711.10402v1.pdf An Adversarial Neuro-Tensorial Approach For Learning Disentangled Representations
 +In this paper, we propose the first unsupervised deep learning method for disentangling multiple latent factors of variation in face images captured in-the-wild. To this end, we propose a deep latent variable model, where the multiplicative interactions of multiple latent factors of variation are explicitly modelled by means of multilinear (tensor) structure. We demonstrate that the proposed approach indeed learns disentangled representations of facial expressions and pose, which can be used in various applications, including face editing, as well as 3D face reconstruction and classification of facial expression, identity and pose. We demonstrate the power of our methodology in expression and pose transfer, as well as discovering powerful features for pose and expression classification.
 +https://arxiv.org/pdf/0709.0303.pdf Navigability of complex networks
 +https://openreview.net/forum?id=BJRZzFlRb Compressing Word Embeddings via Deep Compositional Code Learning
 +Natural language processing (NLP) models often require a massive number of parameters for word embeddings, resulting in a large storage or memory footprint. Deploying neural NLP models to mobile devices requires compressing the word embeddings without any significant sacrifices in performance. For this purpose, we propose to construct the embeddings with few basis vectors. For each word, the composition of basis vectors is determined by a hash code. To maximize the compression rate, we adopt the multi-codebook quantization approach instead of a binary coding scheme. Each code is composed of multiple discrete numbers, such as (3, 2, 1, 8), where the value of each component is limited to a fixed range. We propose to directly learn the discrete codes in an end-to-end neural network by applying the Gumbel-softmax trick. Experiments show the compression rate achieves 98% in a sentiment analysis task and 94% ~ 99% in machine translation tasks without performance loss. In both tasks, the proposed method can improve the model performance by slightly lowering the compression rate. Compared to other approaches such as character-level segmentation, the proposed method is language-independent and does not require modifications to the network architecture.
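The composition step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the zero-based reading of the code (3, 2, 1, 8), the random basis vectors, and the 32-bit float storage estimate are all assumptions made here for illustration. Each word's embedding is the sum of one basis vector per codebook, and `gumbel_softmax` shows the standard relaxation that lets a discrete choice be trained end to end.

```python
import math
import random


def make_codebooks(num_codebooks, codebook_size, dim, seed=0):
    """Create random basis vectors: codebooks[m][k] is a list of `dim` floats.

    Random values stand in for learned basis vectors (assumption).
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(codebook_size)]
        for _ in range(num_codebooks)
    ]


def compose_embedding(code, codebooks):
    """Sum one basis vector per codebook, chosen by the matching code component."""
    if len(code) != len(codebooks):
        raise ValueError("code needs one component per codebook")
    dim = len(codebooks[0][0])
    embedding = [0.0] * dim
    for index, codebook in zip(code, codebooks):
        for d, value in enumerate(codebook[index]):
            embedding[d] += value
    return embedding


def gumbel_softmax(logits, temperature, rng):
    """Draw a relaxed one-hot sample: softmax((logit + Gumbel noise) / temperature)."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # keep u > 0 so log(u) is defined
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    peak = max(noisy)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


def storage_saved(vocab_size, dim, num_codebooks, codebook_size, float_bits=32):
    """Fraction of storage saved versus a dense table (hypothetical estimate)."""
    dense_bits = vocab_size * dim * float_bits
    basis_bits = num_codebooks * codebook_size * dim * float_bits
    code_bits = vocab_size * num_codebooks * math.log2(codebook_size)
    return 1.0 - (basis_bits + code_bits) / dense_bits


# Usage: four codebooks of 16 basis vectors each, code taken from the abstract.
codebooks = make_codebooks(num_codebooks=4, codebook_size=16, dim=8)
vector = compose_embedding((3, 2, 1, 8), codebooks)
relaxed_choice = gumbel_softmax([0.0, 1.0, 2.0], 0.5, random.Random(1))
```

Lower temperatures push `relaxed_choice` closer to a hard one-hot vector, which is why the relaxation can stand in for a discrete code during training.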
 +https://arxiv.org/pdf/1802.00273v1.pdf Emerging Language Spaces Learned From Massively Multilingual Corpora
 +https://arxiv.org/abs/1803.00385 MAGAN: Aligning Biological Manifolds
 +We present a new GAN called the Manifold-Aligning GAN (MAGAN) that aligns two manifolds such that related points in each measurement space are aligned together. We demonstrate applications of MAGAN in single-cell biology in integrating two different measurement types together. In our demonstrated examples, cells from the same tissue are measured with both genomic (single-cell RNA-sequencing) and proteomic (mass cytometry) technologies. We show that the MAGAN successfully aligns them such that known correlations between measured markers are improved compared to other recently proposed models.
 +https://arxiv.org/pdf/1802.10151v1.pdf Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data
 +https://arxiv.org/abs/1803.08495 Text2Shape: Generating Shapes from Natural Language by Learning Joint Embeddings
 +We present a method for generating colored 3D shapes from natural language. To this end, we first learn joint embeddings of freeform text descriptions and colored 3D shapes. Our model combines and extends learning by association and metric learning approaches to learn implicit cross-modal connections, and produces a joint representation that captures the many-to-many relations between language and physical properties of 3D shapes such as color and shape. To evaluate our approach, we collect a large dataset of natural language descriptions for physical 3D objects in the ShapeNet dataset. With this learned joint embedding we demonstrate text-to-shape retrieval that outperforms baseline approaches. Using our embeddings with a novel conditional Wasserstein GAN framework, we generate colored 3D shapes from text. Our method is the first to connect natural language text with realistic 3D objects exhibiting rich variations in color, texture, and shape detail. http://text2shape.stanford.edu/
 +https://arxiv.org/pdf/1804.00104v1.pdf Joint-VAE: Learning Disentangled Joint Continuous and Discrete Representations
 +We have proposed Joint-VAE, a framework for learning disentangled continuous and discrete representations in an unsupervised manner. The framework comes with the advantages of VAEs such as stable training and large sample diversity while being able to model complex jointly continuous and discrete generative factors. We have shown that Joint-VAE disentangles factors of variation on several datasets while producing realistic samples. In addition, the inference network can be used to infer unlabeled quantities on test data and to edit and manipulate images.
 +https://arxiv.org/pdf/1804.00410v1.pdf SyncGAN: Synchronize the Latent Space of Cross-modal Generative Adversarial Networks
 +Instead of learning the transfer between different modalities, we aim to learn a synchronous latent space representing the cross-modal common concept. A novel network component named synchronizer is proposed in this work to judge whether the paired data is synchronous/corresponding or not, which can constrain the latent space of generators in the GANs. Our GAN model, named SyncGAN, can successfully generate synchronous data (e.g., a pair of image and sound) from identical random noise. For transforming data from one modality to another, we recover the latent code by inverting the mappings of a generator and use it to generate data of a different modality. In addition, the proposed model can achieve semi-supervised learning, which makes our model more flexible for practical applications.
 +Cross-domain GANs adopt several special mechanisms such as cycle-consistency and weight-sharing to extract the common structure of cross-domain data automatically. However, the common structure does not exist between most cross-modal data due to the heterogeneous gap. Therefore, the model needs paired information to relate the different structures between data of various modalities which are of the same concept.
 +https://arxiv.org/abs/1703.04368v1 Symbol Grounding via Chaining of Morphisms
 +https://arxiv.org/abs/1805.04174 Joint Embedding of Words and Labels for Text Classification
 +Word embeddings are effective intermediate representations for capturing semantic regularities between words, when learning the representations of text sequences. We propose to view text classification as a label-word joint embedding problem: each label is embedded in the same space with the word vectors. We introduce an attention framework that measures the compatibility of embeddings between text sequences and labels. The attention is learned on a training set of labeled samples to ensure that, given a text sequence, the relevant words are weighted higher than the irrelevant ones. Our method maintains the interpretability of word embeddings, and enjoys a built-in ability to leverage alternative sources of information, in addition to input text sequences. Extensive results on several large text datasets show that the proposed framework outperforms the state-of-the-art methods by a large margin, in terms of both accuracy and speed.
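The idea of weighting words by their compatibility with label embeddings can be sketched roughly as follows. This is a simplified illustration, not the paper's architecture: cosine similarity as the compatibility score, the max over labels, the plain softmax over word positions, and all function names are assumptions made here. Labels and words are placed in the same vector space, and words that align with some label receive more attention weight.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0.0 else 0.0


def label_attention(word_vectors, label_vectors):
    """Score each word by its best-matching label, then softmax over words."""
    scores = [max(cosine(w, c) for c in label_vectors) for w in word_vectors]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word_vectors, weights):
    """Attention-weighted average of the word vectors (document representation)."""
    dim = len(word_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, word_vectors)) for d in range(dim)]


# Usage with toy 2-D embeddings (hypothetical values).
labels = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
weights = label_attention(words, labels)
document_vector = attend(words, weights)
```

In this toy example the first word points in the same direction as a label and so gets the largest weight, while the third word points away from both labels and gets the smallest.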