Similarity Network



Given two sample objects, predict whether the samples are semantically the same object.


How can we train a network to test whether two objects are similar?


This section provides alternative descriptions of the pattern in the form of an illustration or alternative formal expression. By looking at the sketch a reader may quickly understand the essence of the pattern.

Discussion

This is the main section of the pattern and explains it in greater detail. We use the vocabulary described in the theory section of this book. Rather than giving detailed proofs, we reference their sources. This section also explains how the pattern addresses the motivation, and it includes additional questions that may be interesting topics for future research.
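As a concrete illustration of the pattern, here is a minimal sketch of the classic Siamese arrangement. Both inputs pass through one shared encoder, similarity is read off as a distance between the two embeddings, and a margin-based contrastive loss supplies the "contrastive term" that, according to the energy-based learning figure in the references, prevents the energy surface from collapsing. The one-layer tanh encoder, `make_encoder`, `siamese_distance`, and the margin of 1.0 are hypothetical choices made for illustration. Training code is omitted, and in practice the encoder would be a deep network.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Hypothetical shared encoder: a single linear layer followed by tanh."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]

    def encode(x):
        # Both members of a pair pass through this same function (shared weights).
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return encode


def euclidean(a, b):
    """Distance between two embeddings; a small distance means 'similar'."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def contrastive_loss(distance, same, margin=1.0):
    """Margin-based contrastive loss for one labelled pair.

    Similar pairs are pulled together (the penalty grows with distance).
    Dissimilar pairs are pushed apart until they are at least `margin` away;
    without this second term every input could collapse onto one point.
    """
    if same:
        return 0.5 * distance ** 2
    return 0.5 * max(0.0, margin - distance) ** 2


def siamese_distance(encode, x1, x2):
    return euclidean(encode(x1), encode(x2))


# Usage: identical inputs are at distance zero, so a "same" label costs nothing.
encode = make_encoder(in_dim=4, out_dim=2)
a = [1.0, 0.0, 0.5, -0.5]
b = [-1.0, 0.3, 0.0, 0.8]
print(siamese_distance(encode, a, a))  # 0.0
print(contrastive_loss(siamese_distance(encode, a, b), same=False))
```

Sharing one encoder makes the distance symmetric in its two arguments. If you train only on "same" pairs, the loss is minimized by mapping everything to a single point. This is the flat-energy collapse that the margin term is meant to rule out.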

Known Uses

Here we review several projects or papers that have used this pattern.

Related Patterns

In this section we use a diagram to describe how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we further explain the nature of each relationship. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.

Relationship to Canonical Patterns

Relationship to other Patterns

Further Reading

We provide here some additional external material that will help in exploring this pattern in more detail.


To aid in reading, we include sources that are referenced in the text in the pattern.

References

A Tutorial on Energy-Based Learning

(a): A simple architecture that can be trained with the energy loss. (b): An implicit regression architecture where X and Y are passed through functions $G_{1W_1}$ and $G_{2W_2}$ respectively. Training this architecture with the energy loss causes a collapse (a flat energy surface). A loss function with a contrastive term corrects the problem.

Siamese Neural Networks for One-shot Image Recognition

Deep metric learning using Triplet network

Learning a Similarity Metric Discriminatively, with Application to Face Verification

Fully-Convolutional Siamese Networks for Object Tracking

Attentive Recurrent Comparators

Many problems in Artificial Intelligence and Machine Learning can be reduced to the problem of quantitative comparison of two entities. In Deep Learning the ubiquitous architecture used for this task is the Siamese Neural Network which maps each entity to a representation through a learnable function and expresses similarity through the distances among the entities in the representation space. In this paper, we argue that such a static and invariant mapping is both naive and unnatural. We develop a novel neural model called Attentive Recurrent Comparators (ARCs) that dynamically compares two entities and test the model extensively on the Omniglot dataset. In the task of similarity learning, our simplistic model that does not use any convolutions performs on par with Deep Convolutional Siamese Networks and significantly better when convolutional layers are also used. In the challenging task of one-shot learning on the same dataset, an ARC based model achieves the first super-human performance for a neural method with an error rate of 1.5%.

Similarity Preserving Representation Learning for Time Series Analysis

…algorithms are effective, robust, efficient, and easy to use. In this paper, we bridge this gap by proposing an efficient representation learning framework that is able to convert a set of time series with equal or unequal lengths to a matrix format. In particular, we guarantee that the pairwise similarities between time series are well preserved after the transformation. Therefore, the learned feature representation is particularly suitable to the class of learning problems that are sensitive to data similarities. Given a set of n time series, we first construct an n×n partially observed similarity matrix by randomly sampling O(n log n) pairs of time series and computing their pairwise similarities.

Generative Adversarial Residual Pairwise Networks for One Shot Learning

Our proposed model Skip Residual Pairwise Net (SRPN). The network separates the intermediate computations for the inputs $x$ and the image being compared $x_t$, which are then passed through separate pathways using the residual connections. The final output is a single similarity vector for the pair where the distance measure is itself learned by the network.
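The SRPN caption stresses that the distance measure is learned rather than fixed. Below is a minimal sketch of that general idea. It is not SRPN itself, since it has no residual or skip pathways. The hypothetical `PairScorer` fuses two embeddings into symmetric pair features and fits a logistic layer to labelled pairs, so what counts as "close" is learned from data. The element-wise product echoes the "similarity network" variant described in the two-branch paper that follows. The |u − v| features, the cross-entropy objective, and the learning rate are our own assumptions.

```python
import math
import random


class PairScorer:
    """Hypothetical learned similarity head over two embeddings u and v.

    Instead of a fixed metric, a logistic layer scores the symmetric pair
    features [u * v, |u - v|], so the notion of "close" is fit from labels.
    """

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.weights = [rng.gauss(0.0, 0.1) for _ in range(2 * dim)]
        self.bias = 0.0

    @staticmethod
    def features(u, v):
        products = [a * b for a, b in zip(u, v)]
        gaps = [abs(a - b) for a, b in zip(u, v)]
        return products + gaps

    def score(self, u, v):
        """Probability that u and v come from the same object."""
        logit = self.bias + sum(w * f for w, f in zip(self.weights, self.features(u, v)))
        return 1.0 / (1.0 + math.exp(-logit))

    def train_step(self, u, v, same, lr=0.5):
        """One gradient step on binary cross-entropy for a single labelled pair."""
        feats = self.features(u, v)
        # For sigmoid + cross-entropy, d(loss)/d(logit) = p - y.
        grad = self.score(u, v) - (1.0 if same else 0.0)
        self.weights = [w - lr * grad * f for w, f in zip(self.weights, feats)]
        self.bias -= lr * grad


# Usage: repeated steps on a matching pair raise its predicted similarity.
scorer = PairScorer(dim=3)
u, v = [1.0, 0.0, 1.0], [1.0, 0.0, 0.9]
before = scorer.score(u, v)
for _ in range(50):
    scorer.train_step(u, v, same=True)
after = scorer.score(u, v)
print(before, after)
```

Because both feature groups are symmetric in u and v, the learned score does not depend on which input is presented first. This is a property the fixed Euclidean distance also has, and one you may or may not want to keep in a learned comparator.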

To summarize, we identified fixed distance measures and weak regularization as major challenges to similarity matching and its extension to one shot learning and presented a network design for each of the problems. Our Skip Residual Pairwise Network outperforms an equivalent Residual Siamese Network and achieves state of the art performance on the mini-Imagenet one shot classification dataset. Our Generative Regularizer shows promising results and outperforms L2-regularization on the Omniglot dataset.

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

We propose two different network structures that produce different output representations. The first one, referred to as an embedding network, learns an explicit shared latent embedding space with a maximum-margin ranking loss and novel neighborhood constraints. The second one, referred to as a similarity network, fuses the two branches via element-wise product and is trained with regression loss to directly predict a similarity score. Extensive experiments show that our two-branch networks achieve high accuracies for phrase localization on the Flickr30K Entities dataset and for bi-directional image-sentence retrieval on Flickr30K and MSCOCO datasets.

Conditional Similarity Networks

We propose Conditional Similarity Networks (CSNs) that learn embeddings differentiated into semantically distinct subspaces that capture the different notions of similarities. CSNs jointly learn a disentangled embedding where features for different similarities are encoded in separate dimensions as well as masks that select and reweight relevant dimensions to induce a subspace that encodes a specific similarity notion. We show that our approach learns interpretable image representations with visually relevant semantic subspaces.
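The CSN abstract describes masks that select and reweight embedding dimensions so that one embedding can support several notions of similarity. Below is a minimal sketch of that masking step. The masks here are fixed by hand, whereas in CSNs they are learned jointly with the embedding. We also assume a triplet margin loss with an arbitrary margin of 0.2. The names `masked_distance`, `triplet_loss`, and the toy color/shape embeddings are hypothetical.

```python
import math


def masked_distance(e1, e2, mask):
    """Euclidean distance after reweighting each dimension by a non-negative mask."""
    return math.sqrt(sum((m * (a - b)) ** 2 for a, b, m in zip(e1, e2, mask)))


def triplet_loss(anchor, positive, negative, mask, margin=0.2):
    """Triplet margin loss under the notion of similarity selected by `mask`."""
    d_pos = masked_distance(anchor, positive, mask)
    d_neg = masked_distance(anchor, negative, mask)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 4-d embeddings: dims 0-1 encode color, dims 2-3 encode shape.
# In a CSN the masks are learned jointly with the embedding; here they are fixed.
masks = {
    "color": [1.0, 1.0, 0.0, 0.0],
    "shape": [0.0, 0.0, 1.0, 1.0],
}
red_circle = [1.0, 0.0, 1.0, 0.0]
red_square = [1.0, 0.0, 0.0, 1.0]
blue_circle = [0.0, 1.0, 1.0, 0.0]

# The same pair of items under two different notions of similarity.
print(masked_distance(red_circle, red_square, masks["color"]))  # 0.0 (same color)
print(masked_distance(red_circle, red_square, masks["shape"]))  # ~1.414 (different shape)
```

With the triplet (red circle, red square, blue circle), the loss is zero under the color mask but positive under the shape mask. The same embedding therefore encodes two distinct similarity notions, and the mask alone decides which one is applied.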