Metric Learning

Differential Training, Similarity Learning

Discussion

The goal of metric learning is to ensure that, after training, the distance between vectors of the same class is small, while the distance between vectors of different classes is large.
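As a toy illustration, here is a minimal NumPy sketch (with made-up embeddings) of that criterion: distances within a class should stay small, and distances across classes should be large, typically by at least some margin.

import numpy as np

# Toy 2-D embeddings: rows 0 and 1 belong to class A, row 2 to class B (made-up values).
emb = np.array([[0.1, 0.2],
                [0.2, 0.1],
                [2.0, 1.9]])

def dist(a, b):
    # Euclidean distance between two embedding vectors.
    return np.linalg.norm(a - b)

intra = dist(emb[0], emb[1])   # same class: should be small
inter = dist(emb[0], emb[2])   # different classes: should be large

margin = 1.0
# A margin-based criterion is satisfied when the inter-class distance
# exceeds the intra-class distance by at least the margin.
print(intra, inter, inter - intra >= margin)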

References

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.67.2646&rep=rep1&type=pdf Differential Training of Rollout Policies

http://www.cs.toronto.edu/~rsalakhu/papers/oneshot1.pdf Siamese Neural Networks for One-shot Image Recognition

http://yann.lecun.com/exdb/publis/pdf/chopra-05.pdf Learning a Similarity Metric Discriminatively, with Application to Face Verification

The learning process minimizes a discriminative loss function that drives the similarity metric to be small for pairs of faces from the same person, and large for pairs from different persons.
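A minimal sketch of the commonly used contrastive pairwise loss in this spirit (the exact partial losses in Chopra et al. differ in detail): here y = 0 marks a pair from the same person, y = 1 a pair from different persons, and the margin is a hyperparameter.

def contrastive_loss(d, y, margin=1.0):
    # d: distance between the two embeddings of a pair
    # y: 0 for a same-person pair, 1 for a different-person pair
    # Same-person pairs are pulled together (the d**2 term); different-person
    # pairs are pushed apart until their distance exceeds the margin.
    return (1 - y) * 0.5 * d**2 + y * 0.5 * max(0.0, margin - d)**2

# A genuine pair at distance 0.2 vs. an impostor pair at distance 0.3
print(contrastive_loss(0.2, 0), contrastive_loss(0.3, 1))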

https://arxiv.org/abs/1412.6622 Deep metric learning using Triplet network
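The triplet network paper applies a softmax over the anchor-positive and anchor-negative distances; the hinge form below is the more common triplet loss variant and is shown only as a sketch of the underlying idea.

import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Pull the anchor toward the positive and push it away from the negative
    # until the negative is farther than the positive by at least the margin.
    d_ap = np.linalg.norm(anchor - positive)
    d_an = np.linalg.norm(anchor - negative)
    return max(0.0, d_ap - d_an + margin)

a = np.array([0.0, 1.0])
p = np.array([0.1, 0.9])   # same class as the anchor
n = np.array([1.0, 0.0])   # different class
print(triplet_loss(a, p, n))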

https://en.wikipedia.org/wiki/Similarity_learning

http://web.cse.ohio-state.edu/~kulis/pubs/ftml_metric_learning.pdf Metric Learning: A Survey

http://arxiv.org/pdf/1509.05360v1.pdf Geometry-aware Deep Transform

Deep networks are often optimized for a classification objective, where class-labeled samples are input as training data, or for a metric learning objective, where training data are input as positive and negative pairs.

In this paper, we first propose a novel deep learning objective that unifies the classification and metric learning criteria. We then introduce a geometry-aware deep transform, and optimize it through standard back-propagation.

We denote this formulation as Geometry-aware Deep Transform (GDT). The GDT objective is a weighted combination of the two formulations. We can understand it as regularizing the metric learning formulation using the classification one.
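A schematic sketch of such a weighted combination; the function and parameter names (classification_loss, metric_loss, alpha) are illustrative placeholders, not the paper's notation.

def gdt_objective(classification_loss, metric_loss, alpha=0.5):
    # Weighted combination of the two criteria; the classification term
    # acts as a regularizer on the metric learning term.
    return alpha * classification_loss + (1.0 - alpha) * metric_loss

# Example with made-up per-batch loss values
print(gdt_objective(classification_loss=0.7, metric_loss=1.2, alpha=0.3))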

https://devblogs.nvidia.com/parallelforall/understanding-aesthetics-deep-learning/ Understanding Aesthetics with Deep Learning

http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41473.pdf DeViSE: A Deep Visual-Semantic Embedding Model

One remedy is to leverage data from other sources – such as text data – both to train visual models and to constrain their predictions. In this paper we present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data as well as semantic information gleaned from unannotated text.
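DeViSE projects image features into a pretrained word-embedding space and trains with a hinge rank loss that ranks the correct label's embedding above the others. A simplified NumPy sketch, with made-up dimensions and a toy projection matrix standing in for the learned transformation:

import numpy as np

def devise_hinge_rank_loss(img_vec, M, word_vecs, label, margin=0.1):
    # Project the visual feature into the word-embedding space, then require
    # the correct label's word vector to score higher, by a margin, than
    # every other label's word vector.
    proj = M @ img_vec
    scores = word_vecs @ proj              # dot product with each label embedding
    loss = 0.0
    for j, s in enumerate(scores):
        if j != label:
            loss += max(0.0, margin - scores[label] + s)
    return loss

rng = np.random.default_rng(0)
img = rng.normal(size=4)                   # toy visual feature
M = rng.normal(size=(3, 4))                # learned linear map (toy values here)
words = rng.normal(size=(5, 3))            # toy word embeddings for 5 labels
print(devise_hinge_rank_loss(img, M, words, label=2))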

https://arxiv.org/abs/1610.08904v1 Local Similarity-Aware Deep Feature Embedding

The global Euclidean distance cannot faithfully characterize the true feature similarity in a complex visual feature space, where the intraclass distance in a high-density region may be larger than the interclass distance in low-density regions. In this paper, we introduce a Position-Dependent Deep Metric (PDDM) unit, which is capable of learning a similarity metric adaptive to local feature structure.
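A rough, heavily simplified sketch of the idea behind a position-dependent metric: the similarity score is computed not only from the difference of the two features (their relative placement) but also from their mean (their absolute position in feature space), so the metric can adapt to local structure. The layer shapes and weights below are placeholders, not the PDDM architecture from the paper.

import numpy as np

def position_dependent_score(f1, f2, W_u, W_v, W_s):
    # Difference pathway captures how far apart the two features are;
    # mean pathway captures where in feature space the pair lies.
    u = np.abs(f1 - f2)
    v = 0.5 * (f1 + f2)
    h = np.tanh(np.concatenate([W_u @ u, W_v @ v]))
    return W_s @ h                         # scalar similarity score

rng = np.random.default_rng(1)
f1, f2 = rng.normal(size=8), rng.normal(size=8)
W_u, W_v = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
W_s = rng.normal(size=8)
print(position_dependent_score(f1, f2, W_u, W_v, W_s))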

https://arxiv.org/pdf/1611.02268v1.pdf Optimal Binary Autoencoding with Pairwise Correlations

https://arxiv.org/pdf/1703.07464v1.pdf No Fuss Distance Metric Learning using Proxies

We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Traditionally, supervision for this problem is expressed in the form of sets of points that follow an ordinal relationship: an anchor point x is similar to a set of positive points Y and dissimilar to a set of negative points Z, and a loss defined over these distances is minimized. While the specifics of the optimization differ, in this work we collectively call this type of supervision Triplets and all methods that follow this pattern Triplet-Based methods. These methods are challenging to optimize. A main issue is the need for finding informative triplets, which is usually achieved by a variety of tricks such as increasing the batch size and hard or semi-hard triplet mining, but even with these tricks the convergence rate of such methods is slow. In this paper we propose to optimize the triplet loss on a different space of triplets, consisting of an anchor data point and similar and dissimilar proxy points. These proxies approximate the original data points, so that a triplet loss over the proxies is a tight upper bound of the original loss. This proxy-based loss is empirically better behaved. As a result, the proxy loss improves on state-of-the-art results for three standard zero-shot learning datasets by up to 15 percentage points, while converging three times as fast as other triplet-based losses.
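A minimal NumPy sketch of an NCA-style proxy loss in the spirit of the paper (scaling and normalization details are omitted): each class gets one learnable proxy vector, and an anchor is only compared against the proxies instead of mined positive and negative samples.

import numpy as np

def proxy_nca_loss(x, proxies, label):
    # Squared distances from the anchor embedding to every class proxy.
    d = np.array([np.linalg.norm(x - p)**2 for p in proxies])
    pos = np.exp(-d[label])                       # attraction to its own proxy
    neg = np.sum(np.exp(-np.delete(d, label)))    # repulsion from all other proxies
    return -np.log(pos / neg)

rng = np.random.default_rng(2)
x = rng.normal(size=4)                  # embedded anchor point
proxies = rng.normal(size=(3, 4))       # one proxy per class (toy values)
print(proxy_nca_loss(x, proxies, label=0))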