References

http://arxiv.org/pdf/1511.06321v5.pdf Neural Network-Based Clustering Using Pairwise Constraints

This paper presents a neural network-based end-to-end clustering framework. We design a novel strategy to utilize the contrastive criteria for pushing data-forming clusters directly from raw data, in addition to learning a feature embedding suitable for such clustering. The network is trained with weak labels, specifically partial pairwise relationships between data instances. The cluster assignments and their probabilities are then obtained at the output layer by feed-forwarding the data. The framework has the interesting characteristic that no cluster centers need to be explicitly specified, thus the resulting cluster distribution is purely data-driven and no distance metrics need to be predefined. The experiments show that the proposed approach beats the conventional two-stage method (feature embedding with k-means) by a significant margin. It also compares favorably to the performance of the standard cross entropy loss for classification. Robustness analysis also shows that the method is largely insensitive to the number of clusters. Specifically, we show that the number of dominant clusters is close to the true number of clusters even when a large k is used for clustering.
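
The pairwise criterion described above lends itself to a short sketch. The loss below is my own hedged reading of such a contrastive objective (the symmetric-KL form, margin, and names are assumptions, not the authors' exact formulation): must-link pairs pull their softmax cluster distributions together, cannot-link pairs push them apart up to a margin.

```python
import torch
import torch.nn.functional as F

def pairwise_cluster_loss(logits_a, logits_b, is_similar, margin=2.0):
    """logits_a, logits_b: (batch, k) raw network outputs for the two items of each pair.
    is_similar: (batch,) float tensor, 1.0 for must-link, 0.0 for cannot-link."""
    p = F.softmax(logits_a, dim=1)
    q = F.softmax(logits_b, dim=1)
    # Symmetric KL divergence between the two cluster-assignment distributions.
    kl = F.kl_div(q.clamp(min=1e-8).log(), p, reduction="none").sum(dim=1) + \
         F.kl_div(p.clamp(min=1e-8).log(), q, reduction="none").sum(dim=1)
    similar_loss = kl                                      # pull must-link pairs together
    dissimilar_loss = torch.clamp(margin - kl, min=0.0)    # push cannot-link pairs apart
    return (is_similar * similar_loss + (1.0 - is_similar) * dissimilar_loss).mean()
```

At test time, the cluster assignment of an input is simply the argmax of its softmax output, so no cluster centers or distance metric need to be specified.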

http://arxiv.org/pdf/1512.01752v2.pdf Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation

Traditional graph-based semi-supervised learning (SSL) approaches are not suited for massive data and large label scenarios since they scale linearly with the number of edges |E| and distinct labels m. To deal with the large label size problem, recent works propose sketch-based methods to approximate the label distribution per node, thereby achieving a space reduction from O(m) to O(log m), under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that effectively captures the sparsity of the label distribution and further reduces the space complexity per node to O(1). We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with a significant reduction in memory footprint. Finally, we propose a robust graph augmentation strategy using unsupervised deep learning architectures that yields further significant quality gains for SSL in natural language applications.
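
As a rough illustration of a constant-size per-node label sketch (the function below is a toy approximation I constructed, not the paper's streaming algorithm), one can merge incoming neighbour messages during propagation and keep only the heaviest few labels per node:

```python
from collections import defaultdict

SKETCH_SIZE = 3  # constant number of (label, weight) pairs kept per node

def merge_and_truncate(messages, sketch_size=SKETCH_SIZE):
    """messages: iterable of dicts {label: weight} received from neighbours.
    Returns a normalized dict with at most sketch_size entries."""
    acc = defaultdict(float)
    for msg in messages:
        for label, w in msg.items():
            acc[label] += w
    top = sorted(acc.items(), key=lambda kv: kv[1], reverse=True)[:sketch_size]
    total = sum(w for _, w in top) or 1.0
    return {label: w / total for label, w in top}

# Example: one node receiving messages from three neighbours.
print(merge_and_truncate([{"cat": 0.6, "dog": 0.4},
                          {"cat": 0.5, "bird": 0.5},
                          {"fish": 0.2, "dog": 0.8}]))
```

Because the per-node state never grows beyond a fixed number of entries, memory stays O(1) per node regardless of the total label vocabulary.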

http://arxiv.org/abs/1607.08477v1 SSDH: Semi-supervised Deep Hashing for Large Scale Image Retrieval

In this paper, we propose the semi-supervised deep hashing (SSDH) method, to perform more effective hash learning by simultaneously preserving the semantic similarity and the underlying data structures. Our proposed approach can be divided into two phases. First, a deep network is designed to extensively exploit both the labeled and unlabeled data, in which we construct the similarity graph online in a mini-batch with the deep feature representations. Second, we propose a loss function suitable for the semi-supervised scenario by jointly minimizing the empirical error on the labeled data as well as the embedding error on both the labeled and unlabeled data, which can preserve the semantic similarity, as well as capture the meaningful neighbors on the underlying data structures for effective hashing.
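
A rough sketch of such a joint objective (tensor shapes, weighting, and the inner-product relaxation below are my assumptions, not the authors' code): an empirical term on labeled pairs plus an embedding term over the online mini-batch similarity graph.

```python
import torch

def ssdh_style_loss(codes, label_sim, graph_sim, labeled_mask, alpha=1.0):
    """codes: (n, bits) relaxed hash codes in [-1, 1].
    label_sim: (n, n) +1/-1 semantic similarity, valid where labeled_mask is 1.
    graph_sim: (n, n) 0/1 kNN graph built online from deep features in the mini-batch.
    labeled_mask: (n, n) 1 where both items carry labels."""
    inner = codes @ codes.t() / codes.shape[1]   # normalized code agreement in [-1, 1]
    # Empirical error: labeled pairs should match their semantic similarity.
    empirical = (labeled_mask * (inner - label_sim) ** 2).sum() / labeled_mask.sum().clamp(min=1)
    # Embedding error: graph neighbours (labeled or not) should agree in hash space.
    embedding = (graph_sim * (1.0 - inner)).sum() / graph_sim.sum().clamp(min=1)
    return empirical + alpha * embedding
```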

https://arxiv.org/abs/1702.08648v1 Deep Clustering using Auto-Clustering Output Layer

We propose a novel method to enrich the representation provided to the output layer of feedforward neural networks in the form of an auto-clustering output layer (ACOL) which enables the network to naturally create sub-clusters under the provided main class labels. In addition, a novel regularization term is introduced which allows ACOL to encourage the neural network to reveal its own explicit clustering objective.
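
A minimal sketch of such an output layer (the grouping of sub-cluster logits below is my assumption about the structure, not the paper's implementation): the network emits k sub-cluster logits per class, and the class probability is the pooled mass of its k sub-clusters.

```python
import torch
import torch.nn.functional as F

def acol_style_forward(sub_logits, n_classes, k):
    """sub_logits: (batch, n_classes * k) raw outputs, k sub-cluster nodes per class,
    assumed to be laid out contiguously per class."""
    sub_probs = F.softmax(sub_logits, dim=1)                    # softmax over all sub-clusters
    class_probs = sub_probs.view(-1, n_classes, k).sum(dim=2)   # pool sub-clusters into classes
    return class_probs, sub_probs

# Training uses class_probs against the provided (weak) labels; the argmax over
# sub_probs recovers the sub-clusters the network discovered under each label.
```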

https://arxiv.org/abs/1702.08833v1 Learning Deep Nearest Neighbor Representations Using Differentiable Boundary Trees

We introduce a new method called differentiable boundary tree which allows for learning deep kNN representations. We build on the recently proposed boundary tree algorithm which allows for efficient nearest neighbor classification, regression and retrieval. By modelling traversals in the tree as stochastic events, we are able to form a differentiable cost function which is associated with the tree's predictions. Using a deep neural network to transform the data and back-propagating through the tree allows us to learn good representations for kNN methods. We demonstrate that our method is able to learn suitable representations allowing for very efficient trees with a clearly interpretable structure.

Using boundary trees we are able to derive a differentiable cost function which allows for learning of such representations. The resulting representation allows for simple, interpretable tree structures with good performance in query time and accuracy. While the method has some limitations, we feel this is an important research direction and look forward to exploring it further using newly available dynamic batching tools such as TensorFlow Fold.
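
As a loose illustration of "modelling traversals as stochastic events" (this is my own simplified construction, not the paper's boundary-tree procedure), each branching decision can be softened with a softmax over distances to the children, so the class prediction stays differentiable in the embedding network's parameters:

```python
import torch
import torch.nn.functional as F

def soft_tree_prediction(query, node_emb, node_labels, children, root, n_classes, temp=1.0):
    """query: (d,) embedded query vector.
    node_emb: (n, d) embeddings of the stored examples forming the tree.
    node_labels: (n,) long tensor of class labels for the stored examples.
    children: dict mapping node id -> list of child node ids; root: root node id."""
    def recurse(node, arrive_prob):
        # Every visited node votes for its label, weighted by its arrival probability.
        vote = arrive_prob * F.one_hot(node_labels[node], n_classes).float()
        kids = children.get(node, [])
        if not kids:
            return vote
        dists = torch.stack([torch.norm(query - node_emb[c]) for c in kids])
        branch = F.softmax(-dists / temp, dim=0)   # soft choice among the children
        for c, p in zip(kids, branch):
            vote = vote + recurse(c, arrive_prob * p)
        return vote
    votes = recurse(root, torch.tensor(1.0))
    return votes / votes.sum()
```

A cross-entropy loss on the returned distribution can then be back-propagated through the tree into the network that produces the embeddings.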

https://arxiv.org/abs/1706.05048 A new look at clustering through the lens of deep convolutional neural networks

We show that CNNs, trained end-to-end using backpropagation with noisy labels, are able to cluster data points belonging to several overlapping shapes, and do so much better than state-of-the-art algorithms. The main takeaway from our study is that mechanisms of human vision, particularly the hierarchical organization of the visual ventral stream, should be taken into account in clustering algorithms (e.g., for learning representations in an unsupervised manner or with minimal supervision) to reach human-level clustering performance.

https://arxiv.org/abs/1706.06136 On comparing clusterings: an element-centric framework unifies overlaps and hierarchy

Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ.
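
A toy sketch of the element-centric philosophy for disjoint clusterings (a deliberate simplification I wrote for illustration, not the paper's actual measure): score each element by how similar its set of co-clustered elements is across the two clusterings, then average over elements.

```python
def element_centric_similarity(labels_a, labels_b):
    """labels_a, labels_b: lists giving each element's cluster id in two clusterings."""
    n = len(labels_a)
    members_a, members_b = {}, {}
    for i in range(n):
        members_a.setdefault(labels_a[i], set()).add(i)
        members_b.setdefault(labels_b[i], set()).add(i)
    score = 0.0
    for i in range(n):
        sa, sb = members_a[labels_a[i]], members_b[labels_b[i]]
        score += len(sa & sb) / len(sa | sb)   # Jaccard overlap of i's co-members
    return score / n

# Example: two clusterings of four elements that disagree on one element.
print(element_centric_similarity([0, 0, 1, 1], [0, 0, 0, 1]))
```

The per-element scores also show where the two clusterings disagree, rather than reducing the comparison to a single cluster-level number.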

https://arxiv.org/pdf/1708.07863.pdf k-Nearest Neighbor Augmented Neural Networks for Text Classification

https://openreview.net/pdf?id=B1CEaMbR- Clustering with Deep Learning: Taxonomy and New Methods

https://arxiv.org/abs/1803.04765 Deep k-Nearest Neighbors: Towards Confident, Interpretable and Robust Deep Learning

This hybrid classifier combines the k-nearest neighbors algorithm with the representations of the data learned by each layer of the DNN: a test input is compared to its neighboring training points according to the distance separating them in each layer's representation space.
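
A rough sketch of the layer-wise nearest-neighbour idea (interface and aggregation details are assumptions on my part, not the DkNN paper's exact procedure): at test time, compare the input's representation at every layer with the stored training representations and aggregate the neighbours' labels across layers into one vote.

```python
import numpy as np

def deep_knn_predict(test_reps, train_reps_per_layer, train_labels, n_classes, k=5):
    """test_reps: list of per-layer vectors for one test input.
    train_reps_per_layer: list of (n_train, d_layer) arrays, one per layer.
    train_labels: (n_train,) int array of training labels."""
    votes = np.zeros(n_classes)
    for layer_rep, train_rep in zip(test_reps, train_reps_per_layer):
        dists = np.linalg.norm(train_rep - layer_rep, axis=1)   # distance in this layer's space
        neighbours = np.argsort(dists)[:k]                      # k nearest training points
        for j in neighbours:
            votes[train_labels[j]] += 1
    return votes / votes.sum()   # fraction of neighbour votes per class
```

Disagreement among the layers' neighbours can then serve as a confidence signal for the prediction, which is what makes the hybrid classifier more interpretable than the softmax output alone.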

https://arxiv.org/abs/1807.05520v1 Deep Clustering for Unsupervised Learning of Visual Features