https://arxiv.org/abs/1805.06576 A Spline Theory of Deep Networks (Extended Version)
  
We build a rigorous bridge between deep networks (DNs) and approximation theory via spline functions and operators. Our key result is that a large class of DNs can be written as a composition of max-affine spline operators (MASOs), which provide a powerful portal through which to view and analyze their inner workings. For instance, conditioned on the input signal, the output of a MASO DN can be written as a simple affine transformation of the input. This implies that a DN constructs a set of signal-dependent, class-specific templates against which the signal is compared via a simple inner product; we explore the links to the classical theory of optimal classification via matched filters and the effects of data memorization. Going further, we propose a simple penalty term that can be added to the cost function of any DN learning algorithm to force the templates to be orthogonal with each other; this leads to significantly improved classification performance and reduced overfitting with no change to the DN architecture. The spline partition of the input signal space that is implicitly induced by a MASO directly links DNs to the theory of vector quantization (VQ) and K-means clustering, which opens up a new geometric avenue to study how DNs organize signals in a hierarchical fashion. To validate the utility of the VQ interpretation, we develop and validate a new distance metric for signals and images that quantifies the difference between their VQ encodings. (This paper is a significantly expanded version of a paper with the same title that will appear at ICML 2018.)
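
The claim that, conditioned on the input, the network reduces to an affine map of that input is easy to check numerically on a toy ReLU network. The sketch below is my own illustration (arbitrary layer sizes, plain NumPy), not code from the paper: it composes the affine pieces selected by the input's activation pattern and verifies that they reproduce the forward pass.

```python
# Own sketch: for a fixed input, a ReLU MLP reduces to an affine map
# x -> A(x) @ x + b(x), where A(x), b(x) depend only on the layer-wise
# ReLU on/off pattern selected by that input.
import numpy as np

rng = np.random.default_rng(0)
dims = [8, 16, 16, 3]                       # toy layer sizes (assumption)
Ws = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
bs = [rng.standard_normal(dims[i + 1]) for i in range(len(dims) - 1)]

def forward(x):
    for W, b in zip(Ws[:-1], bs[:-1]):
        x = np.maximum(W @ x + b, 0.0)      # ReLU = max-affine spline with 2 pieces
    return Ws[-1] @ x + bs[-1]              # linear output layer

def input_conditioned_affine(x):
    """Compose the per-layer affine pieces selected by x's activation pattern."""
    A, c = np.eye(len(x)), np.zeros(len(x))
    for W, b in zip(Ws[:-1], bs[:-1]):
        pre = W @ (A @ x + c) + b
        mask = np.diag((pre > 0).astype(float))   # which ReLU pieces are active
        A, c = mask @ W @ A, mask @ (W @ c + b)
    return Ws[-1] @ A, Ws[-1] @ c + bs[-1]

x = rng.standard_normal(dims[0])
A_x, b_x = input_conditioned_affine(x)
assert np.allclose(forward(x), A_x @ x + b_x)    # DN output = signal-dependent affine map
print(A_x.shape, b_x.shape)                      # (3, 8) (3,): rows of A_x act as templates
```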

Orthogonality penalty: an extra term in the training loss that penalizes the non-zero off-diagonal entries of the matrix of pairwise template inner products, so that minimizing the new loss drives the class templates toward orthogonality.
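
A minimal sketch of such a penalty, assuming the generic form "sum of squared off-diagonal entries of the templates' Gram matrix"; the exact weighting and normalization used in the paper may differ.

```python
# Hedged sketch: a generic orthogonality penalty on a layer's template matrix.
import torch

def orthogonality_penalty(templates: torch.Tensor) -> torch.Tensor:
    """templates: (num_classes, dim) matrix whose rows are the class templates.
    Returns the sum of squared off-diagonal entries of the Gram matrix T T^T."""
    gram = templates @ templates.t()                     # pairwise inner products
    off_diag = gram - torch.diag(torch.diagonal(gram))   # zero out the diagonal
    return (off_diag ** 2).sum()

# usage: total_loss = task_loss + lam * orthogonality_penalty(last_layer.weight)
```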

https://arxiv.org/abs/1807.02873v1 Separability is not the best goal for machine learning

https://arxiv.org/abs/1807.11440v1 Comparator Networks

(i) We propose a Deep Comparator Network (DCN) that can ingest a pair of sets (each may contain a variable number of images) as inputs and compute a similarity between the pair; this involves attending to multiple discriminative local regions (landmarks) and comparing local descriptors between pairs of faces; (ii) to encourage high-quality representations for each set, internal competition is introduced for recalibration based on the landmark score; (iii) inspired by image retrieval, a novel hard sample mining regime is proposed to control the sampling process, such that the DCN is complementary to the standard image classification models.
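
As a rough, hedged illustration of the set-to-set comparison idea (this is not the paper's DCN architecture; the names, shapes, and pooling scheme are my own assumptions): attend over the images in each set per landmark, compare the resulting local descriptors, and aggregate into a single similarity.

```python
# Hedged sketch of set-to-set similarity via attended local descriptors.
import torch
import torch.nn.functional as F

def set_similarity(desc_a, attn_a, desc_b, attn_b):
    """desc_*: (num_images, num_landmarks, dim) local descriptors per set.
    attn_*: (num_images, num_landmarks) attention / landmark scores."""
    def pool(desc, attn):
        w = torch.softmax(attn, dim=0)                 # weight the images per landmark
        return (w.unsqueeze(-1) * desc).sum(dim=0)     # (num_landmarks, dim) set descriptor
    a, b = pool(desc_a, attn_a), pool(desc_b, attn_b)
    per_landmark = F.cosine_similarity(a, b, dim=-1)   # compare local descriptors
    return per_landmark.mean()                         # aggregate into one similarity

# toy usage with random tensors; the two sets may contain different numbers of images
sim = set_similarity(torch.randn(3, 5, 64), torch.randn(3, 5),
                     torch.randn(2, 5, 64), torch.randn(2, 5))
```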

https://arxiv.org/abs/1808.00508v1 Neural Arithmetic Logic Units

Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.
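
For reference, the NALU cell described in the paper combines an additive accumulator with a multiplicative path in log space, selected by a learned gate. The sketch below is my own condensed reading of that formulation (initialization details omitted).

```python
# Sketch of a NALU cell following the formulation in arXiv:1808.00508 (own condensation).
import torch
import torch.nn as nn

class NALU(nn.Module):
    def __init__(self, in_dim, out_dim, eps=1e-7):
        super().__init__()
        self.W_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.M_hat = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.G = nn.Parameter(torch.randn(out_dim, in_dim) * 0.1)
        self.eps = eps

    def forward(self, x):
        W = torch.tanh(self.W_hat) * torch.sigmoid(self.M_hat)        # NAC weights, biased toward {-1, 0, 1}
        add = x @ W.t()                                                # additive path (add/subtract)
        mul = torch.exp(torch.log(torch.abs(x) + self.eps) @ W.t())   # multiplicative path in log space
        g = torch.sigmoid(x @ self.G.t())                             # learned gate between the paths
        return g * add + (1 - g) * mul
```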

https://www.quantamagazine.org/universal-method-to-sort-complex-information-found-20180813

https://arxiv.org/pdf/1808.07526.pdf Deep Neural Network Structures Solving Variational Inequalities

We propose a novel theoretical framework to investigate deep neural networks using the formalism of proximal fixed point methods for solving variational inequalities. We first show that almost all activation functions used in neural networks are actually proximity operators. This leads to an algorithmic model alternating firmly nonexpansive and linear operators. We derive new results on averaged operator iterations to establish the convergence of this model, and show that the limit of the resulting algorithm is a solution to a variational inequality.
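
Two familiar instances make the "activations are proximity operators" statement concrete: ReLU is the proximity operator of the indicator of the nonnegative orthant (i.e. the projection onto it), and soft-thresholding is the proximity operator of a scaled l1 norm. A small numerical check (my own illustration, not code from the paper):

```python
# Numerical check (1-D grid search for simplicity) of two proximity-operator identities:
#   prox of the indicator of [0, inf)  ->  ReLU (projection onto the orthant)
#   prox of lam * |.|                  ->  soft-thresholding
import numpy as np

ys = np.linspace(-5, 5, 200001)              # dense grid of candidate outputs y

def prox_on_grid(g_vals, x):
    """argmin_y g(y) + 0.5 * (y - x)^2, evaluated on the grid."""
    return ys[np.argmin(g_vals + 0.5 * (ys - x) ** 2)]

indicator = np.where(ys >= 0, 0.0, np.inf)   # indicator of the nonnegative half-line
lam = 0.5
l1 = lam * np.abs(ys)

for x in (-1.5, 0.3, 2.0):
    relu = max(x, 0.0)
    soft = np.sign(x) * max(abs(x) - lam, 0.0)
    assert abs(prox_on_grid(indicator, x) - relu) < 1e-3
    assert abs(prox_on_grid(l1, x) - soft) < 1e-3
```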

https://arxiv.org/abs/1810.02906v1 Network Distance Based on Laplacian Flows on Graphs

Our key insight is to define a distance based on the long-term diffusion behavior of the whole network. We first introduce a dynamic system on graphs called Laplacian flow. Based on this Laplacian flow, a new version of diffusion distance between networks is proposed. We will demonstrate the utility of the distance and its advantage over various existing distances through explicit examples. The distance is also applied to subsequent learning tasks such as clustering network objects.
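
To convey the flavor (a simplified sketch with my own choices, not the paper's exact definition; it assumes graphs of the same size with a fixed node correspondence): run the Laplacian flow du/dt = -Lu on each graph and compare the resulting heat kernels exp(-tL) over a few diffusion times.

```python
# Hedged sketch: compare two same-size graphs by the behavior of the Laplacian
# flow du/dt = -L u, i.e. by their heat kernels exp(-t L) at several times.
import numpy as np
from scipy.linalg import expm

def laplacian(adj):
    return np.diag(adj.sum(axis=1)) - adj

def diffusion_distance(adj_a, adj_b, times=(0.1, 1.0, 10.0)):
    La, Lb = laplacian(adj_a), laplacian(adj_b)
    return sum(np.linalg.norm(expm(-t * La) - expm(-t * Lb), "fro") for t in times)

# toy usage: a 4-cycle vs. a 4-path diffuse differently, so the distance is > 0
cycle = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], float)
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], float)
print(diffusion_distance(cycle, path))
```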

https://arxiv.org/pdf/1810.13337v1.pdf Learning to Represent Edits

By combining a “neural editor” with an “edit encoder”, our models learn to represent the salient information of an edit and can be used to apply edits to new inputs. We experiment on natural language and source code edit data.
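
A hedged sketch of the interface this implies (module names and fixed-size vector inputs are my own simplification; the paper works on natural-language and source-code sequences): an edit encoder maps a (before, after) pair to an edit vector, and a neural editor applies that vector to a new input.

```python
# Hedged interface sketch, not the paper's models.
import torch
import torch.nn as nn

class EditEncoder(nn.Module):
    """Maps a (before, after) pair to a low-dimensional edit representation."""
    def __init__(self, dim, edit_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 128), nn.ReLU(), nn.Linear(128, edit_dim))
    def forward(self, before, after):
        return self.net(torch.cat([before, after], dim=-1))

class NeuralEditor(nn.Module):
    """Applies an edit representation to a new input."""
    def __init__(self, dim, edit_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + edit_dim, 128), nn.ReLU(), nn.Linear(128, dim))
    def forward(self, x, edit):
        return self.net(torch.cat([x, edit], dim=-1))

dim, edit_dim = 32, 8
enc, editor = EditEncoder(dim, edit_dim), NeuralEditor(dim, edit_dim)
before, after, new_input = torch.randn(4, dim), torch.randn(4, dim), torch.randn(4, dim)
edited = editor(new_input, enc(before, after))   # transfer the encoded edit to a new input
```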

https://arxiv.org/abs/1808.10584 Learning to Describe Differences Between Pairs of Similar Images

We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage.