In this paper, we study the dimensionality of the representations learned by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that, when trained on the CIFAR10 or CIFAR100 datasets, the learned representations exhibit a fairly low-rank structure. We propose a modification to the training procedure, which further encourages low-rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples.
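As a hedged illustration (not from the paper) of what a low-rank structure in the representations means in practice, the sketch below estimates the effective rank of a batch of activations from its singular value spectrum; the 99% energy threshold and the function name are arbitrary choices made for this example.

```python
# Illustrative sketch, not the authors' code: estimate how many singular
# values of a (batch, dim) activation matrix are needed to capture most of
# its spectral energy. A count much smaller than dim is what "fairly low
# rank structure" refers to.

import torch

def effective_rank(acts: torch.Tensor, energy: float = 0.99) -> int:
    """Number of top singular values capturing `energy` of the spectral
    energy of a (batch, dim) activation matrix."""
    s = torch.linalg.svdvals(acts)          # singular values, descending order
    cum = torch.cumsum(s ** 2, dim=0)       # cumulative spectral energy
    return int((cum < energy * cum[-1]).sum().item()) + 1
```

Applied to, say, penultimate-layer activations of a trained ResNet-18 on CIFAR10, one would compare this count against the full feature dimension to gauge how low-rank the representations are.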
The modification “adds” virtual low-rank layers to the model that ensure that the learned representations roughly lie in a low-rank space. The modified objective function is optimized using an alternate minimization approach, reminiscent of that used in iterative hard thresholding (Blumensath and Davies, 2009) or singular value projection (Jain et al., 2010). A naïve singular value thresholding approach would render the training intractable for all practical purposes; we instead use a column-sampling-based Nyström method (Williams and Seeger, 2001; Halko et al., 2011) to achieve a significant speed-up, though at the cost of not obtaining the optimal low-rank projections. One can view this modified training process as a way to constrain the neural network, though in a way that is very different from the widely used sparsity-inducing methods (e.g., Anwar et al., 2017; Wen et al., 2016) or structurally constrained methods (e.g., Moczulski et al., 2015; Liu et al., 2015) that seek to tackle the problem of over-parametrization.
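The following is a minimal sketch, under my own assumptions rather than the paper's actual implementation, of the key idea of replacing an exact rank-k projection of an activation matrix with a cheaper column-sampling approximation in the spirit of the Nyström / randomized range-finding methods cited above. The function name, the uniform column sampling, and the 4x oversampling factor are assumptions for illustration.

```python
# Hedged sketch: approximate low-rank projection of activations via column
# sampling, avoiding the full SVD that would make training intractable.

import torch

def approx_lowrank_activations(acts: torch.Tensor, rank: int, n_cols=None) -> torch.Tensor:
    """Return an (at most) rank-`rank` approximation of a (batch, dim)
    activation matrix using sampled feature columns instead of a full SVD."""
    batch, dim = acts.shape
    if n_cols is None:
        n_cols = min(dim, 4 * rank)          # modest oversampling (assumption)

    # 1. Sample feature columns uniformly at random (column sampling).
    idx = torch.randperm(dim, device=acts.device)[:n_cols]
    C = acts[:, idx]                         # (batch, n_cols)

    # 2. Orthonormal basis Q for the column space of C, which approximates
    #    the range of the full activation matrix.
    Q, _ = torch.linalg.qr(C)                # (batch, r), r = min(batch, n_cols)

    # 3. Project onto that basis and truncate to the target rank with a small
    #    SVD of the much smaller projected matrix.
    B = Q.T @ acts                           # (r, dim), cheap relative to a full SVD
    U, S, Vh = torch.linalg.svd(B, full_matrices=False)
    k = min(rank, S.numel())
    return Q @ (U[:, :k] * S[:k]) @ Vh[:k]   # (batch, dim), rank <= k
```

In an alternate-minimization training loop of the kind described above, a step like this would be interleaved with ordinary gradient updates, trading the optimal low-rank projection for speed.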