** References **
AlexNet Filter Groups. Amongst the seminal contributions made by Krizhevsky et al. is the use of ‘filter groups’ in the convolutional layers of a CNN. While the use of filter groups was necessitated by the practical consideration of sub-dividing the work of training a large network across multiple GPUs, the side effects are somewhat surprising. Specifically, the authors observe that independent filter groups learn a separation of responsibility (colour features vs. texture features) that is consistent over different random initializations. Also surprising, and not explicitly stated, is the fact that the AlexNet network has approximately 57% fewer connection weights than the corresponding network without filter groups (see Fig. 2). This is due to the reduction in the input channel dimension of the grouped convolution filters. Despite the large difference in the number of parameters, both architectures achieve comparable error on ILSVRC – in fact, the smaller grouped network gets ≈ 1% lower top-5 validation error.
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet Classification with Deep Convolutional Neural Networks. In: Bartlett, P.L., Pereira, F.C.N., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) NIPS, pp. 1106–1114 (2012)
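The parameter saving is easy to verify with a little arithmetic. Below is a minimal sketch (plain Python; bias terms ignored) using the sizes of AlexNet's second convolutional layer, where 96 input channels map to 256 output channels with 5×5 kernels split into two filter groups:

<code python>
def conv_params(c_in, c_out, k, groups=1):
    # Each filter convolves only c_in // groups input channels,
    # so grouping divides the weight count by `groups`.
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k

full    = conv_params(96, 256, 5, groups=1)  # 614,400 weights
grouped = conv_params(96, 256, 5, groups=2)  # 307,200 weights
print(full, grouped, 1 - grouped / full)     # grouping halves this layer's weights
</code>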
Deep Roots: Improving CNN Efficiency with Hierarchical Filter Groups. http://arxiv.org/pdf/1605.06489v1.pdf
We use hierarchical filter groups to allow the network itself to learn independent filters. By restricting the connectivity between filters in subsequent layers, the network is forced to learn filters of limited interdependence. We explored the effect of using a complex hierarchical arrangement of filter groups in CNNs and show that imposing a structured decrease in the degree of filter grouping with depth – a ‘root’ (inverse tree) topology – can yield more efficient variants of state-of-the-art networks without compromising accuracy. Our method appears to be complementary to existing methods, such as low-dimensional embeddings, and can be used to train deep networks more efficiently than methods that only approximate a pre-trained model's weights. A sketch of such a root block follows.
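As an illustrative sketch of the root topology (assuming PyTorch; the channel counts and group schedule below are hypothetical, not taken from the paper), a block in which the degree of filter grouping decreases with depth:

<code python>
import torch
import torch.nn as nn

# A 'root' (inverse tree) block: the number of filter groups decreases
# with depth (4 -> 2 -> 1), so early filters see only a subset of the
# channels while the final layer mixes all of them.
root_block = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=4),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=2),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=1),
)

x = torch.randn(1, 64, 32, 32)
print(root_block(x).shape)  # torch.Size([1, 64, 32, 32])
</code>

Each grouped layer here has correspondingly fewer weights than its fully-connected counterpart, which is where the efficiency gain comes from.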