Bottleneck Layer


The name identifies the pattern and should be representative of the concept that it describes. It should be a noun that is easy to use within a sentence, so that practitioners can readily reference the pattern in conversation.


Describes in a single concise sentence the meaning of the pattern.


This section describes the reason why this pattern is needed in practice. Other pattern languages indicate this as the Problem. In our pattern language, we express this as one or more questions and then provide further explanation behind each question.


This section provides alternative descriptions of the pattern in the form of an illustration or alternative formal expression. By looking at the sketch a reader may quickly understand the essence of the pattern.

Discussion

This is the main section of the pattern, and it explains the pattern in greater detail. We leverage a vocabulary that we describe in the theory section of this book. We do not work through proofs in detail but rather reference their sources. This section also expounds on how the motivation is addressed. We also include additional questions that may be interesting topics for future research.

Known Uses

Here we review several projects or papers that have used this pattern.

Related Patterns

In this section we describe in a diagram how this pattern is conceptually related to other patterns. The relationships may be precise or fuzzy, so we provide further explanation of the nature of each relationship. We also describe other patterns that may not be conceptually related but work well in combination with this pattern.

Relationship to Canonical Patterns

Relationship to other Patterns

Further Reading

We provide here some additional external material that will help in exploring this pattern in more detail.


To aid in reading, we include sources that are referenced in the text of the pattern.

Going Deeper with Convolutions

Deep Variational Information Bottleneck

We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method “Deep Variational Information Bottleneck”, or Deep VIB. We show that models trained with the VIB objective outperform those that are trained with other forms of regularization, in terms of generalization performance and robustness to adversarial attack.

Low Rank Structure of Learned Representations

In this paper, we study the dimensionality of the representations learned by models that have proved highly successful for image classification. We focus on ResNet-18, ResNet-50 and VGG-19 and observe that when trained on CIFAR10 or CIFAR100 datasets, the learned representations exhibit a fairly low rank structure. We propose a modification to the training procedure, which further encourages low rank representations of activations at various stages in the neural network. Empirically, we show that this has implications for compression and robustness to adversarial examples. The modification “adds” virtual low-rank layers to the model that ensure that the learned representations roughly lie in a low-rank space. The modified objective function is optimized using an alternate minimization approach, reminiscent of that used in iterative hard thresholding (Blumensath and Davies, 2009) or singular value projection (Jain et al., 2010). Using a naïve singular value thresholding approach would render the training intractable for all practical purposes; we use a column sampling based Nyström method (Williams and Seeger, 2001; Halko et al., 2011) to achieve significant speed-up, though at the cost of not getting the optimal low rank projections. One can view this modified training process as a way to constrain the neural network, though in a way that is very different to the widely used sparsity inducing methods (e.g., Anwar et al. (2017); Wen et al. (2016)) or structurally constrained methods (e.g., Moczulski et al. (2015); Liu et al. (2015)) that seek to tackle the problem of over-parametrization.
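
The trade-off mentioned in the abstract above, between an optimal but costly SVD-based projection and a cheaper column-sampling projection, can be illustrated with a small NumPy sketch. This is only an illustration under our own assumptions: the function names `svd_projection` and `column_sampling_projection`, the synthetic activation matrix `acts`, and the chosen rank are hypothetical, and the plain projection onto sampled columns used here is a simplified stand-in for the Nyström variant the paper uses, not a reproduction of it.

```python
import numpy as np


def svd_projection(acts, rank):
    """Best rank-`rank` approximation of `acts` (Frobenius norm) via a full SVD."""
    u, s, vt = np.linalg.svd(acts, full_matrices=False)
    # Keep only the top `rank` singular triplets.
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def column_sampling_projection(acts, rank, rng):
    """Project `acts` onto the span of `rank` uniformly sampled columns.

    Assumption: a simplified stand-in for a column-sampling (Nyström-style)
    method. It avoids a full SVD but is generally not the optimal
    rank-`rank` approximation.
    """
    cols = rng.choice(acts.shape[1], size=rank, replace=False)
    # Orthonormal basis for the span of the sampled columns.
    q, _ = np.linalg.qr(acts[:, cols])
    return q @ (q.T @ acts)


def rel_error(a, b):
    """Relative Frobenius-norm error between two matrices."""
    return np.linalg.norm(a - b) / np.linalg.norm(a)


rng = np.random.default_rng(0)

# Hypothetical activations: 256 examples x 64 features, close to rank 8.
acts = rng.normal(size=(256, 8)) @ rng.normal(size=(8, 64))
acts += 0.01 * rng.normal(size=acts.shape)

exact = svd_projection(acts, rank=8)
approx = column_sampling_projection(acts, rank=8, rng=rng)

print("SVD projection error:            ", rel_error(acts, exact))
print("column-sampling projection error:", rel_error(acts, approx))
```

By construction, the SVD-based projection can never have a larger Frobenius error than the column-sampling one at the same rank; the column-sampling version trades some accuracy for not having to decompose the full matrix.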