Name: Hidden Layer Regularization


Add regularization terms that involve activations from inner layers.


How can we influence the structure of the representations learned by the inner hidden layers?

References:

Semi-Supervised Learning with Ladder Networks

We combine supervised learning with unsupervised learning in deep neural networks. The proposed model is trained to simultaneously minimize the sum of supervised and unsupervised cost functions by backpropagation, avoiding the need for layer-wise pre-training. Our work builds on the Ladder network proposed by Valpola (2015), which we extend by combining the model with supervision.
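The combined objective — a supervised cost plus unsupervised, layer-wise reconstruction costs — might be sketched as follows. The function name, the per-layer weights, and the choice of mean squared error are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def ladder_cost(supervised_cost, clean_activations, reconstructions, layer_weights):
    """Supervised cost plus weighted per-layer reconstruction costs.

    clean_activations / reconstructions: one array per layer (z_l and z_hat_l);
    layer_weights: lambda_l, the weight of each layer's reconstruction term.
    """
    total = supervised_cost
    for z, z_hat, lam in zip(clean_activations, reconstructions, layer_weights):
        total += lam * np.mean((z - z_hat) ** 2)  # unsupervised denoising cost
    return total

# Toy example: two hidden layers; the second is reconstructed exactly.
z1, z2 = np.ones((4, 3)), np.zeros((4, 2))
zhat1, zhat2 = np.full((4, 3), 0.5), np.zeros((4, 2))
cost = ladder_cost(1.0, [z1, z2], [zhat1, zhat2], [1.0, 0.1])
# 1.0 (supervised) + 1.0 * 0.25 (layer 1) + 0 (layer 2) = 1.25
```

Because every term is differentiable in the activations, the whole sum can be minimized jointly by backpropagation, with no layer-wise pre-training.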

Note: A regularization term is added that penalizes the difference between each layer's output and its reconstruction from the corresponding decoder.

Group Sparse Regularization for Deep Neural Networks

We show that a sparse version of the group Lasso penalty achieves competitive performance while at the same time yielding extremely compact networks that rely on a smaller number of input features.

Enlightening Deep Neural Networks with Knowledge of Confounding Factors
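A minimal numpy sketch of the sparse group Lasso penalty from the group-sparse reference, applied to a first-layer weight matrix. Grouping by input feature (one row per group) and scaling each group by the square root of its size are common conventions assumed here, not details taken from the paper:

```python
import numpy as np

def sparse_group_lasso(W, lam_group=1.0, lam_l1=1.0):
    """Sparse group Lasso penalty on a weight matrix W (inputs x outputs).

    Each row of W (all outgoing weights of one input feature) forms a group;
    driving an entire row to zero effectively removes that input feature.
    """
    group_term = sum(np.sqrt(W.shape[1]) * np.linalg.norm(row) for row in W)
    l1_term = np.abs(W).sum()  # element-wise sparsity within surviving groups
    return lam_group * group_term + lam_l1 * l1_term

W = np.array([[3.0, 4.0],   # active input feature
              [0.0, 0.0]])  # pruned input feature: contributes nothing
penalty = sparse_group_lasso(W, lam_group=1.0, lam_l1=0.1)
# group term: sqrt(2) * 5 + 0; l1 term: 0.1 * 7
```

Adding this penalty to the training loss pushes whole rows toward zero, which is what makes the resulting networks compact at the input level rather than merely sparse element by element.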

We incorporate information on prominent auxiliary explanatory factors of the data population into existing architectures as secondary objective/loss blocks that take inputs from hidden layers during training.

Joint Deep Learning for Pedestrian Detection
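The secondary objective/loss blocks described above reduce, at training time, to a primary loss plus weighted losses on heads that read hidden layers and predict known auxiliary factors. A rough sketch, with function names and the weighting scheme as assumptions for illustration:

```python
import numpy as np

def softmax_xent(logits, labels):
    """Mean cross-entropy of softmax(logits) against integer labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def total_loss(main_logits, labels, secondary_outputs, factor_labels, weight=0.1):
    """Primary classification loss plus weighted secondary losses; each
    secondary block reads a hidden layer and predicts an auxiliary factor."""
    loss = softmax_xent(main_logits, labels)
    for logits, targets in zip(secondary_outputs, factor_labels):
        loss += weight * softmax_xent(logits, targets)
    return loss

main = np.array([[5.0, 0.0], [0.0, 5.0]])  # confident, correct primary head
aux = np.array([[0.0, 0.0], [0.0, 0.0]])   # uninformative secondary block
loss = total_loss(main, np.array([0, 1]), [aux], [np.array([1, 0])])
```

The secondary blocks only shape the hidden representations during training; at test time they can be discarded and the primary head used alone.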

This paper proposes that the four components of pedestrian detection (feature extraction, deformation handling, occlusion handling, and classification) should be jointly learned in order to maximize their strengths through cooperation. We formulate these four components into a joint deep learning framework and propose a new deep network architecture.

Joint Deep Learning for Car Detection

Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation

We propose a novel framework that can be applied to deep neural networks of any structure to facilitate their learning of meaningful representations for cost-sensitive classification problems. Furthermore, the framework allows end-to-end training of deeper networks directly. The framework is designed by augmenting auxiliary neurons to the output of each hidden layer for layer-wise cost estimation, and by including the total estimation loss within the optimization objective.

CSDNN starts with a regular DNN with fully-connected layers, but replaces the softmax layer at the end of the DNN with a cost-estimation layer. Each of the K neurons in the cost-estimation layer provides a per-class cost estimate via regression instead of a per-class probability estimate.
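Under this design, training adds a regression loss per layer against the true per-class cost vector, and prediction picks the class with the lowest estimated cost rather than the highest probability. A rough sketch, with squared error as the regression loss assumed for illustration:

```python
import numpy as np

def estimation_loss(layer_cost_estimates, true_costs):
    """Total layer-wise estimation loss: squared error between each layer's
    K cost estimates and the true per-class cost vector, summed over layers."""
    return sum(np.mean((est - true_costs) ** 2) for est in layer_cost_estimates)

def predict(final_cost_estimates):
    """Cost-sensitive prediction: the class with the lowest estimated cost
    (argmin over K regression outputs, not argmax over probabilities)."""
    return int(np.argmin(final_cost_estimates))

true_costs = np.array([0.0, 2.0, 5.0])    # misclassification costs, one example
layer_ests = [np.array([0.5, 2.5, 4.0]),  # hidden layer's auxiliary estimates
              np.array([0.1, 2.1, 4.9])]  # final cost-estimation layer
loss = estimation_loss(layer_ests, true_costs)
pred = predict(layer_ests[-1])            # class 0: lowest estimated cost
```

Because every hidden layer carries its own estimation loss, gradients reach the lower layers directly, which is what lets the deeper networks train end to end.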