batch_normalization — revisions 2018/07/08 13:20 → 2018/09/23 18:33 (current)
https://github.com/switchablenorms/Switchable-Normalization
https://arxiv.org/abs/1706.05350 L2 Regularization versus Batch and Weight Normalization

L2 regularization has no regularizing effect when combined with normalization. Instead, it influences the scale of the weights, and thereby the effective learning rate. We investigate this dependence, both theoretically and experimentally, and show that popular optimization methods such as ADAM only partially eliminate the influence of normalization on the learning rate.
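The mechanism behind this claim is that batch normalization is invariant to the scale of the weights feeding into it, so a weight-decay penalty cannot change the function the layer computes. A minimal NumPy sketch (a plain linear layer followed by a hand-written batch-norm step; all names and shapes here are illustrative, not from the paper):

```python
import numpy as np

def batch_norm(z, eps=1e-5):
    # Normalize each feature over the batch dimension.
    mu = z.mean(axis=0)
    var = z.var(axis=0)
    return (z - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 16))   # a batch of 128 inputs
W = rng.normal(size=(16, 8))     # weights of a linear layer

out = batch_norm(X @ W)
out_scaled = batch_norm(X @ (0.5 * W))  # weights shrunk, e.g. by L2 decay

# Batch norm rescales per-feature statistics, so scaling W cancels out:
# the normalized outputs are numerically identical (up to eps effects).
assert np.allclose(out, out_scaled, atol=1e-4)
```

Because the forward pass is unchanged, L2 decay only shrinks ‖W‖; since the gradient through the normalization scales like 1/‖W‖, this effectively raises the learning rate rather than regularizing the model.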