https://arxiv.org/abs/1710.05941 Swish: a Self-Gated Activation Function
https://arxiv.org/pdf/1712.01897.pdf Online Learning with Gated Linear Networks
Rather than relying on non-linear transfer functions, our method gains representational power by the use of data conditioning. We state under general conditions a learnable capacity theorem that shows this approach can in principle learn any bounded Borel-measurable function on a compact subset of euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed.
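To make "data conditioning" concrete, here is a rough sketch of a single gated neuron in the spirit of the abstract above. It is not the paper's reference implementation: the class name `GatedNeuron`, the half-space gating on a side-information vector `z`, the uniform weight initialisation, and the learning rate are all assumptions chosen for illustration. The side information selects which weight vector is active, and that vector mixes the input probabilities in the logit domain; each neuron is trained online on its own log loss.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class GatedNeuron:
    """Hypothetical gated neuron: the context chosen from z picks a weight vector."""

    def __init__(self, n_inputs, hyperplanes, lr=0.1):
        # hyperplanes: list of (normal, offset) pairs; an assumed gating scheme.
        self.hyperplanes = hyperplanes
        n_contexts = 2 ** len(hyperplanes)
        # One weight vector per context, initialised uniformly (assumption).
        self.weights = [[1.0 / n_inputs] * n_inputs for _ in range(n_contexts)]
        self.lr = lr

    def context(self, z):
        # Encode which side of each hyperplane z falls on as a bit pattern.
        idx = 0
        for normal, offset in self.hyperplanes:
            side = sum(a * b for a, b in zip(normal, z)) > offset
            idx = (idx << 1) | int(side)
        return idx

    def predict(self, probs, z):
        # Mix the input probabilities in the logit domain with the active weights.
        w = self.weights[self.context(z)]
        return sigmoid(sum(wi * logit(pi) for wi, pi in zip(w, probs)))

    def update(self, probs, z, y):
        # One online gradient step on the log loss; only the active weights change.
        w = self.weights[self.context(z)]
        p = self.predict(probs, z)
        for i, pi in enumerate(probs):
            w[i] -= self.lr * (p - y) * logit(pi)
        return p
```

With a single hyperplane, `GatedNeuron(2, [([1.0], 0.0)])` keeps two weight vectors, so the same input probabilities can be mapped to different predictions depending on the sign of `z[0]`. That is the sense in which the conditioning, rather than a transfer function applied to features, supplies the flexibility in this sketch.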