Discussion

One paper [CHAOS2] analyzes the consequences of depth in a deep neural network and argues that 'transient chaos' is present and is truncated by the depth of the network. Each layer of a deep network is a single step of an iterated map; the deeper the network, the more steps are composed, and the higher the probability of chaotic behavior.
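To make the iterated-map picture concrete, the sketch below pushes two nearly identical inputs through a randomly initialized tanh network and prints their distance after each layer. This is only an illustration of the regime [CHAOS2] studies, with arbitrary widths and weight variances: with sigma_w well above 1 the perturbation grows layer by layer (transient chaos) until the bounded activations and the finite depth cut it off.

import numpy as np

# Each layer is one step of a random iterated map. With sigma_w > 1 nearby
# inputs tend to separate (chaotic regime); with sigma_w < 1 they contract.
rng = np.random.default_rng(0)
width, depth = 200, 50
sigma_w, sigma_b = 2.0, 0.05          # arbitrary demo values

x = rng.normal(size=width)
x_perturbed = x + 1e-6 * rng.normal(size=width)

for layer in range(depth):
    W = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
    b = rng.normal(scale=sigma_b, size=width)
    x = np.tanh(W @ x + b)
    x_perturbed = np.tanh(W @ x_perturbed + b)
    print(layer, np.linalg.norm(x - x_perturbed))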

Another paper [CHAOS1] argues that chaos is a consequence of the iterative learning process. However, that paper discusses the behavior in the context of decision trees rather than deep learning networks with differentiable layers.

References

[CHAOS1] http://arxiv.org/pdf/1407.7417v1.pdf ‘Almost Sure’ Chaotic Properties of Machine Learning Methods

[CHAOS2] http://arxiv.org/abs/1606.05340v1 Exponential expressivity in deep neural networks through transient chaos

http://lossfunctions.tumblr.com/

http://arxiv.org/abs/1606.06737v2 Critical Behavior from Deep Dynamics: A Hidden Dimension in Natural Language

How can we know whether a machine learning model is good or bad? The old answer is to compute the loss function. The new answer is to also compute the mutual information as a function of separation, which immediately shows how well the model captures correlations on different scales.
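As a rough illustration of 'mutual information as a function of separation', the sketch below estimates I(X_t; X_{t+d}) with a plug-in estimator over symbol counts in a character sequence; a generative model whose samples reproduce the slow decay of this curve is capturing long-range structure. The estimator and the character-level corpus are my own choices, not the paper's procedure, and 'corpus.txt' is a placeholder for any long text file.

import numpy as np
from collections import Counter

def mutual_information_at_separation(symbols, d):
    """Plug-in estimate of I(X_t; X_{t+d}) in bits from a symbol sequence."""
    pairs = list(zip(symbols[:-d], symbols[d:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(s for s, _ in pairs)
    right = Counter(s for _, s in pairs)
    mi = 0.0
    for (a, b), c in joint.items():
        mi += (c / n) * np.log2(c * n / (left[a] * right[b]))
    return mi

text = open('corpus.txt').read().lower()   # placeholder: any long text file
for d in [1, 2, 4, 8, 16, 32, 64]:
    print(d, mutual_information_at_separation(text, d))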

http://openreview.net/pdf?id=S1dIzvclg A Recurrent Neural Network Without Chaos

https://arxiv.org/pdf/1611.01232v1.pdf Deep Information Propagation

We study the behavior of untrained neural networks whose weights and biases are randomly distributed using mean field theory. We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks. Our main practical result is to show that random networks may be trained precisely when information can travel through them. Thus, the depth scales that we identify provide bounds on how deep a network may be trained for a specific choice of hyperparameters. As a corollary to this, we argue that in networks at the edge of chaos, one of these depth scales diverges. Thus arbitrarily deep networks may be trained only sufficiently close to criticality. We show that the presence of dropout destroys the order-to-chaos critical point and therefore strongly limits the maximum trainable depth for random networks. Finally, we develop a mean field theory for backpropagation and we show that the ordered and chaotic phases correspond to regions of vanishing and exploding gradient respectively.

These results suggest that theoretical work on random neural networks can be used to inform practical architectural decisions. However, there is still much work to be done. For instance, the framework developed here does not apply to unbounded activations, such as rectified linear units, where it can be shown that there are phases in which eq. 3 does not have a fixed point. Additionally, the analysis here applies directly only to fully connected feed-forward networks, and will need to be extended to architectures with structured weight matrices such as convolutional networks.
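The depth scales referred to in the Deep Information Propagation abstract come from a mean-field recursion: track the variance of a single input and the correlation between two inputs as they pass through a random tanh network, and watch how quickly the correlation collapses to its fixed point. The sketch below evaluates that recursion numerically by Monte Carlo; the (sigma_w, sigma_b) values are just sample points in roughly ordered, near-critical, and chaotic regimes, not prescriptions from the paper.

import numpy as np

rng = np.random.default_rng(1)
z1, z2 = rng.normal(size=200_000), rng.normal(size=200_000)   # samples for Gaussian integrals

def variance_map(q, sigma_w, sigma_b):
    # q_l = sigma_w^2 * E[tanh(sqrt(q_{l-1}) z)^2] + sigma_b^2
    return sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z1)**2) + sigma_b**2

def correlation_map(c, q, sigma_w, sigma_b):
    # correlation recursion for two inputs that both sit at the variance fixed point q
    u1 = np.sqrt(q) * z1
    u2 = np.sqrt(q) * (c * z1 + np.sqrt(1.0 - c**2) * z2)
    return (sigma_w**2 * np.mean(np.tanh(u1) * np.tanh(u2)) + sigma_b**2) / q

for sigma_w in [0.5, 1.0, 2.0]:            # ordered, near-critical, chaotic (roughly)
    sigma_b, q, c = 0.05, 1.0, 0.6
    for _ in range(200):                   # iterate the variance map to its fixed point q*
        q = variance_map(q, sigma_w, sigma_b)
    for _ in range(50):                    # then follow the correlation for 50 layers
        c = min(correlation_map(c, q, sigma_w, sigma_b), 1.0)
    print(sigma_w, 'q* =', round(float(q), 3), 'c after 50 layers =', round(float(c), 3))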

http://chaosbook.org/

http://samoa.santafe.edu/media/workingpapers/01-09-049.pdf

https://arxiv.org/abs/1504.02010 A Chaotic Dynamical System that Paints

http://csc.ucdavis.edu/~cmg/papers/chaos_anatomy.pdf Chaos Forgets and Remembers: Measuring Information Creation, Destruction, and Storage

The hallmark of deterministic chaos is that it creates information—the rate being given by the Kolmogorov-Sinai metric entropy. Since its introduction half a century ago, the metric entropy has been used as a unitary quantity to measure a system’s intrinsic unpredictability. Here, we show that it naturally decomposes into two structurally meaningful components: A portion of the created information—the ephemeral information—is forgotten and a portion—the bound information—is remembered. The bound information is a new kind of intrinsic computation that differs fundamentally from information creation: it measures the rate of active information storage. We show that it can be directly and accurately calculated via symbolic dynamics, revealing a hitherto unknown richness in how dynamical systems compute.
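A small concrete example of 'calculated via symbolic dynamics': for the logistic map at r = 4, the partition at x = 1/2 is generating, and the block-entropy rate of the resulting binary symbol stream converges to the Kolmogorov-Sinai entropy, ln 2 ≈ 0.693 nats per step. The plug-in estimator below is a naive sketch and does not attempt the paper's decomposition into ephemeral and bound information.

import numpy as np
from collections import Counter

# Symbolize a logistic-map trajectory with the generating partition at x = 1/2,
# then estimate the entropy rate h = H(L+1) - H(L) from block entropies.

def block_entropy(symbols, L):
    counts = Counter(tuple(symbols[i:i + L]) for i in range(len(symbols) - L + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p))           # nats

rng = np.random.default_rng(2)
x = rng.uniform(0.1, 0.9)
for _ in range(1000):                        # discard the transient
    x = 4.0 * x * (1.0 - x)

symbols = []
for _ in range(200_000):
    x = 4.0 * x * (1.0 - x)
    symbols.append(0 if x < 0.5 else 1)

for L in range(1, 10):
    h = block_entropy(symbols, L + 1) - block_entropy(symbols, L)
    print(L, round(h, 4))                    # should approach ln 2 ~ 0.6931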

http://csc.ucdavis.edu/~cmg/papers/Crutchfield.NaturePhysics2012.corrected.pdf Between order and chaos

https://arxiv.org/abs/1512.08575v1 Optimal Selective Attention in Reactive Agents

In this report we present the minimum-information principle for selective attention in reactive agents. We further motivate this approach by reducing the general problem of optimal control in POMDPs to reactive control with complex observations. Lastly, we explore a newly discovered phenomenon of this optimization process: period-doubling bifurcations. This necessitates periodic policies and raises many more questions regarding stability, periodicity, and chaos in optimal control.
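Period-doubling bifurcations are the textbook route to chaos in one-dimensional maps, and the same qualitative cascade is what the authors report emerging in their attention-policy optimization. Purely as a generic illustration of the phenomenon (it has nothing to do with the paper's POMDP setup), the usual logistic-map bifurcation diagram can be generated as follows:

import numpy as np
import matplotlib.pyplot as plt

# Generic period-doubling illustration with the logistic map x -> r x (1 - x).
rs = np.linspace(2.8, 4.0, 1200)
points_r, points_x = [], []
for r in rs:
    x = 0.5
    for _ in range(500):        # let transients die out
        x = r * x * (1.0 - x)
    for _ in range(100):        # record the attractor
        x = r * x * (1.0 - x)
        points_r.append(r)
        points_x.append(x)

plt.plot(points_r, points_x, ',k')
plt.xlabel('r'); plt.ylabel('x')
plt.title('Period-doubling cascade into chaos')
plt.show()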

https://arxiv.org/abs/1705.05551v1 New Reinforcement Learning Using a Chaotic Neural Network for Emergence of “Thinking” - “Exploration” Grows into “Thinking” through Learning

The emergence of “thinking”, which is a typical higher function, is difficult to realize because “thinking” needs non-fixed-point, flow-type attractors with both convergence and transition dynamics. Furthermore, in order to introduce “inspiration” or “discovery” into “thinking”, transitions that are not completely random but nevertheless unexpected are also required.

http://www.mdpi.com/1099-4300/19/5/188/htm When the Map Is Better Than the Territory

Recent research applying information theory to causal analysis has shown that the causal structure of some systems can actually come into focus and be more informative at a macroscale. That is, a macroscale description of a system (a map) can be more informative than a fully detailed microscale description of the system (the territory). This has been called “causal emergence.”
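The 'more informative at a macroscale' claim is usually quantified with effective information (EI): the mutual information between a maximum-entropy intervention over the system's states and the resulting next-state distribution. The toy system below is my own construction in the spirit of that literature; coarse-graining the noisy 4-state 'territory' into a deterministic 2-state 'map' raises EI from about 0.81 bits to 1 bit.

import numpy as np

def effective_information(tpm):
    """EI in bits: mutual information between a uniform intervention over
    states and the resulting next state, given tpm[i, j] = P(next=j | current=i)."""
    n = tpm.shape[0]
    effect = tpm.mean(axis=0)                # next-state distribution under uniform do()
    with np.errstate(divide='ignore', invalid='ignore'):
        terms = tpm * np.log2(tpm / effect)
    return np.nansum(terms) / n

# Micro 'territory': states 0-2 hop uniformly among themselves (noisy),
# state 3 is a deterministic self-loop.
micro = np.array([[1/3, 1/3, 1/3, 0.0],
                  [1/3, 1/3, 1/3, 0.0],
                  [1/3, 1/3, 1/3, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])

# Macro 'map': group {0,1,2} -> A and {3} -> B; the dynamics become deterministic.
macro = np.array([[1.0, 0.0],
                  [0.0, 1.0]])

print('EI micro:', round(effective_information(micro), 3))   # ~0.81 bits
print('EI macro:', round(effective_information(macro), 3))   # 1.0 bit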

https://arxiv.org/abs/1802.09979v1 The Emergence of Spectral Universality in Deep Networks

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude.

Our results provide a principled framework for the initialization of weights and the choice of nonlinearities in order to produce well-conditioned Jacobians and fast learning. Intriguingly, we find novel universality classes of deep spectra that remain well-conditioned as the depth goes to infinity, as well as theoretical conditions for their existence.
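The object being studied is the network's input-output Jacobian at initialization, the product of per-layer terms D_l W_l, where D_l holds the activation derivatives. A quick numerical check of how its singular values spread with depth, comparing Gaussian and scaled-orthogonal weights for a tanh network, might look like the sketch below; the width, depth, and gain are arbitrary demo values, not the paper's settings.

import numpy as np

rng = np.random.default_rng(3)
width, depth, sigma_w = 400, 64, 1.05    # demo values near the critical gain

def random_orthogonal(n):
    q, _ = np.linalg.qr(rng.normal(size=(n, n)))
    return q

def jacobian_singular_values(init):
    x = rng.normal(size=width)
    J = np.eye(width)
    for _ in range(depth):
        if init == 'gaussian':
            W = rng.normal(scale=sigma_w / np.sqrt(width), size=(width, width))
        else:
            W = sigma_w * random_orthogonal(width)
        h = W @ x
        x = np.tanh(h)
        D = np.diag(1.0 - np.tanh(h)**2)    # derivative of tanh
        J = D @ W @ J                        # chain rule: input-output Jacobian
    return np.linalg.svd(J, compute_uv=False)

for init in ['gaussian', 'orthogonal']:
    s = jacobian_singular_values(init)
    print(init, 'max/median singular value ratio:', round(float(s.max() / np.median(s)), 2))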

https://arxiv.org/abs/1711.09072 Entropy-based Generating Markov Partitions for Complex Systems

The task gets even more complicated if the system is a network composed of interacting dynamical units, namely, a high-dimensional complex system. Here, we tackle this task and solve it by defining a method to approximately construct generating Markov partitions (GMPs) for any complex system's finite-resolution and finite-time trajectory. We critically test our method on networks of coupled maps, encoding their trajectories into symbolic sequences. We show that these sequences are optimal because they minimise the information loss and also any spurious information added. Consequently, our method allows us to approximately calculate the invariant probability measures of complex systems from observed data.
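A drastically simplified, one-dimensional version of the selection idea (not the paper's algorithm, which handles networks of coupled units): scan candidate partition thresholds, symbolize the trajectory with each, and keep the threshold whose symbolic sequence has the highest block-entropy rate, i.e. loses the least information about the dynamics.

import numpy as np
from collections import Counter

def block_entropy_rate(symbols, L=6):
    # h ~ H(L+1) - H(L), estimated from plug-in block entropies in bits
    counts_L  = Counter(tuple(symbols[i:i + L])     for i in range(len(symbols) - L))
    counts_L1 = Counter(tuple(symbols[i:i + L + 1]) for i in range(len(symbols) - L))
    def H(counts):
        p = np.array(list(counts.values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log2(p))
    return H(counts_L1) - H(counts_L)

# Trajectory of a one-dimensional chaotic map (stand-in for one node of the network).
x, traj = 0.3, []
for _ in range(20_000):
    x = 3.9 * x * (1.0 - x)
    traj.append(x)
traj = np.array(traj)

# Scan candidate partition thresholds and keep the most informative one.
best = max(np.linspace(0.05, 0.95, 91),
           key=lambda t: block_entropy_rate((traj > t).astype(int)))
print('selected threshold:', round(float(best), 3))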

https://arxiv.org/pdf/1803.04779.pdf Hybrid Forecasting of Chaotic Processes: Using Machine Learning in Conjunction with a Knowledge-Based Model

https://arxiv.org/abs/1710.07313 Using Machine Learning to Replicate Chaotic Attractors and Calculate Lyapunov Exponents from Data

https://arxiv.org/abs/1805.03362 Attractor Reconstruction by Machine Learning