Distributed Model



Deep neural networks (and connectionist systems in general) have knowledge distributed, or diffused, across their many processing nodes.


Are trainability, expressivity, and generalization improved by higher knowledge diffusion?


<Diagram that shows for each layer that a different representation is created>


Distributed Representation is a consequence of ensembles that classify different perspectives of the data. A single neuron is a classification machine and can therefore be treated as a simple ensemble. The neurons of each layer map data into a new multi-dimensional space. In this mapped space, every vector is a combination of the features identified in the previous layer; equivalently, every vector is a combination of the classifications of multiple simple ensembles. Distributed Representation furthermore has a fractal characteristic: within each layer, the representation of a feature is itself distributed across the network weights (Hinton et al., 1986). In other words, it is unlikely that any single weight captures the semantics of a feature.
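The difference between a local and a distributed code can be made concrete with a small counting sketch (a hypothetical illustration; the unit count is an arbitrary choice): with k units, a one-hot (local) code distinguishes only k concepts, while a distributed binary code distinguishes 2^k, because each concept is a combination of features rather than a single dedicated unit.

```python
import numpy as np

k = 8  # number of units

# Local (one-hot) code: each concept claims exactly one unit.
local_codes = np.eye(k)

# Distributed (binary) code: each concept is a combination of k features,
# so the same k units can represent every k-bit pattern.
distributed_codes = np.array(
    [[(i >> b) & 1 for b in range(k)] for i in range(2 ** k)]
)

print(local_codes.shape[0])        # 8 distinct concepts
print(distributed_codes.shape[0])  # 256 distinct concepts
```

The exponential gap in representational capacity is one reason distributed codes dominate in learned systems.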

The advantages of distributed representations are that they are tolerant of errors and degrade gracefully on failure (Medler et al., 2005). The disadvantage, however, is that these representations make interpretation extremely difficult (Dawson, 2004).
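Graceful degradation can be sketched with a toy experiment (the network sizes, seed, and failure fractions are assumptions for the demo): ablate an increasing fraction of hidden units in a random two-layer network and watch the average output error grow smoothly rather than collapse all at once.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(64, 16)) / np.sqrt(16)  # input -> hidden
W2 = rng.normal(size=(1, 64)) / np.sqrt(64)   # hidden -> output
x = rng.normal(size=(16,))

def forward(mask):
    """Forward pass with some hidden units 'failed' (zeroed by the mask)."""
    h = np.tanh(W1 @ x) * mask
    return float(W2 @ h)

clean = forward(np.ones(64))

# Average output error over many random failure patterns per fraction.
errors = {}
for frac in (0.1, 0.3, 0.5):
    errs = []
    for _ in range(200):
        mask = np.ones(64)
        mask[rng.choice(64, size=int(frac * 64), replace=False)] = 0.0
        errs.append(abs(forward(mask) - clean))
    errors[frac] = float(np.mean(errs))
    print(frac, errors[frac])
```

Because no single hidden unit owns the prediction, knocking out more units increases the error gradually; a purely local code would instead fail catastrophically the moment its one responsible unit dies.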

Known Uses

Related Patterns


Relationship to Canonical Patterns

  • Entropy: higher entropy is the most likely configuration, which predicts distributed representations.
  • Random Projections: the randomness of the projection planes leads to a distributed representation.
  • Hyperuniformity: exists in these representations as a consequence of the constraints enforced by regularization.
  • Invariant Representation: leads to simpler, sparser representations because invariant features are filtered out.
  • Random Orthogonal Initialization: encourages evolution toward random, and therefore distributed, representations.
  • Ensembles: implicitly superimposed in the network, they lead to greater predictive power.
  • Regularization: encourages structure beyond randomness; for example, L1 regularization encourages sparsity of parameters.
  • Mutual Information: measures the degree of dependency between parameters. Is this an intrinsic measure of model entanglement?
  • Disentangled Basis: the opposite of a distributed representation, in which one can factor out an orthogonal basis for the model.
  • Associative Memory: has a measure of entanglement based on the degree of interaction between nodes.
  • Hierarchical Abstraction: further increases entanglement as layers are mixed into higher layers.
  • Self-Similarity: indicates the fractal nature of the model.
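The L1 claim above can be illustrated with a minimal sketch (the data, penalty strength, and step size are assumptions for the demo): fitting a linear model by proximal gradient descent, the soft-threshold step induced by the L1 penalty drives most coefficients exactly to zero, while the unregularized fit keeps nearly all of them.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.5, 1.0]          # only 3 informative features
y = X @ true_w + 0.5 * rng.normal(size=100)

def fit(l1=0.0, steps=2000, lr=0.01):
    """Least squares via (proximal) gradient descent with an optional L1 penalty."""
    w = np.zeros(20)
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y) / len(y)
        # Soft-thresholding: the proximal operator of the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w

dense = fit(l1=0.0)
sparse = fit(l1=0.5)
n_dense = int((np.abs(dense) > 0.01).sum())   # most of the 20 coefficients survive
n_sparse = int((np.abs(sparse) > 0.01).sum()) # roughly the 3 true features survive
print(n_dense, n_sparse)
```

The L1 penalty thus pushes the model away from a fully diffuse representation toward a sparser, more factored one.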


References

"Distributed representations of concepts—representations composed of many elements that can be set separately from each other—are one of the most important tools for representation learning." (Goodfellow et al., 2016, §15.4)


Dawson, M. R. W. (2004). Minds and Machines: Connectionism and Psychological Modeling. Malden, MA: Blackwell Publishing.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. Cambridge, MA: MIT Press.

Hinton, G. E., McClelland, J., & Rumelhart, D. (1986). Distributed representations. In D. Rumelhart & J. McClelland (Eds.), Parallel Distributed Processing (Vol. 1, pp. 77-109). Cambridge, MA: MIT Press.

Medler, D. A., Dawson, M. R. W., & Kingstone, A. (2005). Functional localization and double dissociations: The relationship between internal structure and behavior. Brain and Cognition, 57, 146-150.

Are DL systems or ?? Are the laws of entanglement theory thermodynamic?