
Composite Model Patterns

This chapter is an extension of the Model Patterns chapter. Here we focus our attention on composite model patterns, that is, characteristic structures that encompass a much broader scope than the patterns in the Model Patterns chapter. We would like to examine the emergent properties that arise in a collection of models.

A learning machine is trained by fitting a model to observed data. In practice, the model can consist of millions of parameters, which means training can easily overfit the data and lead to poor generalization. Effective training therefore requires techniques that improve generalization by avoiding overfitting. Models with good generalization are able to make accurate predictions on data that the machine has never observed during training.

Ideally, one would prefer a machine that is able to explain how it arrives at a conclusion. Unfortunately, ANNs are black boxes with millions of uninterpretable parameters. Despite this, trained models have been found to exhibit several recurring characteristics. One would hope that a trained model would coalesce into regions that reflect semantics; studies have shown instead that the resulting models have random-like characteristics. Despite this randomness, researchers have found emergent structures. Curiously enough, models whose parameters appear closest to random are found in machines that generalize very well.

Studying collections of models reveals a kind of duality between abstraction and consensus. A model is able to capture abstract concepts, but isolating those concepts is usually difficult to do. Models can also be built up from consensus; models constructed through consensus diffuse knowledge among many sub-models. Knowledge diffusion is likely fractal in nature, such that models can diffuse knowledge across many layers or across neurons in the same layer. As we explore this area, we shall see that this recursiveness is common to many patterns.

At a high conceptual level, a machine has three properties: depth, width and multiplicity. Depth determines the layers of abstraction. Width determines the diffusion of ensembles. Multiplicity contributes to the weighting of ensembles.
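To make these three properties concrete, here is a minimal PyTorch sketch. The function name, dimensions and ensemble size are illustrative assumptions, not anything prescribed by the text: depth sets the number of stacked layers, width sets the units per layer, and multiplicity averages several independently initialized models as an equally weighted ensemble.

    import torch
    import torch.nn as nn

    def make_model(depth, width):
        # `depth` controls the layers of abstraction; `width` controls
        # the number of units (the diffusion) in each layer.
        layers = [nn.Linear(16, width), nn.ReLU()]
        for _ in range(depth - 1):
            layers += [nn.Linear(width, width), nn.ReLU()]
        layers.append(nn.Linear(width, 1))
        return nn.Sequential(*layers)

    # Multiplicity: several independently initialized models whose
    # predictions are averaged, i.e. an equally weighted ensemble.
    models = [make_model(depth=3, width=64) for _ in range(5)]
    x = torch.randn(8, 16)
    prediction = torch.stack([m(x) for m in models]).mean(dim=0)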

One key question to determine whether a pattern belongs in the Composite Model category is: “Does the pattern affect behavior in both the training and inference stages?” An additional criterion is that Composite Models consist only of constructs that come from the Model pattern category.

Implicit Ensemble

Weight Sharing (Tied Weights), related to Implicit Ensemble

Layer Sharing

Weight Quantization

Layer Reversibility (Reversible Layer)

Network in Network

Residual (a minimal sketch of this pattern follows the list)

Ladder

Cardinality

Variational Autoencoder

Deep Neural Decision Tree (ALVT-200)

Convolution Recurrent Network

Correlational Network

Transition Based

Hourglass

Prior Knowledge

Iterative Inference
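As a concrete illustration of one entry above, here is a minimal, hypothetical PyTorch sketch of the Residual pattern; the layer sizes are assumptions for illustration. A residual block computes a transformation F(x) and adds it back to its input, so the block only has to learn a correction to the identity mapping.

    import torch
    import torch.nn as nn

    class ResidualBlock(nn.Module):
        # Computes y = x + F(x): the block learns a residual correction
        # F rather than a full input-to-output mapping.
        def __init__(self, dim):
            super().__init__()
            self.body = nn.Sequential(
                nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

        def forward(self, x):
            return x + self.body(x)

    x = torch.randn(8, 32)
    y = ResidualBlock(32)(x)  # same shape as x; deeper stacks chain blocks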

References

http://arxiv.org/pdf/1511.02799v3.pdf

Neural Module Networks

From the paper: “In this paper, we have introduced neural module networks, which provide a general-purpose framework for learning collections of neural modules which can be dynamically assembled into arbitrary deep networks.”
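The following toy sketch illustrates that idea of dynamic assembly; it is not the paper’s implementation, and the module names, sizes and layout are hypothetical. A small library of named modules is composed on the fly into a network whose layout can be chosen per input, e.g. from a parsed question.

    import torch
    import torch.nn as nn

    # A small library of named modules; the layout (which modules, in
    # which order) is decided at run time rather than fixed in advance.
    modules = nn.ModuleDict({
        "attend":  nn.Sequential(nn.Linear(32, 32), nn.ReLU()),
        "combine": nn.Sequential(nn.Linear(32, 32), nn.ReLU()),
        "answer":  nn.Linear(32, 10),
    })

    def assemble(layout):
        # Chain the named modules in the order given by `layout`.
        return nn.Sequential(*(modules[name] for name in layout))

    net = assemble(["attend", "combine", "answer"])  # assembled dynamically
    logits = net(torch.randn(4, 32))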