How can we relate the characteristics of associative memory to the functioning of neural networks?


Dense Associative Memory for Pattern Recognition
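The energy-based view in the abstract that follows can be made concrete with a small sketch. It assumes our reading of the model: a state σ of ±1 units, stored ±1 patterns ξ_μ, and energy E(σ) = −Σ_μ F(ξ_μ · σ), where F is the rectified polynomial (F(x) = xⁿ for x > 0, else 0). With an unrectified F(x) = x², this energy expands into the classical Hopfield energy with Hebbian weights Σ_μ ξ_μi ξ_μj, which is one direct link between associative memory and a network's weights. The helper names (`rectified_poly`, `energy`, `update_unit`, `retrieve`) and the toy patterns are hypothetical, chosen only for illustration.

```python
# Minimal dense associative memory sketch (hypothetical helper names).
# Energy: E(sigma) = -sum_mu F(xi_mu . sigma), with F a rectified polynomial.


def rectified_poly(x, n):
    """F(x) = x**n for x > 0, else 0."""
    return float(x) ** n if x > 0 else 0.0


def energy(state, patterns, n):
    """Energy of a +/-1 state given the stored +/-1 patterns."""
    return -sum(
        rectified_poly(sum(p * s for p, s in zip(xi, state)), n) for xi in patterns
    )


def update_unit(state, patterns, n, i):
    """Return the sign for unit i that gives the lower energy (ties -> +1)."""
    plus = state[:i] + [1] + state[i + 1:]
    minus = state[:i] + [-1] + state[i + 1:]
    return 1 if energy(plus, patterns, n) <= energy(minus, patterns, n) else -1


def retrieve(state, patterns, n, max_sweeps=10):
    """Asynchronous updates until no unit changes (or max_sweeps is reached)."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = update_unit(state, patterns, n, i)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Toy example: three mutually orthogonal 8-unit patterns.
patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first pattern with unit 0 flipped
print(retrieve(noisy, patterns, n=3) == patterns[0])  # True
```

Each asynchronous update picks the lower-energy sign for one unit, so the energy never increases and the loop settles into a local minimum. Raising n sharpens the contribution of the best-matching pattern relative to the others.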

We propose a simple duality between this dense associative memory and neural networks commonly used in deep learning. The proposed duality makes it possible to apply energy-based intuition from associative memory to analyze computational properties of neural networks with unusual activation functions: the higher rectified polynomials, which until now have not been used for training neural networks.

Holographic Embeddings of Knowledge Graphs
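The circular-correlation operator that the abstract that follows relies on can be sketched directly. This assumes the scoring form as we understand it, σ(r · (e_s ⋆ e_o)) with [a ⋆ b]_k = Σ_i a_i b_{(i+k) mod d}. The function names `circular_correlation` and `hole_score` are hypothetical, and the direct O(d²) loop is used for clarity rather than speed.

```python
import math


def circular_correlation(a, b):
    """[a star b]_k = sum_i a_i * b_{(i + k) mod d}, computed directly in O(d^2)."""
    d = len(a)
    if len(b) != d:
        raise ValueError("vectors must have the same length")
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]


def hole_score(r, e_s, e_o):
    """Sigmoid of r . (e_s star e_o): a score for the triple (subject, relation, object)."""
    composed = circular_correlation(e_s, e_o)
    x = sum(ri * ci for ri, ci in zip(r, composed))
    return 1.0 / (1.0 + math.exp(-x))


# Correlation is not commutative, so (s, r, o) and (o, r, s) can score differently.
print(circular_correlation([1, 0, 0], [0, 1, 0]))  # [0, 1, 0]
print(circular_correlation([0, 1, 0], [1, 0, 0]))  # [0, 0, 1]
```

The composed vector has the same dimension d as the entity embeddings, which keeps the parameter count low. In practice the correlation can be computed in O(d log d) with the FFT identity a ⋆ b = F⁻¹(conj(F(a)) ⊙ F(b)) for real vectors.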

The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator, HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets.

Using Fast Weights to Attend to the Recent Past

Dense Associative Memory is Robust to Adversarial Inputs

Learning to update Auto-associative Memory in Recurrent Neural Networks for Improving Sequence Memorization

Variational Memory Addressing in Generative Models

Can Active Memory Replace Attention?

We propose an extended model of active memory that matches existing attention models on neural machine translation and generalizes better to longer sentences. We investigate this model and explain why previous active memory models did not succeed. Finally, we discuss when active memory brings the most benefit and where attention can be a better choice.

The Kanerva Machine: A Generative Distributed Memory

We present an end-to-end trained memory system that quickly adapts to new data and generates samples like them. Inspired by Kanerva's sparse distributed memory, it has a robust distributed reading and writing mechanism. The memory is analytically tractable, which enables optimal online compression via a Bayesian update rule. We formulate it as a hierarchical conditional generative model, where memory provides a rich data-dependent prior distribution. Consequently, the top-down memory and bottom-up perception are combined to produce the code representing an observation. Empirically, we demonstrate that the adaptive memory significantly improves generative models trained on both the Omniglot and CIFAR datasets. Compared with the Differentiable Neural Computer (DNC) and its variants, our memory model has greater capacity and is significantly easier to train.

The Working Memory Network is a Memory Network architecture with a novel working memory storage and relational reasoning module. The model retains the relational reasoning abilities of the Relation Network while reducing its computational complexity considerably. The model achieves state-of-the-art performance on the jointly trained bAbI-10k dataset, with an average error of less than 0.5%.

Contextual Memory Trees
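The Kanerva Machine abstract above attributes its online compression to an analytically tractable memory with a Bayesian update rule. The kind of closed-form update this allows can be sketched under a simplified linear-Gaussian assumption. The memory has a Gaussian prior with mean R (K slots by C dimensions) and symmetric row covariance U. An observation z is modelled as wᵀM plus isotropic noise, with known addressing weights w. This is a single-write, rank-one sketch of a standard Gaussian posterior update, not the paper's full model, which also involves learned addressing and an encoder/decoder. The names `read` and `write` are hypothetical.

```python
def read(R, w):
    """Posterior-mean read-out w^T R for addressing weights w."""
    return [sum(w[k] * R[k][c] for k in range(len(R))) for c in range(len(R[0]))]


def write(R, U, w, z, noise_var):
    """One rank-one Bayesian write of observation z (length C) at weights w (length K).

    R: K x C posterior mean of the memory; U: K x K symmetric row covariance.
    Returns the updated (R, U); the inputs are not modified.
    """
    K, C = len(R), len(R[0])
    delta = [z[c] - pred for c, pred in enumerate(read(R, w))]  # prediction error
    Uw = [sum(U[k][j] * w[j] for j in range(K)) for k in range(K)]  # U w
    s = sum(w[k] * Uw[k] for k in range(K)) + noise_var  # w^T U w + noise variance
    new_R = [[R[k][c] + Uw[k] * delta[c] / s for c in range(C)] for k in range(K)]
    new_U = [[U[k][j] - Uw[k] * Uw[j] / s for j in range(K)] for k in range(K)]
    return new_R, new_U


# Two memory slots, one-dimensional contents, unit prior variance and unit noise.
R = [[0.0], [0.0]]
U = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 0.0]
for _ in range(2):
    R, U = write(R, U, w, [2.0], noise_var=1.0)
print(read(R, w), U[0][0])
```

Writing the value 2 twice into slot 0 should leave a posterior mean close to 4/3 and a variance close to 1/3. This matches the textbook result for a unit-variance Gaussian prior with two unit-noise observations, while the unaddressed slot keeps its prior.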