This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
Next revision
Previous revision
attention [2018/09/04 02:43]
attention [2018/10/02 19:57] (current)
Line 303: Line 303:
 https://​arxiv.org/​abs/​1808.03867 Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction https://​arxiv.org/​abs/​1808.03867 Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction
 +https://​arxiv.org/​abs/​1809.11087 Learning to Remember, Forget and Ignore using Attention Control in Memory
 +Applying knowledge gained from psychological studies, we designed a new model called Differentiable Working Memory (DWM) in order to specifically emulate human working memory. As it shows the same functional characteristics as working memory, it robustly learns psychology inspired tasks and converges faster than comparable state-of-the-art models. Moreover, the DWM model successfully generalizes to sequences two orders of magnitude longer than the ones used in training. Our in-depth analysis shows that the behavior of DWM is interpretable and that it learns to have fine control over memory, allowing it to retain, ignore or forget information based on its relevance.
 +https://​openreview.net/​forum?​id=rJxHsjRqFQ Hyperbolic Attention Networks ​
 +By only changing the geometry of embedding of object representations,​ we can use the embedding space more efficiently without increasing the number of parameters of the model. Mainly as the number of objects grows exponentially for any semantic distance from the query, hyperbolic geometry ​ --as opposed to Euclidean geometry-- can encode those objects without having any interference. Our method shows improvements in generalization on neural machine translation on WMT'14 (English to German), learning on graphs (both on synthetic and real-world graph tasks) and visual question answering (CLEVR) tasks while keeping the neural representations compact.