Many Natural Language Processing applications nowadays rely on pre-trained word representations estimated from large text corpora such as news collections, Wikipedia and Web Crawl. In this paper, we show how to train high-quality word vector representations by using a combination of known tricks that are, however, rarely used together. The main result of our work is a new set of publicly available pre-trained models that outperform the current state of the art by a large margin on a number of tasks.
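
Pre-trained vectors of this kind are commonly shipped in word2vec text format and can be loaded directly; below is a minimal sketch using gensim. The file name is a placeholder assumption, not a specific released model.

<code python>
# Minimal sketch: load pre-trained word vectors in word2vec text format and query them.
# "pretrained-vectors.vec" is a placeholder path.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("pretrained-vectors.vec", binary=False)

print(vectors["king"].shape)                  # a single word vector (numpy array)
print(vectors.most_similar("king", topn=5))   # nearest neighbours by cosine similarity
</code>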

https://explosion.ai/blog/deep-learning-formula-nlp Embed, encode, attend, predict: The new deep learning formula for state-of-the-art NLP models
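
As a rough illustration of that four-step formula, here is a minimal PyTorch sketch of an embed -> encode -> attend -> predict classifier; all layer sizes and names are illustrative assumptions, not taken from the blog post.

<code python>
# Toy embed -> encode -> attend -> predict classifier; sizes are illustrative.
import torch
import torch.nn as nn

class EmbedEncodeAttendPredict(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)              # embed: token ids -> vectors
        self.encode = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)   # encode: contextual states
        self.attend = nn.Linear(2 * hidden_dim, 1)                    # attend: score each position
        self.predict = nn.Linear(2 * hidden_dim, num_classes)         # predict: summary -> label

    def forward(self, token_ids):                       # token_ids: (batch, seq_len)
        states, _ = self.encode(self.embed(token_ids))  # (batch, seq_len, 2*hidden_dim)
        weights = torch.softmax(self.attend(states), dim=1)
        summary = (weights * states).sum(dim=1)          # attention-weighted sum over positions
        return self.predict(summary)

logits = EmbedEncodeAttendPredict()(torch.randint(0, 10000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
</code>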

https://arxiv.org/abs/1706.02596v2 Dynamic Integration of Background Knowledge in Neural NLU Systems

Common-sense or background knowledge is required to understand natural language, but in most neural natural language understanding (NLU) systems, the requisite background knowledge is indirectly acquired from static corpora. We develop a new reading architecture for the dynamic integration of explicit background knowledge in NLU models. A new task-agnostic reading module provides refined word representations to a task-specific NLU architecture by processing background knowledge in the form of free-text statements, together with the task-specific inputs. Strong performance on the tasks of document question answering (DQA) and recognizing textual entailment (RTE) demonstrates the effectiveness and flexibility of our approach. Analysis shows that our models learn to exploit knowledge selectively and in a semantically appropriate way.
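
Reduced to a sketch, the general mechanism looks like this: encode the free-text background statements and let a learned gate decide how much of that summary to mix into each task-input word representation. This is a simplified illustration only, not the paper's exact reading architecture; all names and sizes are assumptions.

<code python>
# Simplified illustration: refine word representations with a summary of encoded
# background statements via a learned gate (not the paper's exact model).
import torch
import torch.nn as nn

class KnowledgeRefiner(nn.Module):
    def __init__(self, dim=100):
        super().__init__()
        self.read = nn.GRU(dim, dim, batch_first=True)    # read the background statements
        self.gate = nn.Linear(2 * dim, dim)                # how much knowledge to mix in

    def forward(self, word_vecs, background_vecs):
        # word_vecs:       (batch, seq_len, dim)  task-specific input embeddings
        # background_vecs: (batch, n_facts, dim)  embedded background statements
        _, summary = self.read(background_vecs)            # final GRU state: (1, batch, dim)
        summary = summary.squeeze(0).unsqueeze(1).expand_as(word_vecs)
        gate = torch.sigmoid(self.gate(torch.cat([word_vecs, summary], dim=-1)))
        return gate * word_vecs + (1 - gate) * summary     # refined word representations

refined = KnowledgeRefiner()(torch.randn(2, 7, 100), torch.randn(2, 5, 100))
print(refined.shape)  # torch.Size([2, 7, 100])
</code>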

https://thegradient.pub/nlp-imagenet/ NLP's ImageNet moment has arrived

https://arxiv.org/abs/1808.03840 Fake Sentence Detection as a Training Task for Sentence Encoding

https://thegradient.pub/frontiers-of-generalization-in-natural-language-processing/

https://arxiv.org/pdf/1609.05284.pdf ReasoNet: Learning to Stop Reading in Machine Comprehension

https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/r-net.pdf R-NET: Machine Reading Comprehension with Self-Matching Networks

https://arxiv.org/abs/1508.07909 Neural Machine Translation of Rare Words with Subword Units
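
The subword units in this paper come from a byte-pair-encoding style merge procedure; the sketch below shows the core merge loop on a toy symbol vocabulary (the example words and number of merges are illustrative).

<code python>
# Core BPE merge loop: repeatedly merge the most frequent adjacent symbol pair.
# The toy vocabulary and merge count are illustrative.
import re
from collections import Counter

def pair_counts(vocab):
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    # Replace the space-separated pair with its concatenation in every word.
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words as space-separated symbols; '</w>' marks the end of a word.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for _ in range(5):
    best = pair_counts(vocab).most_common(1)[0][0]
    vocab = merge_pair(best, vocab)
    print("merged:", best)
</code>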

https://arxiv.org/abs/1808.04444 Character-Level Language Modeling with Deeper Self-Attention

LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
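
A minimal PyTorch sketch of the auxiliary-loss idea: attach the prediction head to intermediate layers as well as the final one and sum the weighted losses. The layer count, sizes and loss weights below are toy assumptions, not the paper's 64-layer configuration.

<code python>
# Toy character LM with per-layer auxiliary losses; configuration is illustrative.
import torch
import torch.nn as nn

vocab_size, dim, num_layers, seq_len = 256, 64, 4, 32

embed = nn.Embedding(vocab_size, dim)
blocks = nn.ModuleList([nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
                        for _ in range(num_layers)])
head = nn.Linear(dim, vocab_size)          # prediction head shared across layers
loss_fn = nn.CrossEntropyLoss()

chars = torch.randint(0, vocab_size, (8, seq_len))
inputs, targets = chars[:, :-1], chars[:, 1:]          # predict the next character
# Causal mask so each position only attends to earlier positions.
mask = torch.triu(torch.full((inputs.size(1), inputs.size(1)), float("-inf")), diagonal=1)

x = embed(inputs)
total_loss = 0.0
for i, block in enumerate(blocks):
    x = block(x, src_mask=mask)
    logits = head(x)                                   # predictions from this layer
    weight = 1.0 if i == num_layers - 1 else 0.5       # down-weight intermediate layers
    total_loss = total_loss + weight * loss_fn(logits.reshape(-1, vocab_size),
                                               targets.reshape(-1))
print(total_loss)
</code>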

https://github.com/yingtaomj/Iterative-Document-Representation-Learning-Towards-Summarization-with-Polishing

https://arxiv.org/abs/1810.01480 Optimally Segmenting Inputs for NMT Shows Preference for Character-Level Processing

In an evaluation on three translation tasks we found that, given the freedom to navigate between different segmentation levels, the model prefers to operate on (almost) character level, providing support for purely character-level NMT models from a novel angle.

https://arxiv.org/abs/1810.04805 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

http://phontron.com/class/nn4nlp2018/schedule.html

http://www.aclweb.org/anthology/W16-5502 Human-like Natural Language Generation Using Monte Carlo Tree Search

https://arxiv.org/abs/1707.05589 On the State of the Art of Evaluation in Neural Language Models

https://arxiv.org/abs/1810.08854 pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference

http://papers.nips.cc/paper/7408-frage-frequency-agnostic-word-representation FRAGE: Frequency-Agnostic Word Representation