2018
Cite Score
19
AI summary
This paper analyzes how LSTM language models use prior context through ablation studies on Penn Treebank and WikiText-2. The model uses about 200 tokens of context on average but sharply distinguishes nearby context (the most recent 50 tokens) from the distant history, where word order is ignored. The neural caching model especially helps the LSTM copy words from that distant context.
Abstract
We know very little about how neural language models (LM) use prior linguistic context. In this paper, we investigate the role of context in an LSTM LM, through ablation studies. Specifically, we analyze the increase in perplexity when prior context words are shuffled, replaced, or dropped. On two standard datasets, Penn Treebank and WikiText-2, we find that the model is capable of using about 200 tokens of context on average, but sharply distinguishes nearby context (recent 50 tokens) from the distant history. The model is highly sensitive to the order of words within the most recent sentence, but ignores word order in the long-range context (beyond 50 tokens), suggesting the distant past is modeled only as a rough semantic field or topic. We further find that the neural caching model (Grave et al., 2017b) especially helps the LSTM to copy words from within this distant context. Overall, our analysis not only provides a better understanding of how neural LMs use their context, but also sheds light on recent success from cache-based models.
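The ablations described in the abstract (dropping or shuffling prior context and measuring the change in perplexity) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the `score` callback (assumed to return the natural-log probability of a target token given a context), and the perplexity-difference measure are all assumptions made for the example.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical signature: score(context_tokens, target_token) -> natural-log probability.
ScoreFn = Callable[[List[str], str], float]


def drop_distant(context: Sequence[str], keep_last: int) -> List[str]:
    """Keep only the most recent `keep_last` tokens of the context."""
    return list(context[-keep_last:]) if keep_last > 0 else []


def shuffle_distant(context: Sequence[str], nearby: int, seed: int = 0) -> List[str]:
    """Shuffle every token except the most recent `nearby` tokens."""
    split = max(len(context) - nearby, 0)
    distant = list(context[:split])
    random.Random(seed).shuffle(distant)
    return distant + list(context[split:])


def perplexity(log_probs: Sequence[float]) -> float:
    """Perplexity computed from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))


def perplexity_increase(
    score: ScoreFn,
    contexts: Sequence[Sequence[str]],
    targets: Sequence[str],
    perturb: Callable[[Sequence[str]], List[str]],
) -> float:
    """Perplexity with perturbed contexts minus perplexity with intact contexts."""
    base = perplexity([score(list(c), t) for c, t in zip(contexts, targets)])
    ablated = perplexity([score(perturb(c), t) for c, t in zip(contexts, targets)])
    return ablated - base
```

In use, `perturb` would be something like `lambda c: shuffle_distant(c, nearby=50)`, and `score` would wrap a trained language model. A larger increase suggests the model relied on the information that the perturbation destroyed.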
References [27]
Sepp Hochreiter, Jürgen Schmidhuber - 1997
94 papers in library cite
D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014
59 papers in library cite
M. P. Marcus, B. Santorini, Mary Ann Marcinkiewicz - 1993
22 papers in library cite
Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010
36 papers in library cite
Alex Graves - 2013
27 papers in library cite
L. Wan, M. Zeiler, S. Zhang, Rob Fergus - 2013
8 papers in library cite
S. Merity, Caiming Xiong, J. Bradbury, Richard Socher - 2017
12 papers in library cite
Yarin Gal - 2015
9 papers in library cite
R. Jozefowicz, Oriol Vinyals, M. Schuster, Noam Shazeer, Yonghui Wu - 2016
20 papers in library cite
O. Press, Lior Wolf - 2017
7 papers in library cite
F. Hill, Antoine Bordes, S. Chopra, Jason Weston - 2015
14 papers in library cite
E. Grave, Armand Joulin, Nicolas Usunier - 2016
7 papers in library cite
Yann N. Dauphin, A. Fan, Michael Auli, D. Grangier - 2016
8 papers in library cite
S. Merity, Nitish Shirish Keskar, Richard Socher - 2017
6 papers in library cite
G. Melis, C. Dyer, Phil Blunsom - 2018
6 papers in library cite
Christopher D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, D. Mcclosky - 2014
6 papers in library cite
H. Inan, K. Khosravi, Richard Socher - 2017
6 papers in library cite
Tal Linzen, E. Dupoux, Y. Goldberg - 2016
5 papers in library cite
Zhilin Yang, Z. Dai, Ruslan Salakhutdinov, W. W. Cohen - 2017
4 papers in library cite
Y. Adi, E. Kermany, Yonatan Belinkov, O. Lavi, Y. Goldberg - 2016
4 papers in library cite
Sayan Ghosh, Oriol Vinyals, B. Strope, S. Roy, T. Dean, L. Heck - 2016
1 paper in library cites
Tianle Wang, Kyunghyun Cho - 2016
1 paper in library cites
C. Chelba, M. Norouzi, Samy Bengio - 2017
1 paper in library cites
J. B. Graber, D. Blei - 2009
1 paper in library cites
J. H. Lau, T. Baldwin, T. Cohn - 2017
1 paper in library cites
E. Grave, M. M. Cisse, Armand Joulin - 2017
1 paper in library cites
Jeffrey Li, X. Chen, Eduard Hovy, Dan Jurafsky - 2016
1 paper in library cites