2017
Cite Score: 66
AI summary
This paper introduces the pointer sentinel mixture model. Its pointer sentinel-LSTM instantiation achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. The paper also introduces WikiText, a new, freely available benchmark corpus for language modeling.
Abstract
Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then, they struggle to predict rare or unseen words, even when the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models, which can either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. To evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora, we also introduce the freely available WikiText corpus.
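A minimal plain-Python sketch of the mixing idea described in the abstract: a next-word distribution that can either copy a word from the recent context or fall back to a vocabulary softmax. The abstract does not spell out the gating, so the formulation here is an assumption for illustration: one extra "sentinel" score is normalized together with the pointer scores, and the sentinel's share g weights the vocabulary softmax. The helper names `softmax` and `pointer_sentinel_mix` are hypothetical, and the scores are passed in as plain floats instead of being computed from LSTM hidden states.

```python
import math
from typing import Dict, List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_sentinel_mix(
    context_words: Sequence[str],
    pointer_scores: Sequence[float],
    sentinel_score: float,
    vocab_probs: Dict[str, float],
) -> Dict[str, float]:
    """Hypothetical helper: mix a pointer over recent words with a vocabulary softmax.

    Assumption (illustrative only): the pointer scores and one extra sentinel
    score are normalized together; the sentinel's share g gates the vocabulary
    distribution, and the remaining 1 - g is spread over context positions.
    """
    if len(context_words) != len(pointer_scores):
        raise ValueError("expected one pointer score per context word")
    attention = softmax(list(pointer_scores) + [sentinel_score])
    gate = attention[-1]  # mass assigned to the sentinel, i.e. to the softmax
    # Gated vocabulary term: g * p_vocab(w).
    mixed = {word: gate * p for word, p in vocab_probs.items()}
    # Pointer term: every context position holding `word` adds its attention,
    # so repeated words accumulate mass and out-of-vocabulary words can be copied.
    for word, weight in zip(context_words, attention[:-1]):
        mixed[word] = mixed.get(word, 0.0) + weight
    return mixed


if __name__ == "__main__":
    # Toy inputs: "Zygmunt" is a rare name missing from the vocabulary softmax.
    context = ["Dr.", "Zygmunt", "said", "that"]
    vocab = {"Dr.": 0.1, "said": 0.3, "that": 0.4, "<unk>": 0.2}
    probs = pointer_sentinel_mix(context, [0.0, 3.0, 0.0, 0.0], 1.0, vocab)
    for word, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        print(f"{word}: {p:.3f}")
```

Because the pointer and sentinel weights share one softmax, the context positions together receive exactly 1 - g, so the mixture sums to one whenever `vocab_probs` does. In the toy run, most of the probability goes to the out-of-vocabulary name, which the softmax alone could only cover through `<unk>`.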