2017

Pointer Sentinel Mixture Models

Stephen Merity, Caiming Xiong, James Bradbury, Richard Socher

Cite Score: 66

AI summary

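This paper introduces the pointer sentinel mixture architecture for neural sequence models, which can either reproduce a word from the recent context or produce one from a standard softmax classifier. The resulting pointer sentinel-LSTM achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) with far fewer parameters than a standard softmax LSTM. The paper also introduces WikiText, a freely available benchmark corpus for language modeling.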

Main Contributions

  • Introduces the pointer sentinel mixture architecture, instantiated as the pointer sentinel-LSTM, for language modeling.
  • The pointer sentinel-LSTM achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity).
  • The pointer sentinel-LSTM uses far fewer parameters than a standard softmax LSTM.
  • Introduces WikiText, a freely available benchmark corpus for evaluating language models on longer contexts, more realistic vocabularies, and larger corpora.
  • The pointer component is heavily used for rare names.
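
The 70.9 figure above is a perplexity. As a reminder of what that number measures, here is a minimal sketch of the standard definition: the exponential of the average negative log-likelihood per token. The function name `perplexity` and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the probability the model
    assigned to each token that actually occurred."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Average negative log-likelihood per token (in nats).
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)


# Toy example: a model that always assigns probability 0.25 to the
# correct next token behaves like a uniform choice among 4 words.
toy_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

Under this definition, a perplexity of 70.9 means the model is, on average, about as uncertain as if it were choosing uniformly among roughly 71 words at each step.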

Abstract

Recent neural network sequence models with softmax classifiers have achieved their best language modeling performance only with very large hidden states and large vocabularies. Even then, they struggle to predict rare or unseen words, even when the context makes the prediction unambiguous. We introduce the pointer sentinel mixture architecture for neural sequence models, which has the ability to either reproduce a word from the recent context or produce a word from a standard softmax classifier. Our pointer sentinel-LSTM model achieves state-of-the-art language modeling performance on the Penn Treebank (70.9 perplexity) while using far fewer parameters than a standard softmax LSTM. In order to evaluate how well language models can exploit longer contexts and deal with more realistic vocabularies and larger corpora, we also introduce the freely available WikiText corpus.
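
The abstract describes the mixture only in words, so the following is a rough sketch of how such a model could combine its two components. It assumes the gate is the attention mass assigned to a sentinel slot appended to the pointer scores, which is how the full paper describes it rather than something stated on this page. All names (`pointer_sentinel_mixture`, `pointer_scores`, `sentinel_score`) and the toy numbers are hypothetical; a real model would compute the scores from LSTM hidden states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pointer_sentinel_mixture(vocab_logits, context_ids, pointer_scores,
                             sentinel_score):
    """Mix a vocabulary softmax with a pointer over recent context.

    vocab_logits:   one score per vocabulary word
    context_ids:    vocabulary id of each word in the recent context
    pointer_scores: one attention score per context position
    sentinel_score: score of the extra sentinel slot (assumed design)
    """
    if len(context_ids) != len(pointer_scores):
        raise ValueError("one pointer score is needed per context word")
    p_vocab = softmax(vocab_logits)
    # Softmax over the context positions plus the sentinel slot.
    attn = softmax(list(pointer_scores) + [sentinel_score])
    gate = attn[-1]  # mass on the sentinel = weight given to the softmax
    mixed = [gate * p for p in p_vocab]
    # Remaining mass goes to the words the pointer attends to; repeated
    # occurrences of the same word accumulate.
    for word_id, mass in zip(context_ids, attn[:-1]):
        mixed[word_id] += mass
    return mixed, gate


# Toy example: 4-word vocabulary, word 2 appears twice in the context.
mixed, gate = pointer_sentinel_mixture(
    vocab_logits=[0.0, 0.0, 0.0, 0.0],
    context_ids=[2, 1, 2],
    pointer_scores=[1.0, 0.0, 1.0],
    sentinel_score=0.0,
)
```

Because the sentinel shares one softmax with the context positions, the result is a valid probability distribution without any extra normalization, and a word that appears in the recent context can receive high probability even when the vocabulary softmax rates it as unlikely.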
