2016

The LAMBADA dataset: Word Prediction Requiring a Broad Discourse Context

D. Paperno, Germán Kruszewski, A. Lazaridou, N. Q. Pham, R. Bernardi, S. Pezzelle, M. Baroni, G. Boleda, Raquel Fernández

Cite Score: 33

AI summary

This paper introduces the LAMBADA dataset, which evaluates language understanding through a word prediction task that requires models to track information in the broader discourse. The release also includes raw text from 2,662 novels for training language models. None of several state-of-the-art language models reaches accuracy above 1% on the benchmark.

Main Contributions

  • Introduces the LAMBADA dataset, a new benchmark for evaluating language models on their ability to understand broad context in text
  • Demonstrates that existing state-of-the-art language models perform poorly on the LAMBADA dataset, with none exceeding 1% accuracy
  • Provides an analysis of the LAMBADA dataset, highlighting the linguistic phenomena that make it challenging for language models
  • Shows that good performance on LAMBADA requires capturing non-local phenomena

Abstract

We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task. LAMBADA is a collection of narrative passages sharing the characteristic that human subjects are able to guess their last word if they are exposed to the whole passage, but not if they only see the last sentence preceding the target word. To succeed on LAMBADA, computational models cannot simply rely on local context, but must be able to keep track of information in the broader discourse. We show that LAMBADA exemplifies a wide range of linguistic phenomena, and that none of several state-of-the-art language models reaches accuracy above 1% on this novel benchmark. We thus propose LAMBADA as a challenging test set, meant to encourage the development of new models capable of genuine understanding of broad context in natural language text.
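
The task setup described in the abstract (guess the final word of a passage, with access to the whole passage versus only the last sentence before the target) can be made concrete with a short Python sketch. All function names here (`split_passage`, `last_sentence`, `frequent_name_guess`, `accuracy`) are hypothetical, the toy passage is invented, and the assumptions are stated plainly: the target is taken to be the final whitespace-separated token, scoring is exact-match accuracy, and `frequent_name_guess` is a deliberately naive heuristic stand-in, not one of the models evaluated in the paper.

```python
import re
from collections import Counter
from typing import Callable, Iterable


def split_passage(passage: str) -> tuple[str, str]:
    """Split a passage into (context, target word).

    Assumption: the target is the final whitespace-separated token."""
    context, _, target = passage.strip().rpartition(" ")
    return context, target


def last_sentence(text: str) -> str:
    """Return only the final sentence of `text` (naive split on . ! ?)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[-1]


def frequent_name_guess(context: str) -> str:
    """Hypothetical toy predictor: the most frequent capitalized word in the
    context, ignoring the pronoun 'I'. Not a method from the paper."""
    words = re.findall(r"[A-Za-z']+", context)
    counts = Counter(w for w in words if w[0].isupper() and w != "I")
    top = counts.most_common(1)
    return top[0][0] if top else ""


def accuracy(passages: Iterable[str], predict: Callable[[str], str],
             local_only: bool = False) -> float:
    """Exact-match accuracy on the final word of each passage.

    With local_only=True the predictor sees only the last sentence of the
    context, mimicking the 'last sentence only' condition for humans."""
    correct = total = 0
    for passage in passages:
        context, target = split_passage(passage)
        if local_only:
            context = last_sentence(context)
        correct += predict(context) == target
        total += 1
    return correct / total if total else 0.0


# Invented toy passage: the answer is recoverable only from earlier sentences.
passages = [
    "Anna found the old key in the attic. She showed the key to Tom "
    "and Anna smiled. Later he asked who had found it, and I told him it was Anna"
]
print(accuracy(passages, frequent_name_guess))                   # 1.0
print(accuracy(passages, frequent_name_guess, local_only=True))  # 0.0
```

With the full passage, the repeated name in earlier sentences lets even this crude heuristic succeed; restricted to the last sentence, the same heuristic fails. This mirrors the contrast LAMBADA is built around, though real passages are far less forgiving than this toy case.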

References [22]

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

Tomas Mikolov, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

26 papers in library cite

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Jeffrey L. Elman - 1990

23 papers in library cite

Andreas Stolcke - 2002

13 papers in library cite

Samuel R. Bowman, G. Angeli, Christopher Potts, Christopher D. Manning - 2015

25 papers in library cite

K. M. Hermann, T. Kočiský, Edward Grefenstette, L. Espeholt, W. Kay, M. Suleyman, Phil Blunsom - 2015

31 papers in library cite

Yukun Zhu, R. Kiros, R. Zemel, Ruslan Salakhutdinov, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015

18 papers in library cite

S. Sukhbaatar, A. Szlam, Jason Weston, Rob Fergus - 2015

18 papers in library cite

Oriol Vinyals, Quoc V. Le - 2015

7 papers in library cite

Jason Weston, Antoine Bordes, S. Chopra, Tomas Mikolov - 2015

11 papers in library cite

Tim Rocktäschel, Edward Grefenstette, K. Hermann, T. Kočiský, Phil Blunsom - 2016

5 papers in library cite

M. Richardson, C. J. C. Burges, Erin Renshaw - 2013

16 papers in library cite

F. Hill, Antoine Bordes, S. Chopra, Jason Weston - 2015

14 papers in library cite

A. Sordoni, M. Galley, Michael Auli, Chris Brockett, Yangfeng Ji, M. Mitchell, J. Y. Nie, Jianfeng Gao, B. Dolan - 2015

4 papers in library cite

Tomas Mikolov, Armand Joulin, S. Chopra, M. Mathieu, Marc'Aurelio Ranzato - 2015

8 papers in library cite

Geoffrey Zweig, C. J. Burges - 2011

6 papers in library cite

Tian Wang, Kyunghyun Cho - 2015

4 papers in library cite

Yangfeng Ji, T. Cohn, L. Kong, C. Dyer, J. Eisenstein - 2015

3 papers in library cite

Tomas Mikolov, S. Kombrink, A. Deoras, Lukáš Burget, Jan Černocký - 2011

2 papers in library cite

W. Yin, Hinrich Schütze - 2015

1 paper in library cites

Tomas Mikolov - 2014

1 paper in library cites

Cited by 12 papers in your library

Cites 16 papers in your library

Read on October 31, 2025
