2015

The Goldilocks Principle: Reading Children's Books With Explicit Memory Representations

F. Hill, Antoine Bordes, S. Chopra, Jason Weston

AI summary

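This paper introduces the Children's Book Test (CBT), a benchmark that separates the prediction of syntactic function words from the prediction of lower-frequency content words, and uses it to compare Memory Networks with other language models. Models that store explicit representations of long-term context do better on semantic content words but show no advantage on function words. Attention over window-based memories can be trained effectively with self-supervision, and the same approach achieves state-of-the-art results on the CNN QA benchmark.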

Main Contributions

  • Introduces the Children's Book Test (CBT) dataset for evaluating language models on children's books.
  • Compares various state-of-the-art language models, including RNNs and Memory Networks, on the CBT dataset.
  • Shows that Memory Networks with explicit memory representations outperform other models at predicting semantic content words, but not syntactic function words.
  • Finds a 'Goldilocks principle' for the amount of text encoded in a single memory representation: a window between a single word and a full sentence works best.
  • Achieves state-of-the-art performance on the CNN QA benchmark by applying window-based memories with self-supervised attention.
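
The memory granularities compared in the 'Goldilocks' finding can be illustrated with a minimal sketch. The function name `build_memories`, the `mode` values, and the window width `b` are hypothetical, introduced only for illustration. For simplicity, this sketch centres a window on every token, which is an assumption rather than a description of the paper's setup.

```python
from typing import List


def build_memories(sentences: List[List[str]], mode: str, b: int = 5) -> List[List[str]]:
    """Split a tokenised context into memory slots at one of three granularities.

    mode="word":     every token is its own memory (too small)
    mode="window":   every memory is a window of up to b tokens centred on a token
    mode="sentence": every sentence is one memory (too big)
    """
    if mode == "sentence":
        return [list(sentence) for sentence in sentences]

    # Flatten the context so windows may cross sentence boundaries (an assumption).
    tokens = [tok for sentence in sentences for tok in sentence]

    if mode == "word":
        return [[tok] for tok in tokens]

    if mode == "window":
        half = (b - 1) // 2
        # Windows at the edges of the context are simply truncated.
        return [tokens[max(0, i - half): i + half + 1] for i in range(len(tokens))]

    raise ValueError(f"unknown mode: {mode}")
```

Calling `build_memories(sentences, "window", b=5)` yields one short phrase per token, which sits between the single-word and full-sentence extremes that the summary describes.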

Abstract

We introduce a new test of how well language models capture meaning in children's books. Unlike standard language modelling benchmarks, it distinguishes the task of predicting syntactic function words from that of predicting lower-frequency words, which carry greater semantic content. We compare a range of state-of-the-art models, each with a different way of encoding what has been previously read. We show that models which store explicit representations of long-term contexts outperform state-of-the-art neural language models at predicting semantic content words, although this advantage is not observed for syntactic function words. Interestingly, we find that the amount of text encoded in a single memory representation is highly influential to the performance: there is a sweet-spot, not too big and not too small, between single words and full sentences that allows the most meaningful information in a text to be effectively retained and recalled. Further, the attention over such window-based memories can be trained effectively through self-supervision. We then assess the generality of this principle by applying it to the CNN QA benchmark, which involves identifying named entities in paraphrased summaries of news articles, and achieve state-of-the-art performance.
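
As a rough illustration of attention over window-based memories, the sketch below scores windows against a query with bag-of-words embeddings and a softmax. All names (`embed`, `candidate_windows`, `attend`, `candidate_scores`, `self_supervision_target`) are hypothetical, and the embedding table is supplied by the caller. Centring windows on candidate words and picking the highest-scoring window whose centre is the correct answer is one simple way to derive a self-supervision target; it is an assumption here, not necessarily the authors' exact procedure.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Window = Tuple[str, List[str]]  # (centre word, tokens in the window)


def embed(tokens: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Bag-of-words embedding: the sum of the token vectors."""
    dim = len(next(iter(table.values())))
    vec = [0.0] * dim
    for tok in tokens:
        for k, x in enumerate(table[tok]):
            vec[k] += x
    return vec


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def candidate_windows(tokens: List[str], candidates: Set[str], b: int = 5) -> List[Window]:
    """One window of up to b tokens around every occurrence of a candidate word."""
    half = (b - 1) // 2
    return [(tok, tokens[max(0, i - half): i + half + 1])
            for i, tok in enumerate(tokens) if tok in candidates]


def attend(query: List[str], windows: List[Window],
           table: Dict[str, List[float]]) -> List[float]:
    """Soft attention: softmax over query-window dot products."""
    q = embed(query, table)
    scores = [sum(a * c for a, c in zip(q, embed(w, table))) for _, w in windows]
    return softmax(scores)


def candidate_scores(windows: List[Window], attention: List[float]) -> Dict[str, float]:
    """Sum the attention mass of all windows that share a centre word."""
    totals: Dict[str, float] = {}
    for (centre, _), p in zip(windows, attention):
        totals[centre] = totals.get(centre, 0.0) + p
    return totals


def self_supervision_target(windows: List[Window], attention: List[float],
                            answer: str) -> Optional[int]:
    """Index of the highest-attention window centred on the answer, if any."""
    idx = [i for i, (centre, _) in enumerate(windows) if centre == answer]
    return max(idx, key=lambda i: attention[i]) if idx else None
```

In a training loop, the index returned by `self_supervision_target` could serve as the label for the attention distribution, so that no human annotation of supporting windows is needed.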
