Papperoni

2019

Learning and Evaluating General Linguistic Intelligence

D. Yogatama, C. de Masson d'Autume, J. Connor, T. Kocisky, M. Chrzanowski, L. Kong, A. Lazaridou, W. Ling, Lei Yu, C. Dyer, P. Blunsom

citations

Cite Score

5

AI summary

This paper analyzes state-of-the-art natural language understanding models and evaluates, through a series of experiments, how task-independent the knowledge they acquire is. It also proposes a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task.

Main Contributions

The paper defines general linguistic intelligence as the ability to reuse previously acquired knowledge to adapt to new tasks quickly.
The paper analyzes state-of-the-art natural language understanding models and conducts an extensive empirical investigation of how task-independent their acquired knowledge is.
The paper proposes a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task.
The paper finds that far from solving general tasks, models are overfitting to the quirks of particular datasets (e.g., SQuAD).
The paper discusses missing components and conjectures on how to make progress toward general linguistic intelligence.

Abstract

We define general linguistic intelligence as the ability to reuse previously acquired knowledge about a language's lexicon, syntax, semantics, and pragmatic conventions to adapt to new tasks quickly. Using this definition, we analyze state-of-the-art natural language understanding models and conduct an extensive empirical investigation to evaluate them against these criteria through a series of experiments that assess the task-independence of the knowledge being acquired by the learning process. In addition to task performance, we propose a new evaluation metric based on an online encoding of the test data that quantifies how quickly an existing agent (model) learns a new task. Our results show that while the field has made impressive progress in terms of model architectures that generalize to many tasks, these models still require a lot of in-domain training examples (e.g., for fine tuning, training task-specific modules), and are prone to catastrophic forgetting. Moreover, we find that far from solving general tasks (e.g., document question answering), our models are overfitting to the quirks of particular datasets (e.g., SQuAD). We discuss missing components and conjecture on how to make progress toward general linguistic intelligence.
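The "online encoding" metric mentioned in the abstract can be illustrated with a generic prequential (online) code length: each test example is charged the number of bits the model needs to encode its label given only the examples seen so far, and the model is updated afterwards. The sketch below is not the paper's implementation. The count-based predictor, the `alpha` smoothing parameter, and the function name `online_codelength` are all assumptions chosen to keep the example dependency-free, and it updates after every example rather than in batches.

```python
import math
from collections import Counter, defaultdict


def online_codelength(examples, num_labels, alpha=1.0):
    """Prequential code length, in bits, of the labels in `examples`.

    `examples` is a sequence of (x, y) pairs. The stand-in "model" is a
    hypothetical per-input label counter with add-alpha smoothing; any
    learner that outputs p(y | x) and can be updated online would fit.
    """
    counts = defaultdict(Counter)  # counts[x][y] = times label y was seen for x
    total_bits = 0.0
    for i, (x, y) in enumerate(examples):
        if i == 0:
            # Nothing has been learned yet: pay for a uniform code.
            total_bits += math.log2(num_labels)
        else:
            seen = counts[x]
            p = (seen[y] + alpha) / (sum(seen.values()) + alpha * num_labels)
            total_bits += -math.log2(p)
        # Only after paying for the label may the model learn from it.
        counts[x][y] += 1
    return total_bits


# A predictable stream is cheaper to encode than an alternating one.
easy = [("a", 0)] * 4
hard = [("a", 0), ("a", 1), ("a", 0), ("a", 1)]
print(online_codelength(easy, num_labels=2))
print(online_codelength(hard, num_labels=2))
```

A model that adapts quickly assigns high probability to later labels and so yields a shorter code than the uniform baseline of `len(examples) * log2(num_labels)` bits, which is the sense in which a code length can measure speed of learning.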

References [35]

[1]Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

47 papers in library cite

I mean... it introduced Transformers!

[2]BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

Simply amazing. It's very impressive how they make a leap vs. existing stuff (you can see from the references, pretty much no one is doing what they are doing, other than GPT)

[3]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[4]Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks

Chelsea Finn, P. Abbeel, Sergey Levine - 2017

4 papers in library cite

Very nice and very well explained. At first it seems that it will be tough to follow, but once they explain it it gets very intuitive.

[5]Deep Contextualized Word Representations

M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018

27 papers in library cite

I didn't really like the approach. Seems a bit derivative TBH. BERT seems more elegant.

[6]Improving Language Understanding by Generative Pre-Training

Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018

23 papers in library cite

Very simple and very nice! Easy to understand and revolutionary maybe?

[7]Overcoming Catastrophic Forgetting in Neural Networks

J. Kirkpatrick, Razvan Pascanu, N. C. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. G. Barwinska, Demis Hassabis, C. Clopath, D. Kumaran, Raia Hadsell - 2017

5 papers in library cite

Very nice, intuitive, and impressive results.

[8]SQuAD: 100,000+ Questions for Machine Comprehension of Text

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

Nice paper that introduced an important dataset. Not much else though.

[9]GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A. Wang, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2018

26 papers in library cite

I like it, but it's just a mesh of different existing datasets and F1 score. Nothing new really but I get why it's important

[10]A Large Annotated Corpus for Learning Natural Language Inference

Samuel R. Bowman, G. Angeli, Christopher Potts, Christopher D. Manning - 2015

25 papers in library cite

Dataset collection is ok. The model that they create seems very low effort.

[11]A Broad-Coverage Challenge Corpus for Sentence Understanding Through Inference

A. Williams, Nikita Nangia, S. Bowman - 2018

19 papers in library cite

Very nice paper and cool dataset - good thing they expanded SNLI. Also, they at least tried to have a good baseline, and comparisons of domains are nice.

[12]Natural Questions: A Benchmark for Question Answering Research

T. Kwiatkowski, J. Palomaki, O. Rhinehart, Michael Collins, A. P. Parikh, C. Alberti, D. Epstein, Illia Polosukhin, M. Kelcey, Jacob Devlin, K. Lee, K. N. Toutanova, Llion Jones, M. W. Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, Slav Petrov - 2019

9 papers in library cite

The dataset and methodology are very nice - it's amazing to see how Google does the summaries in search. However, the paper is too complex with the math stuff - unnecessary.

[13]Catastrophic Forgetting in Connectionist Networks

Robert M. French - 1999

2 papers in library cite

Such a good review/survey on catastrophic forgetting! Very nice

[14]TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

M. Joshi, E. Choi, D. Weld, Luke Zettlemoyer - 2017

18 papers in library cite

I like the way they collect the data, and I think this is a nice dataset. However, it seems like they didn't even try to make a good baseline.

[15]Bidirectional Attention Flow for Machine Comprehension

M. Seo, A. Kembhavi, Ali Farhadi, Hananneh Hajishirzi - 2017

13 papers in library cite

It's alright but the method seems absurdly complex. Maybe I am a bit biased because it's like the 20th paper that I read with attention + LSTMs...

[16]Adversarial Examples for Evaluating Reading Comprehension Systems

R. Jia, Percy Liang - 2017

11 papers in library cite

I liked it a lot! It's good to see people testing things rather than just trying to beat SotA!

[17]Semi-Supervised Sequence Learning

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Very good paper that was probably the first to introduce pre-training in NLP!

[18]The Natural Language Decathlon: Multitask Learning as Question Answering

B. McCann, N. S. Keskar, Caiming Xiong, Richard Socher - 2018

9 papers in library cite

Very nice results, but TBH the architecture is way too complicated

[19]Improving Neural Language Models With a Continuous Cache

E. Grave, Armand Joulin, Nicolas Usunier - 2016

7 papers in library cite

This was a surprise to me - I expected this to suck. However, they can provide a simple and intuitive way of improving LMs - nice

[20]Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem

M. Mccloskey, N. J. Cohen - 1989

4 papers in library cite

50+ pages, but the first to notice catastrophic forgetting

[21]Progress and Compress: A Scalable Framework for Continual Learning

J. Schwarz, Jelena Luketina, W. M. Czarnecki, A. G. Barwinska, Yee Whye Teh, Razvan Pascanu, Raia Hadsell - 2018

1 paper in library cites

[22]QuAC: Question Answering in Context

E. Choi, He He, M. Iyyer, M. Yatskar, W. T. Yih, Yejin Choi, Percy Liang, Luke Zettlemoyer - 2018

8 papers in library cite

[23]Zero-Shot Relation Extraction via Reading Comprehension

Omer Levy, M. Seo, E. Choi, L. S. Zettlemoyer - 2017

3 papers in library cite

Seems nice - they answer relations that were never seen in the dataset

[24]Dynamic Evaluation of Neural Sequence Models

B. Krause, E. Kahembwe, I. Murray, S. Renals - 2017

3 papers in library cite

[25]OpenAI Gym

Greg Brockman, V. Cheung, L. Pettersson, J. Schneider, John Schulman, Jie Tang, Wojciech Zaremba - 2016

3 papers in library cite

[26]Achieving Human Parity on Automatic Chinese to English News Translation

H. Hassan, A. Aue, C. C. Chen, V. Chowdhary, Jack Clark, C. Federmann, X. Huang, M. J. Dowmunt, W. Lewis, M. Li, Shuming Liu, T. Y. Liu, R. Luo, Arul Menezes, T. Qin, F. Seide, X. Tan, F. Tian, L. Wu, S. Wu, Y. Xia, Danyang Zhang, Zhengyou Zhang, M. Zhou - 2018

1 paper in library cites

[27]Catching Up Faster by Switching Sooner: A Predictive Approach to Adaptive Estimation With an Application to the AIC-BIC Dilemma

T. V. Erven, P. Grunwald, S. D. Rooij - 2012

1 paper in library cites

[28]DeepMind Lab

C. Beattie, J. Z. Leibo, D. Teplyashin, T. Ward, M. Wainwright, H. Kuttler, A. Lefrancq, S. Green, V. Valdes, A. Sadik, J. Schrittwieser, K. Anderson, S. York, M. Cant, A. Cain, A. Bolton, S. Gaffney, H. King, Demis Hassabis, Shane Legg, S. Petersen - 2016

1 paper in library cites

[29]Language Acquisition, Data Compression and Generalization

J. G. Wolff - 1982

1 paper in library cites

[30]Large-Scale QA-SRL Parsing

N. Fitzgerald, J. Michael, Luheng He, Luke Zettlemoyer - 2018

1 paper in library cites

[31]On First-Order Meta-Learning Algorithms

Alex Nichol, Josh Achiam, John Schulman - 2018

1 paper in library cites

[32]On the Intelligibility of the Universe and the Notions of Simplicity, Complexity and Irreducibility

G. J. Chaitin - 2007

1 paper in library cites

[33]Population Based Training of Neural Networks

M. Jaderberg, V. Dalibard, S. Osindero, W. M. Czarnecki, J. Donahue, A. Razavi, Oriol Vinyals, T. Green, I. Dunning, K. Simonyan, C. Fernando, Koray Kavukcuoglu - 2017

1 paper in library cites

[34]Results of the Active Learning Challenge

I. Guyon, G. Cawley, G. Dror, V. Lemaire - 2011

1 paper in library cites

[35]The Description Length of Deep Learning Models

L. Blier, Y. Ollivier - 2018

1 paper in library cites

Cited by 2 papers in your library

Cites 23 papers in your library

Read on November 13, 2025

It's a nice contribution and good analysis on generalization, but TBH I expected a bit more given the title.
