Papperoni

2016

Reasoning About Entailment With Neural Attention

Tim Rocktäschel, Edward Grefenstette, K. Hermann, T. Kočiský, Phil Blunsom

Cite Score: 34

AI summary

This paper introduces an LSTM-based neural model with a word-by-word neural attention mechanism for recognizing textual entailment. It achieves a state-of-the-art accuracy of 83.5% on the SNLI dataset, and a qualitative analysis of the attention weights illustrates the model's ability to reason over pairs of words and phrases.

Main Contributions

Introduces a neural model based on LSTMs for recognizing textual entailment (RTE).
Extends the model with a word-by-word neural attention mechanism.
Provides a qualitative analysis of neural attention for RTE.
Achieves an accuracy of 80.9% on SNLI with a benchmark LSTM, outperforming a simple lexicalized classifier.
Sets a new state-of-the-art accuracy of 83.5% for recognizing entailment on SNLI by extending the benchmark LSTM with word-by-word neural attention.

Abstract

While most approaches to automatically recognizing entailment relations have used classifiers employing hand engineered features derived from complex natural language processing pipelines, in practice their performance has been only slightly better than bag-of-word pair classifiers using only lexical similarity. The only attempt so far to build an end-to-end differentiable neural network for entailment failed to outperform such a simple similarity classifier. In this paper, we propose a neural model that reads two sentences to determine entailment using long short-term memory units. We extend this model with a word-by-word neural attention mechanism that encourages reasoning over entailments of pairs of words and phrases. Furthermore, we present a qualitative analysis of attention weights produced by this model, demonstrating such reasoning capabilities. On a large entailment dataset this model outperforms the previous best neural model and a classifier with engineered features by a substantial margin. It is the first generic end-to-end differentiable system that achieves state-of-the-art accuracy on a textual entailment dataset.
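
The word-by-word attention mechanism described in the abstract can be illustrated with a small sketch. The Python below is a minimal, dependency-free forward pass, assuming the premise LSTM outputs (the columns of the paper's matrix Y) and the hypothesis LSTM outputs h_1..h_N have already been computed; the LSTMs themselves are omitted. The update rules follow our reading of the paper's equations (M_t, α_t, r_t and the final pair representation h*) and reuse its parameter names W^y, W^h, W^r, W^t, W^p, W^x and w. The helper functions, the choice r_0 = 0, the vector sizes and the random initialization are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import random

Vector = list[float]
Matrix = list[list[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a k x k matrix by a length-k vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs: Vector) -> Vector:
    """Element-wise sum of equally sized vectors."""
    return [sum(xs) for xs in zip(*vs)]


def tanh(v: Vector) -> Vector:
    return [math.tanh(x) for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector) -> Vector:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_by_word_attention(Y: list[Vector], H: list[Vector], p: dict) -> tuple[Vector, list[Vector]]:
    """Attend over the premise outputs Y once for every hypothesis output in H.

    Returns h* and the attention distributions alpha_1..alpha_N,
    each a probability distribution over the L premise tokens.
    """
    k = len(Y[0])
    r = [0.0] * k  # r_0 = 0 (hypothetical initialization)
    alphas = []
    for h_t in H:
        # W^h h_t + W^r r_{t-1} is the same for every premise position (the "⊗ e_L" term)
        query = add(matvec(p["W_h"], h_t), matvec(p["W_r"], r))
        # Column i of M_t: tanh(W^y y_i + W^h h_t + W^r r_{t-1})
        M = [tanh(add(matvec(p["W_y"], y_i), query)) for y_i in Y]
        # alpha_t = softmax(w^T M_t)
        alpha = softmax([dot(p["w"], m_i) for m_i in M])
        alphas.append(alpha)
        # r_t = Y alpha_t^T + tanh(W^t r_{t-1})
        read = [sum(a * y_i[j] for a, y_i in zip(alpha, Y)) for j in range(k)]
        r = add(read, tanh(matvec(p["W_t"], r)))
    # h* = tanh(W^p r_N + W^x h_N)
    h_star = tanh(add(matvec(p["W_p"], r), matvec(p["W_x"], H[-1])))
    return h_star, alphas


# Usage with hypothetical sizes and random stand-ins for trained weights and LSTM outputs.
rng = random.Random(0)
k, L, N = 4, 5, 3  # hidden size, premise length, hypothesis length


def rand_vec(n: int, scale: float) -> Vector:
    return [rng.uniform(-scale, scale) for _ in range(n)]


params = {name: [rand_vec(k, 0.1) for _ in range(k)]
          for name in ("W_y", "W_h", "W_r", "W_t", "W_p", "W_x")}
params["w"] = rand_vec(k, 0.1)
Y = [rand_vec(k, 1.0) for _ in range(L)]  # premise outputs y_1..y_L
H = [rand_vec(k, 1.0) for _ in range(N)]  # hypothesis outputs h_1..h_N

h_star, alphas = word_by_word_attention(Y, H, params)
print(len(h_star), len(alphas), [round(sum(a), 6) for a in alphas])  # 4 3 [1.0, 1.0, 1.0]
```

Each alpha_t is a distribution over the premise tokens for hypothesis token t, so stacking alpha_1..alpha_N gives a premise-by-hypothesis attention matrix of the kind the paper inspects in its qualitative analysis. Because M_t includes W^r r_{t-1} and r_t carries over tanh(W^t r_{t-1}), the attention at each hypothesis word also depends on what was attended to for earlier words.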

References [30]

[1]Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and with huge impact. I'm amazed that they made something so simple, even though it requires a lot of background mathematical knowledge.

[2]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[3]Distributed Representations of Words and Phrases and Their Compositionality

Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

32 papers in library cite

Refined word2vec (negative sampling, subsampling of frequent words, phrase vectors). Game changer.

[4]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces soft attention (learned alignment) for neural machine translation. Amazing overall.

[5]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because it set a strong new baseline for NNs in MT. TBH, their main contribution was reversing the source sentence.

[6]Show, Attend and Tell: Neural Image Caption Generation With Visual Attention

K. Xu, Jimmy Lei Ba, R. Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, R. Zemel, Yoshua Bengio - 2015

12 papers in library cite

It's a nice paper. I liked the soft attention way more than the hard one, and I am a bit mad that it wasn't the best lol. Also, it's the first paper I read about multimodality, but it seems this area was bustling at the time. The results are kinda bad, though.

[7]Generating Sequences With Recurrent Neural Networks

Alex Graves - 2013

27 papers in library cite

Very cool, and it's the first paper to actually propose an attention mechanism (the soft window used for handwriting synthesis)! It gets a bit mathy, but nothing too crazy. It also has the first examples of good machine-generated writing I've seen in these papers, so very nice results.

[8]A Large Annotated Corpus for Learning Natural Language Inference

Samuel R. Bowman, G. Angeli, Christopher Potts, Christopher D. Manning - 2015

25 papers in library cite

Dataset collection is ok. The model that they create seems very low effort.

[9]Recurrent Models of Visual Attention

V. Mnih, N. Heess, Alex Graves, K. Kavukcuoglu - 2014

5 papers in library cite

It's not as good as the other paper (DRAW), but it's a precursor, and it's so nice to see how the model learns where to pay attention. It's also very nice to see RL in the mix, and the possible uses in games and other things.

[10]Teaching Machines to Read and Comprehend

K. M. Hermann, T. Kočiský, Edward Grefenstette, L. Espeholt, W. Kay, M. Suleyman, Phil Blunsom - 2015

31 papers in library cite

Nice way of turning unlabeled news articles into Q&A training data (cloze-style queries built from the articles' summary bullet points), and nice visualizations as well :) But I think their main contribution is the dataset. Maybe with the dataset they "unlocked" summarization?

[11]Pointer Networks

Oriol Vinyals, M. Fortunato, Navdeep Jaitly - 2015

10 papers in library cite

Cool concept. Nice that it works and can find good solutions for TSP.

[12]Recurrent Neural Network Regularization

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals - 2014

22 papers in library cite

It's a very simple idea: apply dropout only to the non-recurrent connections of the LSTM. TBH that's not much different from plain dropout. It's good that the paper is very short and straightforward, but it could have been a paragraph long.

[13]A Neural Attention Model for Abstractive Sentence Summarization

Alexander M. Rush, S. Chopra, Jason Weston - 2015

13 papers in library cite

TBH the paper is a bit boring and offers nothing new once you've read about more modern techniques. I feel they could have done a better job, considering that seq2seq existed at the time. Either way, points for being the first to propose abstractive summarization with NNs.

[14]Neural Turing Machines

Alex Graves, G. Wayne, Ivo Danihelka - 2014

18 papers in library cite

This paper is amazing. If someone had told me that NNs could use and address memory by position, I wouldn't have believed it would work. Very nice, but it's a shame that it's only demonstrated on toy tasks.

[15]End-to-End Memory Networks

S. Sukhbaatar, A. Szlam, Jason Weston, Rob Fergus - 2015

18 papers in library cite

This was so surprising! This is very similar to transformers and RAG. Who knew?!

[16]The PASCAL Recognising Textual Entailment Challenge

Ido Dagan, O. Glickman, Bernardo Magnini - 2005

19 papers in library cite

It's very nice how they had the foresight to create a challenge that became relevant like 10 years later.

[17]Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

Richard Socher, Eric H. Huang, J. Pennington, C. Manning, A. Ng - 2011

10 papers in library cite

Good paper overall, but it combines very simple ideas with a complex implementation. Overall, not very impactful.

[18]Grammar as a Foreign Language

Oriol Vinyals, Łukasz Kaiser, Terry Koo, Slav Petrov, Ilya Sutskever, Geoffrey Hinton - 2015

9 papers in library cite

It's a nice paper showing that attention can be used for parsing. However, parsing is boring and is very derivative. Good paper nonetheless.

[19]Inferring Algorithmic Patterns With Stack-Augmented Recurrent Nets

Armand Joulin, Tomas Mikolov - 2015

9 papers in library cite

Very underwhelming TBH. I expected more after reading the Neural Turing Machine paper. This reads like "yeah, we lost the race, here's what we were doing before they did something better"

[20]Framewise Phoneme Classification With Bidirectional LSTM and Other Neural Network Architectures

Alex Graves, Jürgen Schmidhuber - 2005

14 papers in library cite

Very nice paper! Simple, no bullshit. Just "hey, we have LSTM and we have BRNN, let's try to combine them"

[21]Convolutional Neural Network Architectures for Matching Natural Language Sentences

B. Hu, Z. Lu, H. Li, Qingcai Chen - 2014

2 papers in library cite

Previous work on NLI

[22]SemEval-2014 Task 1: Evaluation of Compositional Distributional Semantic Models on Full Sentences Through Semantic Relatedness and Textual Entailment

Marco Marelli, L. Bentivogli, M. Baroni, R. Bernardi, S. Menini, R. Zamparelli - 2014

7 papers in library cite

[23]Illinois-LH: A Denotational and Distributional Approach to Semantics

A. Lai, J. Hockenmaier - 2014

5 papers in library cite

[24]Learning to Transduce With Unbounded Memory

Edward Grefenstette, K. Hermann, M. Suleyman, Phil Blunsom - 2015

5 papers in library cite

[25]Attention-Based Models for Speech Recognition

J. Chorowski, D. Bahdanau, D. Serdyuk, Kyunghyun Cho, Yoshua Bengio - 2015

3 papers in library cite

[26]Convolutional Neural Network for Paraphrase Identification

W. Yin, Hinrich Schütze - 2015

2 papers in library cite

[27]ECNU: One Stone Two Birds: Ensemble of Heterogenous Measures for Semantic Relatedness and Textual Entailment

J. Zhao, T. T. Zhu, M. Lan - 2014

2 papers in library cite

[28]UNAL-NLP: Combining Soft Cardinality Features for Semantic Textual Similarity, Relatedness and Entailment

S. Jimenez, G. Dueñas, J. Baquero, A. Gelbukh - 2014

2 papers in library cite

[29]NaturalLI: Natural Logic Inference for Common Sense Reasoning

G. Angeli, Christopher D. Manning - 2014

1 paper in library cites

[30]Representing Meaning With a Combination of Logical Form and Vectors

I. Beltagy, S. Roller, P. Cheng, K. Erk, R. J. Mooney - 2015

1 paper in library cites

Cited by 5 papers in your library

Cites 21 papers in your library

Read on October 23, 2025

It's nice that they reach SotA on SNLI, but they just apply existing methods.

Paper Aliases

No aliases