Papperoni

2017

A Deep Reinforced Model for Abstractive Summarization

R. Paulus, Caiming Xiong, Richard Socher

Cite Score: 55

AI summary

This paper introduces an abstractive summarization model that combines a novel intra-attention mechanism with reinforcement learning to reduce repeated phrases in long summaries. It achieves a state-of-the-art ROUGE-1 score of 41.16 on the CNN/Daily Mail dataset and good results on the New York Times (NYT) dataset.
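To make "ROUGE-1 score" concrete, here is a minimal sketch of unigram ROUGE-1 (precision, recall and F1). It assumes lowercase whitespace tokenization and no stemming or stopword handling; the reported 41.16 comes from the official ROUGE toolkit, which has options this sketch omits, so the numbers will not match exactly. `rouge_1` is a hypothetical helper name.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Hypothetical helper: unigram-overlap ROUGE-1 on lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: each unigram counts at most min(candidate count, reference count) times.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge_1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Here 5 of the 6 unigrams overlap in both directions, so precision, recall and F1 are all 5/6.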

Main Contributions

Introduces a novel intra-attention mechanism that attends over the input and continuously generated output separately.
Proposes a new training method that combines supervised word prediction and reinforcement learning (RL).
Achieves a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, surpassing previous state-of-the-art models.
Demonstrates through human evaluation that the model produces higher quality summaries.
Presents the first end-to-end model for abstractive summarization on the NYT dataset.
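The intra-attention idea can be sketched in plain Python. This follows my reading of the paper: raw encoder scores e_ti (a bilinear score of decoder and encoder states in the paper, taken here as precomputed inputs) are divided by the sum of exp(e_ji) over earlier steps j < t, which discourages attending to the same input tokens again. The intra-decoder part is simplified: I use a plain dot product in place of the paper's bilinear score. Function names are hypothetical.

```python
import math

def intra_temporal_attention(scores_history):
    """scores_history[t][i] = raw score e_ti of decoder step t on input position i.
    Returns the normalized attention weights alpha_ti for every step."""
    n = len(scores_history[0])
    past = [0.0] * n  # running sum over earlier steps of exp(e_ji), per input position
    alphas = []
    for t, e_t in enumerate(scores_history):
        exp_e = [math.exp(e) for e in e_t]
        if t == 0:
            e_prime = exp_e  # first step: no history to penalize
        else:
            # Positions that already received high scores get divided by a large sum.
            e_prime = [x / p for x, p in zip(exp_e, past)]
        z = sum(e_prime)
        alphas.append([x / z for x in e_prime])
        past = [p + x for p, x in zip(past, exp_e)]
    return alphas

def intra_decoder_context(h_t, previous_states):
    """Attend over earlier decoder states (dot-product score: a simplifying assumption)."""
    if not previous_states:
        return [0.0] * len(h_t)  # nothing generated yet, so the context is a zero vector
    scores = [sum(a * b for a, b in zip(h_t, h)) for h in previous_states]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # numerically stable softmax
    z = sum(weights)
    weights = [w / z for w in weights]
    return [sum(w * h[k] for w, h in zip(weights, previous_states)) for k in range(len(h_t))]

alphas = intra_temporal_attention([[2.0, 0.0], [2.0, 0.0]])
print(alphas)
```

With identical raw scores at both steps, the first step puts about 0.88 of its weight on position 0, while the second step is flattened to [0.5, 0.5] because position 0 was already heavily attended.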

Abstract

Attentional, RNN-based encoder-decoder models for abstractive summarization have achieved good performance on short input and output sequences. For longer documents and summaries however these models often include repetitive and incoherent phrases. We introduce a neural network model with a novel intra attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). Models trained only with supervised learning often exhibit “exposure bias” – they assume ground truth is provided at each step during training. However, when standard word prediction is combined with the global sequence prediction training of RL the resulting summaries become more readable. We evaluate this model on the CNN/Daily Mail and New York Times datasets. Our model obtains a 41.16 ROUGE-1 score on the CNN/Daily Mail dataset, an improvement over previous state-of-the-art models. Human evaluation also shows that our model produces higher quality summaries.
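The mixed training objective can be written as L_mixed = γ·L_rl + (1 − γ)·L_ml, where L_ml is the teacher-forced negative log-likelihood of the reference and L_rl is a self-critical policy-gradient term that uses the reward of the greedy output as a baseline for a sampled output. The sketch below only computes the scalar loss value from given log-probabilities; in a real model these would be differentiable tensors and the rewards would be constants. The reward metric and the value of γ are assumptions to check against the paper, and `mixed_loss` is a hypothetical name.

```python
def mixed_loss(logp_target, logp_sampled, reward_sampled, reward_greedy, gamma):
    """logp_target:  log p(y*_t | y*_<t, x) per step of the reference (teacher forcing)
    logp_sampled: log p(y^s_t | y^s_<t, x) per step of a summary sampled from the model
    reward_*:     sequence-level rewards (e.g. a ROUGE variant) of sampled and greedy outputs
    gamma:        weight on the RL term"""
    l_ml = -sum(logp_target)
    # Self-critical term: if the sample beats the greedy baseline, the coefficient is
    # negative, so minimizing the loss raises the sample's log-probability.
    l_rl = (reward_greedy - reward_sampled) * sum(logp_sampled)
    return gamma * l_rl + (1.0 - gamma) * l_ml

loss = mixed_loss([-1.0, -2.0], [-0.5, -0.5], reward_sampled=0.4, reward_greedy=0.3, gamma=0.5)
print(loss)
```

Mixing in L_ml keeps the output readable, while the sequence-level L_rl term trains on the model's own samples, which is how the abstract's "exposure bias" is addressed.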

References [39]

[1]Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and huge impact. I'm amazed that they made something so simple, even though it requires a lot of background mathematical knowledge.

[2]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[3]Distributed Representations of Words and Phrases and Their Compositionality

Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

32 papers in library cite

Introduced word2vec. Game changer.

[4]GloVe: Global Vectors for Word Representation

Jeffrey Pennington, Richard Socher, Christopher D. Manning - 2014

31 papers in library cite

Not a bad paper, I just don't like the motivation and I think the methodology is poorly explained and hard to follow. I can't deny the good results though...

[5]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[6]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because they set a strong new baseline for NNs in MT. TBH, their main contribution was reversing the source sentence.

[7]ROUGE: A Package for Automatic Evaluation of Summaries

Chin Yew Lin - 2004

9 papers in library cite

Wow, very poorly written and badly explained. This is the exact opposite of BLEU

[8]Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning

R. Williams - 1992

11 papers in library cite

It's alright for formalizing the concept, but it's a bit boring and doesn't add much from the middle on. It focuses too much on reviewing existing techniques and on stochastic units.

[9]Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation

Yonghui Wu, M. Schuster, Ziru Chen, Quoc V. Le, M. Norouzi, W. Macherey, M. Krikun, Yue Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. J. Johnson, Xiaodong Liu, Lukasz Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, Wenyi Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, Oriol Vinyals, G. S. Corrado, M. Hughes, Jeffrey Dean - 2016

15 papers in library cite

It's a very good paper but TBH doesn't bring anything new other than joining a bunch of existing stuff. I think it ended up being foundational because it's Google and several people used it as a base for future research. Good contribution then :)

[10]Teaching Machines to Read and Comprehend

K. M. Hermann, T. Kocisky, Edward Grefenstette, L. Espeholt, W. Kay, M. Suleyman, Phil Blunsom - 2015

31 papers in library cite

Nice way of converting unsupervised data to train for Q&A - and nice visualizations as well :) But I think their main contribution is the dataset. Maybe with the dataset they "unlocked" summarization?

[11]Pointer Networks

Oriol Vinyals, M. Fortunato, Navdeep Jaitly - 2015

10 papers in library cite

Cool concept. Nice that it works and can find good solutions for TSP.

[12]Get to the Point: Summarization With Pointer-Generator Networks

A. See, P. J. Liu, Christopher D. Manning - 2017

8 papers in library cite

It's more of the same as the other ones. I'm not sure whether this was the first, but I'm getting a bit bored of this.

[13]A Neural Attention Model for Abstractive Sentence Summarization

Alexander M. Rush, S. Chopra, Jason Weston - 2015

13 papers in library cite

TBH the paper is a bit boring, with nothing new after reading about a bunch of more modern techniques. I feel that they could have done a better job, considering that seq2seq existed at the time. Either way, points for being the first to propose abstractive summarization with NNs.

[14]Pointer Sentinel Mixture Models

S. Merity, Caiming Xiong, J. Bradbury, Richard Socher - 2017

12 papers in library cite

I really liked the methodology, but I had to read it a few times to understand it intuitively - I think they should have done a better job at explaining it.

[15]Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond

R. Nallapati, B. Zhou, C. N. D. Santos, C. G. Gulcehre, Bing Xiang - 2016

10 papers in library cite

Cool that they use pointer switching, and they introduce the CNN/Daily Mail dataset for summarization, which is a nice contribution.
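The idea behind pointer switching can be sketched as a soft mixture: with probability p_copy the next token is copied from the source according to the attention weights, otherwise it is drawn from the generator's vocabulary distribution. This soft-mixture form is a swapped-in variant and may differ in detail from Nallapati et al.'s switching generator-pointer; the dictionary representation and the name `mix_generate_and_copy` are hypothetical.

```python
def mix_generate_and_copy(p_gen_vocab, attention, source_tokens, p_copy):
    """p_gen_vocab:   dict token -> probability from the generator softmax
    attention:     attention weights over source positions (sum to 1)
    source_tokens: the token at each source position
    p_copy:        probability of using the copy (pointer) mechanism at this step
    Returns p(y_t) = p_copy * p_pointer(y_t) + (1 - p_copy) * p_gen(y_t)."""
    mixed = {tok: (1.0 - p_copy) * p for tok, p in p_gen_vocab.items()}
    for pos, tok in enumerate(source_tokens):
        # Repeated source tokens accumulate attention mass; out-of-vocabulary
        # source tokens (here "Paulus") become reachable only through copying.
        mixed[tok] = mixed.get(tok, 0.0) + p_copy * attention[pos]
    return mixed

dist = mix_generate_and_copy({"the": 0.6, "cat": 0.4}, [0.5, 0.5], ["Paulus", "the"], p_copy=0.5)
print(dist)
```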

[16]Long Short-Term Memory-Networks for Machine Reading

Jianpeng Cheng, Li Dong, Mirella Lapata - 2016

8 papers in library cite

I read this more as an example of intra-attention, but this is not the main focus of the paper. I think visualization/explanation is a bit bad, and it doesn't seem too impactful. I kept thinking that this is starting to get too complicated, and indeed it was surpassed by transformers right after that.

[17]Using the Output Embedding to Improve Language Models

O. Press, Lior Wolf - 2017

7 papers in library cite

I did not like this paper at all. It's not a bad paper; I just expected *way* more. Good results, but uninteresting.

[18]A Learning Algorithm for Continually Running Fully Recurrent Neural Networks

R. Williams, David Zipser - 1989

8 papers in library cite

Quick read and very simple concept. I wish they explained things using a more visual approach instead of maths, but it's ok to follow nonetheless.

[19]Self-Critical Sequence Training for Image Captioning

S. Rennie, E. Marcheret, Y. Mroueh, J. Ross, V. Goel - 2016

1 paper in library cites

[20]Sequence Level Training With Recurrent Neural Networks

Marc'Aurelio Ranzato, S. Chopra, Michael Auli, Wojciech Zaremba - 2015

6 papers in library cite

Exposure bias

[21]How Not to Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation

C. L. Liu, Ryan Lowe, I. Serban, M. Noseworthy, L. Charlin, J. Pineau - 2016

3 papers in library cite

They discuss how to game metrics

[22]Abstractive Sentence Summarization With Attentive Recurrent Neural Networks

S. Chopra, Michael Auli, A. Rush - 2016

5 papers in library cite

Summarization with attention

[23]Pointing the Unknown Words

C. G. Gulcehre, S. Ahn, R. Nallapati, B. Zhou, Yoshua Bengio - 2016

7 papers in library cite

Bengio, pointer networks

[24]Reward Augmented Maximum Likelihood for Neural Structured Prediction

M. Norouzi, Samy Bengio, Navdeep Jaitly, M. Schuster, Yonghui Wu, Dale Schuurmans - 2016

2 papers in library cite

RL + NNs?

[25]Temporal Attention Model for Neural Machine Translation

B. Sankaran, H. Mi, Y. A. Onaizan, A. Ittycheriah - 2016

3 papers in library cite

Intra-attention

[26]The Stanford CoreNLP Natural Language Processing Toolkit

Christopher D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, D. Mcclosky - 2014

6 papers in library cite

[27]Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling

H. Inan, K. Khosravi, Richard Socher - 2017

6 papers in library cite

[28]Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation

B. Dorr, D. Zajic, Richard Schwartz - 2003

3 papers in library cite

[29]The New York Times Annotated Corpus

E. Sandhaus - 2008

3 papers in library cite

[30]Distraction-Based Neural Networks for Modeling Documents

Qian Chen, X. Zhu, Z. Ling, S. Wei, H. Jiang - 2016

2 papers in library cite

[31]Efficient Summarization With Read-Again and Copy Mechanism

W. Zeng, W. Luo, Sanja Fidler, R. Urtasun - 2016

2 papers in library cite

[32]SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents

R. Nallapati, F. Zhai, B. Zhou - 2017

2 papers in library cite

[33]Detecting Information-Dense Texts in Multiple News Domains

Yining Yang, A. Nenkova - 2014

1 paper in library cites

[34]Identification and Characterization of Newsworthy Verbs in World News

B. Nye, A. Nenkova - 2015

1 paper in library cites

[35]Improving Multi-Step Prediction of Learned Time Series Models

A. Venkatraman, M. Hebert, J. Bagnell - 2015

1 paper in library cites

[36]Improving the Estimation of Word Importance for News Multi-Document Summarization-Extended Technical Report

K. Hong, A. Nenkova - 2014

1 paper in library cites

[37]Learning-Based Single-Document Summarization With Compression and Anaphoricity Constraints

G. Durrett, T. B. Kirkpatrick, Dan Klein - 2016

1 paper in library cites

[38]System Combination for Multi-Document Summarization

K. Hong, M. Marcus, A. Nenkova - 2015

1 paper in library cites

[39]The Role of Discourse Units in Near-Extractive Summarization

Jeffrey Li, K. Thadani, A. Stent - 2016

1 paper in library cites

Cited by 7 papers in your library

Cites 25 papers in your library

Read on August 11, 2025

It's nice that they introduce intra-attention and RL, but at this point I think a lot of the work in attention is very derivative.
