Papperoni

2019

BART: Denoising Sequence-to-Sequence Pre-Training for Natural Language Generation, Translation, and Comprehension

Martha Lewis, Yibo Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer

citations

Cite Score

89

AI summary

This paper introduces BART, a denoising autoencoder that pre-trains a sequence-to-sequence Transformer by corrupting text and learning to reconstruct it. Of the noising functions evaluated, combining sentence shuffling with text infilling works best. BART achieves state-of-the-art results on abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.

Main Contributions

Introduces BART, a denoising autoencoder for pre-training sequence-to-sequence models.
Shows that BART can be seen as generalizing BERT (through its bidirectional encoder) and GPT (through its left-to-right decoder).
Finds the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token.
Achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE.
Provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining.

Abstract

We present BART, a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text. It uses a standard Transformer-based neural machine translation architecture which, despite its simplicity, can be seen as generalizing BERT (due to the bidirectional encoder), GPT (with the left-to-right decoder), and many other more recent pretraining schemes. We evaluate a number of noising approaches, finding the best performance by both randomly shuffling the order of the original sentences and using a novel in-filling scheme, where spans of text are replaced with a single mask token. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks. It matches the performance of RoBERTa with comparable training resources on GLUE and SQuAD, and achieves new state-of-the-art results on a range of abstractive dialogue, question answering, and summarization tasks, with gains of up to 6 ROUGE. BART also provides a 1.1 BLEU increase over a back-translation system for machine translation, with only target language pretraining. We also report ablation experiments that replicate other pretraining schemes within the BART framework, to better measure which factors most influence end-task performance.
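The two noising functions the abstract singles out (sentence shuffling and span in-filling with a single mask token) can be sketched in a few lines. This is only an illustration, not the authors' implementation: the function names, the `<mask>` string, the `start_prob` parameter, and whitespace tokenization are all hypothetical, and the Poisson span-length distribution with mean 3 is an assumption that the abstract itself does not state.

```python
import math
import random

MASK = "<mask>"  # hypothetical mask token string


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def permute_sentences(sentences, rng=None):
    """Return the sentences in a random order (input list is not modified)."""
    rng = rng or random.Random()
    shuffled = list(sentences)
    rng.shuffle(shuffled)
    return shuffled


def text_infilling(tokens, start_prob=0.1, lam=3.0, rng=None):
    """Replace randomly chosen spans of tokens with a single MASK each.

    start_prob (assumed) is the chance that a span starts at a given position;
    span lengths are drawn from Poisson(lam) (assumed distribution).
    """
    rng = rng or random.Random()
    out, i = [], 0
    while i < len(tokens):
        if rng.random() < start_prob:
            span = min(sample_poisson(lam, rng), len(tokens) - i)
            out.append(MASK)
            if span == 0:
                # Zero-length span: a mask is inserted and no token is removed.
                out.append(tokens[i])
                i += 1
            else:
                # The whole span collapses into the one mask appended above.
                i += span
        else:
            out.append(tokens[i])
            i += 1
    return out


def add_noise(sentences, rng=None):
    """Shuffle sentences, then apply text infilling to the joined tokens."""
    rng = rng or random.Random()
    shuffled = permute_sentences(sentences, rng)
    tokens = [tok for sentence in shuffled for tok in sentence.split()]
    return text_infilling(tokens, rng=rng)
```

Calling `add_noise(["The cat sat .", "It purred ."], rng=random.Random(0))` yields a corrupted token list; the model's training target would be the original, uncorrupted text. Because one mask can stand for any number of tokens, the model has to infer how much text is missing, not just what it is.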

References [33]

[1]Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

47 papers in library cite

I mean... it introduced Transformers!

[2]BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

Simply amazing. It's very impressive how they make a leap vs. existing stuff (you can see from the references, pretty much no one is doing what they are doing, other than GPT)

[3]Efficient Estimation of Word Representations in Vector Space

Tomas Mikolov, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

26 papers in library cite

Expanded word2vec. Very nice overall.

[4]RoBERTa: A Robustly Optimized BERT Pretraining Approach

Yibo Liu, M. Ott, N. Goyal, J. Du, M. Joshi, Deli Chen, Omer Levy, Martha Lewis, Luke Zettlemoyer, Veselin Stoyanov - 2019

17 papers in library cite

I liked it a lot! It shows that you don't need to do something completely new to have good results and contribute to science. It could be a 5, but it's a 4 due to not bringing anything new

[5]Language Models Are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019

27 papers in library cite

Amazing! Tons of important contributions. I think they could have explained the models a bit better, and I think this is where OpenAI starts to become evil (and not open)

[6]Deep Contextualized Word Representations

M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018

27 papers in library cite

I didn't really like the approach. Seems a bit derivative TBH. BERT seems more elegant.

[7]Improving Language Understanding by Generative Pre-Training

Alec Radford, K. Narasimhan, T. Salimans, Ilya Sutskever - 2018

23 papers in library cite

Very simple and very nice! Easy to understand and revolutionary maybe?

[8]XLNet: Generalized Autoregressive Pretraining for Language Understanding

Zhilin Yang, Z. Dai, Yining Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le - 2019

11 papers in library cite

The method is nice and the results are very good but this paper is just soooo hard to follow...

[9]Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

Richard Socher, A. Perelygin, Jeffrey Wu, J. Chuang, C. Manning, A. Ng, Christopher Potts - 2013

24 papers in library cite

I didn't really like the first paper and I don't really like this one. I think the dataset is more influential than the methodology. I think Stanford folks are too focused on old school NLP.

[10]SQuAD: 100,000+ Questions for Machine Comprehension of Text

P. Rajpurkar, J. Zhang, K. Lopyrev, Percy Liang - 2016

37 papers in library cite

Nice paper that introduced an important dataset. Not much else though.

[11]ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations

Z. Lan, Mark Chen, S. Goodman, Kevin Gimpel, P. Sharma, Radu Soricut - 2019

8 papers in library cite

I like how they can achieve very close results with very few params! Very nice tricks to do that as well.

[12]Gaussian Error Linear Units (GELUs)

Dan Hendrycks, Kevin Gimpel - 2016

5 papers in library cite

Very understandable, and very nice! I don't think the justification is good, but hey, it works!

[13]GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding

A. Wang, A. Singh, J. Michael, F. Hill, Omer Levy, Samuel R. Bowman - 2018

26 papers in library cite

I like it, but it's just a mesh of different existing datasets and F1 score. Nothing new really but I get why it's important

[14]A Broad-Coverage Challenge Corpus for Sentence Understanding Through Inference

A. Williams, Nikita Nangia, S. Bowman - 2018

19 papers in library cite

Very nice paper and cool dataset - good thing they expanded SNLI. Also, they at least tried to have a good baseline, and comparisons of domains are nice.

[15]Teaching Machines to Read and Comprehend

K. M. Hermann, T. Kocisky, Edward Grefenstette, L. Espeholt, W. Kay, M. Suleyman, Phil Blunsom - 2015

31 papers in library cite

Nice way of converting unsupervised data to train for Q&A - and nice visualizations as well :) But I think their main contribution is the dataset. Maybe with the dataset they "unlocked" summarization?

[16]Get to the Point: Summarization With Pointer-Generator Networks

A. See, P. J. Liu, Christopher D. Manning - 2017

8 papers in library cite

It's a bit of the same thing as the other ones. I am not sure if this was the first or not, but I am getting a bit bored of this.

[17]The PASCAL Recognising Textual Entailment Challenge

Ido Dagan, O. Glickman, Bernardo Magnini - 2005

19 papers in library cite

It's very nice how they had the foresight to create a challenge that became relevant like 10 years later.

[18]SpanBERT: Improving Pre-Training by Representing and Predicting Spans

M. Joshi, Deli Chen, Yibo Liu, D. Weld, Luke Zettlemoyer, Omer Levy - 2019

5 papers in library cite

Yet another BERT, and it's not surprising since I've seen this in other papers. But it is good nonetheless.

[19]Automatically Constructing a Corpus of Sentential Paraphrases

W. Dolan, Chris Brockett - 2005

9 papers in library cite

Small dataset, questionable methodology, not useful for training models

[20]The Winograd Schema Challenge

Hector J. Levesque, E. Davis, Leora Morgenstern - 2011

13 papers in library cite

It's amazing to see a paper that is very easy to read, opinionated, but also introducing an important contribution to the field. I love that they introduced this challenge as an alternative to the Imitation Game.

[21]Cross-Lingual Language Model Pretraining

G. Lample, Alexis Conneau - 2019

5 papers in library cite

I like how they learn cross-lingual stuff via vocabulary sharing. It's probably the start of the multi-language LMs.

[22]Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

Shashi Narayan, S. B. Cohen, Mirella Lapata - 2018

3 papers in library cite

[23]Unified Language Model Pre-Training for Natural Language Understanding and Generation

L. Dong, N. Yang, Wenyi Wang, F. Wei, Xiaodong Liu, Yuzhi Wang, Jianfeng Gao, M. Zhou, H. W. Hon - 2019

4 papers in library cite

[24]Neural Network Acceptability Judgments

Alex Warstadt, A. Singh, S. Bowman - 2018

8 papers in library cite

CoLA dataset

[25]MASS: Masked Sequence to Sequence Pre-Training for Language Generation

K. Song, X. Tan, T. Qin, J. Lu, T. Y. Liu - 2019

5 papers in library cite

[26]ELI5: Long Form Question Answering

A. Fan, Yacine Jernite, Ethan Perez, D. Grangier, Jason Weston, Michael Auli - 2019

4 papers in library cite

[27]Controllable Abstractive Summarization

A. Fan, D. Grangier, Michael Auli - 2017

2 papers in library cite

[28]Edinburgh Neural Machine Translation Systems for WMT 16

R. Sennrich, B. Haddow, Alexandra Birch - 2016

5 papers in library cite

[29]Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval 2007)

E. Agirre, L. Màrquez, R. Wicentowski - 2007

2 papers in library cite

[30]Pre-Trained Language Model Representations for Language Generation

S. Edunov, A. Baevski, Michael Auli - 2019

1 paper in library cites

[31]Regularizing Neural Networks by Penalizing Confident Output Distributions

G. Pereyra, G. Tucker, J. Chorowski, Lukasz Kaiser, Geoffrey Hinton - 2017

1 paper in library cites

[32]Text Summarization With Pretrained Encoders

Yibo Liu, Mirella Lapata - 2019

1 paper in library cites

[33]The Second Conversational Intelligence Challenge (ConvAI2)

E. Dinan, V. Logacheva, V. Malykh, A. Miller, K. Shuster, J. Urbanek, Douwe Kiela, A. Szlam, I. Serban, Ryan Lowe - 2019

1 paper in library cites

Cited by 6 papers in your library

Cites 27 papers in your library

Read on November 14, 2025

I expected SO much less from this paper, but it is simple, intuitive, AND achieves SotA!
