Papperoni

2015

Multi-Task Sequence to Sequence Learning

M. T. Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser


Cite Score: 36

AI summary

This paper introduces multi-task sequence-to-sequence learning, applying it to machine translation, constituency parsing, and image caption generation. It establishes a new state-of-the-art result in constituent parsing with 93.0 F1 and shows that training on syntactic parsing and image caption data improves translation quality between English and German.

Main Contributions

Introduces three multi-task learning (MTL) settings for sequence-to-sequence models: one-to-many, many-to-one, and many-to-many.
Demonstrates that training on parsing and image caption data improves translation quality between English and German by up to 1.5 BLEU points.
Establishes a new state-of-the-art result in constituent parsing with 93.0 F1.
Reveals properties of the autoencoder and skip-thought objectives in the MTL context: compared to skip-thought, the autoencoder helps less in terms of perplexity but more on BLEU.
Shows that MTL also benefits parsing, yielding an improvement of up to +8.9 F1 points over the baseline.
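The autoencoder and skip-thought objectives mentioned above differ mainly in how (source, target) pairs are built from monolingual text. The autoencoder reconstructs a sentence from itself, and skip-thought predicts a neighboring sentence. The sketch below is a minimal illustration that assumes the text is already split into sentences. The function names are hypothetical. Using only the next sentence as the skip-thought target is a simplification, because Kiros et al. predict both the previous and the next sentence.

```python
from typing import List, Tuple

# A training pair for a sequence-to-sequence model: (source sentence, target sentence).
Pair = Tuple[str, str]


def autoencoder_pairs(sentences: List[str]) -> List[Pair]:
    """Autoencoder objective: every sentence is its own target."""
    return [(s, s) for s in sentences]


def skip_thought_pairs(sentences: List[str]) -> List[Pair]:
    """Simplified skip-thought objective: predict the next sentence.

    Assumption: only the following sentence is used as the target here.
    """
    return [(sentences[i], sentences[i + 1]) for i in range(len(sentences) - 1)]


document = ["the cat sat down .", "it was tired .", "soon it fell asleep ."]
print(autoencoder_pairs(document))
print(skip_thought_pairs(document))
```

Only the targets change between the two objectives. The same encoder-decoder machinery can therefore train on either kind of pair, which makes them easy to mix with translation.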

Abstract

Sequence to sequence learning has recently emerged as a new paradigm in supervised learning. To date, most of its applications focused on only one task and not much work explored this framework for multiple tasks. This paper examines three multi-task learning (MTL) settings for sequence to sequence models: (a) the one-to-many setting – where the encoder is shared between several tasks such as machine translation and syntactic parsing, (b) the many-to-one setting – useful when only the decoder can be shared, as in the case of translation and image caption generation, and (c) the many-to-many setting – where multiple encoders and decoders are shared, which is the case with unsupervised objectives and translation. Our results show that training on a small amount of parsing and image caption data can improve the translation quality between English and German by up to 1.5 BLEU points over strong single-task baselines on the WMT benchmarks. Furthermore, we have established a new state-of-the-art result in constituent parsing with 93.0 F1. Lastly, we reveal interesting properties of the two unsupervised learning objectives, autoencoder and skip-thought, in the MTL context: autoencoder helps less in terms of perplexities but more on BLEU scores compared to skip-thought.
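The three sharing patterns described in the abstract can be made concrete with a small registry. In it, tasks that name the same encoder or decoder receive the same module object, so updates from one task would be visible to the others. This is a minimal structural sketch only. `Module` is a hypothetical placeholder for a real network, and no training happens. The task names, module names, and the language directions chosen for each setting are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(eq=False)  # identity comparison: modules are shared only if they are the same object
class Module:
    """Hypothetical placeholder for an encoder or decoder network."""
    name: str


class MultiTaskSeq2Seq:
    """Maps each task to an (encoder, decoder) pair, reusing modules by name."""

    def __init__(self) -> None:
        self.encoders: Dict[str, Module] = {}
        self.decoders: Dict[str, Module] = {}
        self.tasks: Dict[str, Tuple[Module, Module]] = {}

    def add_task(self, task: str, encoder: str, decoder: str) -> None:
        # setdefault keeps the first module registered under a name and returns it on later calls.
        enc = self.encoders.setdefault(encoder, Module(encoder))
        dec = self.decoders.setdefault(decoder, Module(decoder))
        self.tasks[task] = (enc, dec)

    def tasks_using(self, module: Module) -> List[str]:
        """Names of all tasks whose encoder or decoder is this exact module."""
        return sorted(t for t, (enc, dec) in self.tasks.items() if module is enc or module is dec)


# (a) one-to-many: one shared English encoder, task-specific decoders.
one_to_many = MultiTaskSeq2Seq()
one_to_many.add_task("translation", encoder="en", decoder="de")
one_to_many.add_task("parsing", encoder="en", decoder="parse")

# (b) many-to-one: task-specific encoders, one shared English decoder.
many_to_one = MultiTaskSeq2Seq()
many_to_one.add_task("translation", encoder="de", decoder="en")
many_to_one.add_task("captioning", encoder="image", decoder="en")

# (c) many-to-many: translation plus one autoencoder per language.
many_to_many = MultiTaskSeq2Seq()
many_to_many.add_task("translation", encoder="en", decoder="de")
many_to_many.add_task("autoencoder_en", encoder="en", decoder="en")
many_to_many.add_task("autoencoder_de", encoder="de", decoder="de")

print(one_to_many.tasks_using(one_to_many.encoders["en"]))
print(many_to_one.tasks_using(many_to_one.decoders["en"]))
print(many_to_many.tasks_using(many_to_many.encoders["en"]))
print(many_to_many.tasks_using(many_to_many.decoders["de"]))
```

In the many-to-many case, the English encoder is shared by translation and the English autoencoder. The German decoder is shared by translation and the German autoencoder.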


References [32]


[1]Long Short-Term Memory

Sepp Hochreiter, Jürgen Schmidhuber - 1997

94 papers in library cite

LSTMs FTW!

[2]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall
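This entry introduces the attention mechanism. For the previous decoder state s and each encoder state h_j, Bahdanau et al. score alignment with a small feed-forward network, e_j = v^T tanh(W s + U h_j). They normalize the scores with a softmax and take the weighted average of the h_j as the context vector. The pure-Python sketch below follows that form. The tiny hand-set matrices stand in for learned parameters and are assumptions for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product for small list-based matrices."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(s_prev: Vector, encoder_states: List[Vector],
                       W: Matrix, U: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Score e_j = v . tanh(W s_prev + U h_j), normalize, and average the h_j."""
    w_s = matvec(W, s_prev)  # decoder-side projection, shared by all positions
    scores = []
    for h in encoder_states:
        hidden = [math.tanh(a + b) for a, b in zip(w_s, matvec(U, h))]
        scores.append(sum(vi * z for vi, z in zip(v, hidden)))
    weights = softmax(scores)  # alignment weights, sum to 1
    dim = len(encoder_states[0])
    context = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return weights, context


identity = [[1.0, 0.0], [0.0, 1.0]]
weights, context = additive_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], identity, identity, [1.0, 1.0])
print(weights, context)
```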

[3]Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, B. V. Merrienboer, C. G. Gulcehre, D. Bahdanau, F. Bougares, Holger Schwenk, Yoshua Bengio - 2014

38 papers in library cite

Introduces RNN encoder-decoder. I love it :)

[4]BLEU: A Method for Automatic Evaluation of Machine Translation

K. Papineni, S. Roukos, T. Ward, Wei Jing Zhu - 2002

19 papers in library cite

Very cool idea. Simple yet very impactful!
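The paper's translation gains are reported in BLEU, so here is a minimal sketch of how the score is computed. It clips n-gram precisions up to 4-grams, combines them with a geometric mean, and multiplies by a brevity penalty. Simplifying assumptions: one reference per sentence, uniform weights, and no smoothing. Official evaluations use standard scripts that may tokenize and handle multiple references differently.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per sentence and uniform weights."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Modified precision: clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision makes BLEU zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty over total corpus lengths.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


reference = "the cat sat on the mat".split()
print(corpus_bleu([reference], [reference]))
print(corpus_bleu(["the cat sat on the".split()], [reference]))
```

In the second call every n-gram matches, so only the brevity penalty lowers the score. This shows why clipped precision alone would reward overly short outputs.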

[5]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because it set a strong new baseline for neural MT. TBH, their main contribution was reversing the source sentence.
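The source-reversal trick is only a preprocessing step. The source sentence is fed to the encoder back-to-front, while the target is left as is. Sutskever et al. motivate it by the many short-term dependencies it introduces between the start of the source and the start of the target. A minimal sketch, assuming already tokenized sentences; the function name is hypothetical.

```python
from typing import List, Tuple

Tokens = List[str]


def reverse_source(source: Tokens, target: Tokens) -> Tuple[Tokens, Tokens]:
    """Reverse the source token order; the target keeps its original order."""
    return source[::-1], list(target)


src, tgt = reverse_source("I like cats".split(), "ich mag Katzen".split())
print(src, tgt)
```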

[6]Show, Attend and Tell: Neural Image Caption Generation With Visual Attention

K. Xu, Jimmy Lei Ba, R. Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, R. Zemel, Yoshua Bengio - 2015

12 papers in library cite

It's a nice paper. I liked soft attention way more than hard attention, and I'm a bit mad that it wasn't the best lol. It's also the first paper I read about multimodality, which seems to have been a busy area at the time. Also, the results are kinda bad.

[7]Effective Approaches to Attention-Based Neural Machine Translation

T. Luong, H. Pham, Christopher D. Manning - 2015

15 papers in library cite

Good paper, but very derivative. Attention methods are starting to get very complicated... TBH, I understand why Transformers took over.

[8]Multitask Learning

Rich Caruana - 1997

13 papers in library cite

I expected waaaaaay more from this paper. The idea is sooooo simple and the results are underwhelming. Also, 30 pages for something that could be said in 10. The writing style is a bit boring. TBH it seems like just a rewrite of Caruana's PhD thesis.

[9]Building a Large Annotated Corpus of English: The Penn Treebank

M. P. Marcus, B. Santorini, Mary Ann Marcinkiewicz - 1993

22 papers in library cite

Well, not really interesting, but very cool to see how the Penn Treebank was made.

[10]Show and Tell: A Neural Image Caption Generator

Oriol Vinyals, A. Toshev, S. Bengio, Dumitru Erhan - 2015

11 papers in library cite

It's nice and they beat a ton of SotA. However, I read the one that uses attention first so this is a bit less surprising.

[11]DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

J. Donahue, Y. Jia, Oriol Vinyals, J. Hoffman, N. Zhang, E. Tzeng, Trevor Darrell - 2014

15 papers in library cite

Very nice paper. The first I've seen (and, based on the text, the first ever) on feature extraction for images. It's very nice to see embeddings reaching SotA.

[12]Skip-Thought Vectors

R. Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S. Zemel, R. Urtasun, Antonio Torralba, Sanja Fidler - 2015

23 papers in library cite

Nice to see a Word2Vec-style alternative for sentences, but I don't really like the approach. Good nonetheless.

[13]Recurrent Continuous Translation Models

N. Kalchbrenner, Phil Blunsom - 2013

27 papers in library cite

Good paper, probably the first to use an encoder-decoder. But they used a convolutional NN instead of a traditional encoder, which I don't really like.

[14]A Framework for Learning Predictive Structures From Multiple Tasks and Unlabeled Data

Rie Kubota Ando, Tong Zhang - 2005

10 papers in library cite

Very nice and clever way of tackling semi-supervised learning, and it makes a lot of sense. I especially give them credit for formalizing the concept. The methodology is a bit boring.

[15]Semi-Supervised Sequence Learning

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Very good paper that was probably the first to introduce pre-training in NLP!

[16]Exploring the Limits of Language Modeling

R. Jozefowicz, Oriol Vinyals, M. Schuster, Noam Shazeer, Yonghui Wu - 2016

20 papers in library cite

It's funny because at first I did not like it, but then it clicked and I really liked it - they are trying to get around the large-vocabulary and rare-word problems. In the end it's SotA, but I think it's too convoluted, and it was replaced by Transformers.

[17]On Using Very Large Target Vocabulary for Neural Machine Translation

S. Jean, Kyunghyun Cho, R. Memisevic, Yoshua Bengio - 2014

12 papers in library cite

It's nice, but it starts getting a bit into the realm of "yeah, that seems like a minor improvement". It's nice that they use the importance sampling stuff from the previous paper though - I thought it had completely vanished :)

[18]Grammar as a Foreign Language

Oriol Vinyals, Lukasz Kaiser, T. Koo, S. Petrov, Ilya Sutskever, Geoffrey Hinton - 2015

9 papers in library cite

It's a nice paper showing that attention can be used for parsing. However, parsing is boring and is very derivative. Good paper nonetheless.

[19]Addressing the Rare Word Problem in Neural Machine Translation

T. Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, Wojciech Zaremba - 2014

14 papers in library cite

The method was very poorly explained. It was also worse and more complicated than a paper released earlier. Overall not that good.

[20]Is Learning the n-th Thing Any Easier Than Learning the First?

Sebastian Thrun - 1996

3 papers in library cite

It's one of those early NN papers that only have toy examples and some very non-standard terminology. TBH, it doesn't add much.

[21]Dropout Improves Recurrent Neural Networks for Handwriting Recognition

V. Pham, T. Bluche, C. Kermorvant, J. Louradour - 2014

5 papers in library cite

Dropout for RNNs

[22]Multi-Task Learning for Multiple Language Translation

D. Dong, H. Wu, W. He, D. Yu, H. Wang - 2015

2 papers in library cite

MTL for translation - they translate for many target languages at once

[23]On Using Monolingual Corpora in Neural Machine Translation

C. G. Gulcehre, O. Firat, K. Xu, Kyunghyun Cho, L. Barrault, H. C. Lin, F. Bougares, Holger Schwenk, Yoshua Bengio - 2015

3 papers in library cite

Monolingual dataset for MT?

[24]Findings of the 2015 Workshop on Statistical Machine Translation

O. Bojar, R. Chatterjee, C. Federmann, B. Haddow, M. Huck, C. Hokamp, P. Koehn, V. Logacheva, C. Monz, M. Negri, M. Post, C. Scarton, L. Specia, M. Turchi - 2015

3 papers in library cite

[25]Montreal Neural Machine Translation Systems for WMT'15

S. Jean, O. Firat, Kyunghyun Cho, R. Memisevic, Yoshua Bengio - 2015

3 papers in library cite

[26]Multi-Task Feature Learning

A. Argyriou, T. Evgeniou, M. Pontil - 2006

3 papers in library cite

[27]Regularized Multi-Task Learning

T. Evgeniou, M. Pontil - 2004

3 papers in library cite

[28]Multilingual Acoustic Models Using Distributed Deep Neural Networks

Georg Heigold, Vincent Vanhoucke, A. Senior, P. Nguyen, M. A. Ranzato, M. Devin, Jeffrey Dean - 2013

2 papers in library cite

[29]Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

Xiaodong Liu, Jianfeng Gao, X. He, K. Duh, Y. Y. Wang - 2015

2 papers in library cite

[30]Cross-Language Knowledge Transfer Using Multilingual Deep Neural Network With Shared Hidden Layers

J. T. Huang, J. Li, D. Yu, L. Deng, Y. Gong - 2013

1 paper in library cites

[31]Learning Task Grouping and Overlap in Multi-Task Learning

A. Kumar, H. Daumé III - 2012

1 paper in library cites

[32]Stanford Neural Machine Translation Systems for Spoken Language Domain

M. T. Luong, Christopher D. Manning - 2015

1 paper in library cites

Cited by 4 papers in your library

Cites 23 papers in your library

Read on August 17, 2025

Very nice paper, but everything I read about S2S sounds very derivative at this point. Nonetheless, it's nice to see MTL again: it still seems like somewhat uncharted territory, despite being very promising.
