Papperoni

2018

When and Why Are Pre-Trained Word Embeddings Useful for Neural Machine Translation?

Ye Qi, Devendra Singh Sachan, Matthieu Felix, Sarguna Janani Padmanabhan, Graham Neubig

citations

Cite Score

20

AI summary

This paper analyzes the use of pre-trained word embeddings in NMT tasks. It uses TED talk transcripts to build parallel corpora between English and three pairs of languages. It finds that pre-trained embeddings can provide gains of up to 20 BLEU points in the most favorable setting.

Main Contributions

It examines the effectiveness of pre-trained word embeddings across various languages in NMT.
It analyzes whether pre-training is more effective for similar translation pairs.
It investigates whether alignment of word embeddings improves performance.
It studies the impact of pre-training in multilingual translation systems.
It identifies a sweet spot in low-resource scenarios where pre-trained word embeddings are most effective.
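
To illustrate what "alignment of word embeddings" can mean in practice, here is a minimal sketch of one common approach: an orthogonal (Procrustes) mapping between two embedding spaces, in the spirit of reference [16]. Whether this matches the paper's exact procedure is an assumption, and the function name `procrustes_align` and the toy data are hypothetical.

```python
import numpy as np

def procrustes_align(src, tgt):
    """Return the orthogonal matrix W that minimizes ||src @ W - tgt||_F."""
    # SVD of the cross-covariance between the paired vectors.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Toy data (hypothetical): 50 dictionary pairs in an 8-dimensional space.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(50, 8))
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # random orthogonal matrix
src = tgt @ rotation.T  # source space is a rotated copy of the target space

w = procrustes_align(src, tgt)
aligned = src @ w  # source vectors mapped into the target space
```

In this toy setup the source vectors are an exact rotation of the target vectors, so the learned mapping recovers the target space. Real embeddings are only approximately related by a rotation, so the fit would be imperfect.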

Abstract

The performance of Neural Machine Translation (NMT) systems often suffers in low-resource scenarios where sufficiently large-scale parallel corpora cannot be obtained. Pre-trained word embeddings have proven to be invaluable for improving performance in natural language analysis tasks, which often suffer from paucity of data. However, their utility for NMT has not been extensively explored. In this work, we perform five sets of experiments that analyze when we can expect pre-trained word embeddings to help in NMT tasks. We show that such embeddings can be surprisingly effective in some cases – providing gains of up to 20 BLEU points in the most favorable setting.
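
To make the setup concrete, here is a minimal sketch of how pre-trained vectors might be used to initialize an NMT embedding table, with random initialization as a fallback for words that have no pre-trained vector. This is an illustration under stated assumptions, not the paper's implementation: the function name `build_embedding_table`, the toy vocabulary, and the uniform fallback range are all hypothetical.

```python
import random

def build_embedding_table(vocab, pretrained, dim, seed=0):
    """Build one embedding row per vocabulary word.

    Words found in `pretrained` (with the right dimension) copy their vector;
    all other words get small random values (assumed fallback strategy).
    Returns the table and the fraction of words covered by pre-trained vectors.
    """
    rng = random.Random(seed)
    table = []
    hits = 0
    for word in vocab:
        vec = pretrained.get(word)
        if vec is not None and len(vec) == dim:
            table.append(list(vec))
            hits += 1
        else:
            table.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    coverage = hits / len(vocab) if vocab else 0.0
    return table, coverage

# Toy example (hypothetical data): two of three words have pre-trained vectors.
vocab = ["<unk>", "casa", "gato"]
pretrained = {"casa": [0.1, 0.2], "gato": [0.3, 0.4]}
table, coverage = build_embedding_table(vocab, pretrained, dim=2)
```

The coverage value is a quick sanity check: if few vocabulary words have pre-trained vectors, most of the table is still randomly initialized and little benefit should be expected.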

References [25]

[1]Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and huge impact. I am amazed that they made something so simple even when it requires a lot of background mathematical knowledge

[2]Distributed Representations of Words and Phrases and Their Compositionality

Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

32 papers in library cite

Introduced word2vec. Game changer.

[3]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[4]BLEU: A Method for Automatic Evaluation of Machine Translation

K. Papineni, S. Roukos, T. Ward, Wei Jing Zhu - 2002

19 papers in library cite

Very cool idea. Simple yet very impactful!

[5]Understanding the Difficulty of Training Deep Feedforward Neural Networks

Xavier Glorot, Yoshua Bengio - 2010

20 papers in library cite

Nice but underwhelming results (they still underperform vs. pretraining). I also didn't really like the way it's written. It's not bad, it's just a bit clunky. Worth the read though.

[6]Convolutional Neural Networks for Sentence Classification

Yoon Kim - 2014

8 papers in library cite

It's nice, goes straight to the point. I can see why it has tons of citations. However, I am not sure it was as impactful as its 20k citations suggest.

[7]Enriching Word Vectors With Subword Information

Piotr Bojanowski, E. Grave, Armand Joulin, Tomas Mikolov - 2017

7 papers in library cite

It's just word2vec with character n-grams instead of full words. Unsurprising, but important nonetheless.

[8]Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation

Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, M. Norouzi, W. Macherey, M. Krikun, Yuan Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. J. Johnson, Xiaobing Liu, Lukasz Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, Wei Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, Oriol Vinyals, G. S. Corrado, M. Hughes, Jeffrey Dean - 2016

15 papers in library cite

It's a very good paper but TBH doesn't bring anything new other than joining a bunch of existing stuff. I think it ended up being foundational because it's Google and several people used it as a base for future research. Good contribution then :)

[9]Word Translation Without Parallel Data

Alexis Conneau, G. Lample, Marc'aurelio Ranzato, L. Denoyer, Hervé Jégou - 2018

3 papers in library cite

I really, really liked the methodology, but it is a bit hard to read given the math jargon they use. Also, the task (word translation) is very boring and I think it lacks practical uses.

[10]Unsupervised Neural Machine Translation

M. Artetxe, G. Labaka, E. Agirre, Kyunghyun Cho - 2017

4 papers in library cite

Very nice methodology! I am impressed that this is even possible. However, results are somewhat underwhelming...

[11]Unsupervised Pretraining for Sequence to Sequence Learning

P. Ramachandran, P. J. Liu, Quoc V. Le - 2017

9 papers in library cite

It's alright, but it's the same Seq2Seq thing with pretraining

[12]Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

M. J. Johnson, M. Schuster, Quoc V. Le, M. Krikun, Yonghui Wu, Zhifeng Chen, N. Thorat, F. B. Viegas, M. Wattenberg, G. S. Corrado, M. Hughes, Jeffrey Dean - 2017

7 papers in library cite

Google's NMT system V2?

[13]Advances in Pre-Training Distributed Word Representations

Tomas Mikolov, E. Grave, Piotr Bojanowski, C. Puhrsch, Armand Joulin - 2017

1 paper in library cites

Seems like a good review

[14]Dual Learning for Machine Translation

D. He, Y. Xia, T. Qin, Liwei Wang, N. Yu, T. Liu, W. Y. Ma - 2016

2 papers in library cite

I marked this as something I should pay attention to.

[15]Multi-Way, Multilingual Neural Machine Translation With a Shared Attention Mechanism

O. Firat, Kyunghyun Cho, Yoshua Bengio - 2016

2 papers in library cite

Bengio

[16]Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax

S. L. Smith, D. H. Turban, S. Hamblin, N. Y. Hammerla - 2017

4 papers in library cite

Inverted softmax

[17]Stronger Baselines for Trustable Results in Neural Machine Translation

M. Denkowski, Graham Neubig - 2017

2 papers in library cite

Flopped

[18]Neural Architectures for Named Entity Recognition

G. Lample, M. Ballesteros, S. Subramanian, K. Kawakami, C. Dyer - 2016

4 papers in library cite

[19]End-to-End Sequence Labeling via Bi-Directional LSTM-CNNs-CRF

X. Ma, Eduard Hovy - 2016

3 papers in library cite

[20]Semi-Supervised Learning for Neural Machine Translation

Y. Cheng, Wei Xu, Z. He, Wei He, H. Wu, Maosong Sun, Yang Liu - 2016

2 papers in library cite

[21]A Bag of Useful Tricks for Practical Neural Machine Translation: Embedding Layer Initialization and Large Batch Size

M. Neishi, J. Sakuma, S. Tohda, S. Ishiwatari, N. Yoshinaga, M. Toyoda - 2017

1 paper in library cites

[22]Monolingual Embeddings for Low Resourced Neural Machine Translation

M. A. D. Gangi, M. Federico - 2017

1 paper in library cites

[23]The Concise Oxford Dictionary of Linguistics

P. H. Matthews - 1997

1 paper in library cites

[24]The Slavonic Languages

G. Corbett, B. Comrie - 2003

1 paper in library cites

[25]XNMT: The Extensible Neural Machine Translation Toolkit

Graham Neubig, M. Sperber, Xinyi Wang, M. Felix, A. Matthews, S. Padmanabhan, Y. Qi, D. S. Sachan, P. Arthur, P. Godard, J. Hewitt, R. Riad, Liming Wang - 2018

1 paper in library cites

Cited by 1 paper in your library

Cites 17 papers in your library

Read on October 23, 2025

It's not bad, it's just that it severely lacks deeper analysis of its results. A few of the results are contradictory, and the authors don't analyze why.
