Papperoni

2017

Unsupervised Neural Machine Translation

M. Artetxe, G. Labaka, E. Agirre, Kyunghyun Cho

Cite Score: 37

AI summary

The paper introduces a novel method to train NMT systems in a completely unsupervised manner. It uses an attentional encoder-decoder model trained on monolingual corpora with denoising and back-translation. The model achieves 15.56 BLEU (Fr-En) and 10.21 (De-En) on WMT 2014.

Main Contributions

Introduces a novel method to train NMT systems in a completely unsupervised manner, relying solely on monolingual corpora.
Employs a modified attentional encoder-decoder model with a shared encoder and fixed cross-lingual embeddings.
Uses a combination of denoising and on-the-fly backtranslation for training.
Achieves 15.56 BLEU points in WMT 2014 French → English translation and 10.21 BLEU points in German → English translation using only monolingual data.
Demonstrates that the model can also benefit from small parallel corpora, reaching 21.81 (French → English) and 15.24 (German → English) BLEU points when combined with 100,000 parallel sentences.
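
The denoising objective listed above can be illustrated with a small sketch. The summary does not say how input sentences are corrupted, so the version below assumes random swaps of adjacent tokens. The function name `add_swap_noise` and the swap count (half the sentence length) are illustrative choices, not details confirmed on this page.

```python
import random


def add_swap_noise(tokens, rng=None):
    """Return a corrupted copy of `tokens` made by random adjacent swaps.

    Assumption: noise is len(tokens) // 2 swaps of neighbouring tokens.
    The input list is left unmodified.
    """
    rng = rng or random.Random()
    noisy = list(tokens)
    for _ in range(len(noisy) // 2):
        # Pick a position and swap it with its right-hand neighbour.
        i = rng.randrange(len(noisy) - 1)
        noisy[i], noisy[i + 1] = noisy[i + 1], noisy[i]
    return noisy


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    print(add_swap_noise(sentence, random.Random(0)))
```

During denoising training, the model would receive the corrupted sentence as input and be asked to reconstruct the original, which keeps it from learning a trivial word-by-word copy.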

Abstract

In spite of the recent success of neural machine translation (NMT) in standard benchmarks, the lack of large parallel corpora poses a major practical problem for many language pairs. There have been several proposals to alleviate this issue with, for instance, triangulation and semi-supervised learning techniques, but they still require a strong cross-lingual signal. In this work, we completely remove the need of parallel data and propose a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora. Our model builds upon the recent work on unsupervised embedding mappings, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation. Despite the simplicity of the approach, our system obtains 15.56 and 10.21 BLEU points in WMT 2014 French → English and German → English translation. The model can also profit from small parallel corpora, and attains 21.81 and 15.24 points when combined with 100,000 parallel sentences, respectively. Our implementation is released as an open source project.
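
As a rough sketch of how denoising and backtranslation can be combined using monolingual data alone, the code below builds the training examples for a single step. Everything named here (`make_examples`, `translate`, `corrupt`, the language tags `"l1"` and `"l2"`) is hypothetical and stands in for the real model. It is not the released implementation.

```python
from typing import Callable, List, Tuple

Sentence = List[str]
# (input tokens, input language, target tokens, target language)
Example = Tuple[Sentence, str, Sentence, str]


def make_examples(
    batch_l1: List[Sentence],
    batch_l2: List[Sentence],
    translate: Callable[[Sentence, str, str], Sentence],
    corrupt: Callable[[Sentence], Sentence],
) -> List[Example]:
    """Build denoising and backtranslation examples from monolingual batches.

    `translate(sentence, src, tgt)` is assumed to run the current model in
    inference mode; `corrupt(sentence)` is assumed to return a noisy copy.
    """
    examples: List[Example] = []
    for lang, other, batch in (("l1", "l2", batch_l1), ("l2", "l1", batch_l2)):
        for sent in batch:
            # Denoising: reconstruct the clean sentence from its noisy copy.
            examples.append((corrupt(sent), lang, sent, lang))
            # Backtranslation: translate with the current model, then train
            # to recover the original sentence from the synthetic translation.
            synthetic = translate(sent, lang, other)
            examples.append((synthetic, other, sent, lang))
    return examples


if __name__ == "__main__":
    # Stand-ins so the sketch runs without a real model.
    fake_translate = lambda s, src, tgt: [w.upper() for w in s]
    fake_corrupt = lambda s: list(reversed(s))
    for ex in make_examples([["a", "b"]], [["c", "d"]], fake_translate, fake_corrupt):
        print(ex)
```

In every example the target is a genuine monolingual sentence, so no parallel data is needed. Only the inputs are synthetic (noisy or machine-translated).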

References [36]

[1]Adam: A Method for Stochastic Optimization

D. P. Kingma, Jimmy Lei Ba - 2014

49 papers in library cite

Amazing paper! Very well explained and huge impact. I am amazed that they made something so simple even when it requires a lot of background mathematical knowledge

[2]Distributed Representations of Words and Phrases and Their Compositionality

Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

32 papers in library cite

Introduced word2vec. Game changer.

[3]Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[4]Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation

Kyunghyun Cho, B. V. Merrienboer, C. G. Gulcehre, D. Bahdanau, F. Bougares, Holger Schwenk, Yoshua Bengio - 2014

38 papers in library cite

Introduces RNN encoder-decoder. I love it :)

[5]Sequence to Sequence Learning With Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le - 2014

58 papers in library cite

Good paper, but I think it only got famous because they set a new good baseline for NNs in MT. Their main contribution was reversing the source sentence TBH.

[6]Effective Approaches to Attention-Based Neural Machine Translation

T. Luong, H. Pham, Christopher D. Manning - 2015

15 papers in library cite

Good paper, but very derivative. Attention methods start getting very complicated... I understand why Transformers took over TBH

[7]Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network With a Local Denoising Criterion

P. Vincent, H. Larochelle, Isabelle Lajoie, Yoshua Bengio, Pierre-Antoine Manzagol - 2010

6 papers in library cite

This is basically a summary of everything that happened from 2006 to 2010, and it also points out some interesting things about DBNs! Very well explained as well.

[8]Neural Machine Translation of Rare Words with Subword Units

R. Sennrich, B. Haddow, Alexandra Birch - 2016

22 papers in library cite

Very good! Simple, explains quite a lot and good results. Forms the basis for a lot of stuff now!

[9]Google's Neural Machine Translation System: Bridging the Gap Between Human and Machine Translation

Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, M. Norouzi, W. Macherey, M. Krikun, Yuan Cao, Q. Gao, K. Macherey, J. Klingner, A. Shah, M. J. Johnson, Xiaobing Liu, Lukasz Kaiser, S. Gouws, Y. Kato, T. Kudo, H. Kazawa, K. Stevens, G. Kurian, N. Patil, Wei Wang, C. Young, J. Smith, J. Riesa, A. Rudnick, Oriol Vinyals, G. S. Corrado, M. Hughes, Jeffrey Dean - 2016

15 papers in library cite

It's a very good paper but TBH doesn't bring anything new other than joining a bunch of existing stuff. I think it ended up being foundational because it's Google and several people used it as a base for future research. Good contribution then :)

[10]Exploiting Similarities Among Languages for Machine Translation

Tomas Mikolov, Quoc V. Le, Ilya Sutskever - 2013

6 papers in library cite

It's just word2vec + a projection, but results are very nice and it's surprisingly simple!

[11]Semi-Supervised Sequence Learning

A. M. Dai, Quoc V. Le - 2015

27 papers in library cite

Very good paper that was probably the first to introduce pre-training in NLP!

[12]Learning Distributed Representations of Sentences From Unlabelled Data

F. Hill, Kyunghyun Cho, Anna Korhonen - 2016

12 papers in library cite

I liked it overall (almost gave it 4 stars), but I am a bit tired of embeddings. It's better than the other ones though. FastSent is nice.

[13]Unsupervised Pretraining for Sequence to Sequence Learning

P. Ramachandran, P. J. Liu, Quoc V. Le - 2017

9 papers in library cite

It's alright, but it's the same Seq2Seq thing with pretraining

[14]Improving Neural Machine Translation Models With Monolingual Data

R. Sennrich, B. Haddow, Alexandra Birch - 2016

4 papers in library cite

The most cited for multilingual/monolingual NMT (out of other references)

[15]Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation

M. J. Johnson, M. Schuster, Quoc V. Le, M. Krikun, Yonghui Wu, Zhifeng Chen, N. Thorat, F. B. Viegas, M. Wattenberg, G. S. Corrado, M. Hughes, Jeffrey Dean - 2017

7 papers in library cite

Google's NMT system V2?

[16]Six Challenges for Neural Machine Translation

P. Koehn, R. Knowles - 2017

1 paper in library cites

I like these review papers!

[17]Dual Learning for Machine Translation

D. He, Y. Xia, T. Qin, Liwei Wang, N. Yu, T. Liu, W. Y. Ma - 2016

2 papers in library cite

I marked this as something I should pay attention to.

[18]Multi-Way, Multilingual Neural Machine Translation With a Shared Attention Mechanism

O. Firat, Kyunghyun Cho, Yoshua Bengio - 2016

2 papers in library cite

Bengio

[19]Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax

S. L. Smith, D. H. Turban, S. Hamblin, N. Y. Hammerla - 2017

4 papers in library cite

Inverted softmax

[20]Learning Bilingual Word Embeddings With (Almost) No Bilingual Data

M. Artetxe, G. Labaka, E. Agirre - 2017

2 papers in library cite

Bilingual word embeddings

[21]Learning Principled Bilingual Mappings of Word Embeddings While Preserving Monolingual Invariance

M. Artetxe, G. Labaka, E. Agirre - 2016

2 papers in library cite

Bilingual word embeddings

[22]Adversarial Training for Unsupervised Bilingual Lexicon Induction

Meng Zhang, Yang Liu, H. Luan, Maosong Sun - 2017

2 papers in library cite

Flopped a bit, but it uses adversarial learning

[23]Hubness and Pollution: Delving Into Cross-Space Mapping for Zero-Shot Learning

A. Lazaridou, G. Dinu, M. Baroni - 2015

2 papers in library cite

The first to notice the "hubness" in multi dimensional spaces

[24]A Teacher-Student Framework for Zero-Resource Neural Machine Translation

Yun Chen, Yang Liu, Y. Cheng, V. O. K. Li - 2017

1 paper in library cites

I am curious about the teacher-student framework here

[25]BilBOWA: Fast Bilingual Distributed Representations Without Word Alignments

S. Gouws, Yoshua Bengio, G. Corrado - 2015

2 papers in library cite

[26]Contrastive Estimation: Training Log-Linear Models on Unlabeled Data

Noah A. Smith, J. Eisner - 2005

2 papers in library cite

[27]Deciphering Foreign Language

S. Ravi, K. Knight - 2011

2 papers in library cite

[28]Unifying Bayesian Inference and Vector Space Models for Improved Decipherment

Q. Dou, Ashish Vaswani, K. Knight, C. Dyer - 2015

2 papers in library cite

[29]Zero-Resource Translation With Multi-Lingual Neural Machine Translation

O. Firat, B. Sankaran, Y. A. Onaizan, F. T. Y. Vural, Kyunghyun Cho - 2016

2 papers in library cite

[30]Bilingual Word Representations With Monolingual Quality in Mind

T. Luong, H. Pham, Christopher D. Manning - 2015

1 paper in library cites

[31]Copied Monolingual Data Improves Low-Resource Neural Machine Translation

A. Currey, A. V. M. Barone, K. Heafield - 2017

1 paper in library cites

[32]Dependency-Based Decipherment for Resource-Limited Machine Translation

Q. Dou, K. Knight - 2013

1 paper in library cites

[33]Fully Character-Level Neural Machine Translation Without Explicit Segmentation

Jason Lee, Kyunghyun Cho, T. Hofmann - 2017

1 paper in library cites

[34]Large Scale Decipherment for Out-of-Domain Machine Translation

Q. Dou, K. Knight - 2012

1 paper in library cites

[35]Toward Multilingual Neural Machine Translation With Universal Encoder and Decoder

T. L. Ha, J. Niehues, A. Waibel - 2016

1 paper in library cites

[36]Towards Cross-Lingual Distributed Representations Without Parallel Text Trained With Adversarial Autoencoders

A. V. M. Barone - 2016

1 paper in library cites

Cited by: 4 papers in your library

Cites: 24 papers in your library

Read on November 2, 2025

Very nice methodology! I am impressed that this is even possible. However, results are somewhat underwhelming...
