2016

Neural Machine Translation of Rare Words with Subword Units

R. Sennrich, B. Haddow, Alexandra Birch

Cite Score

84

AI summary

This paper introduces a method for open-vocabulary neural machine translation that encodes rare and unknown words as sequences of subword units using byte pair encoding. On the WMT 15 translation tasks, subword models improve over a back-off dictionary baseline by up to 1.1 BLEU for English→German and 1.3 BLEU for English→Russian.

Main Contributions

  • The paper introduces a method for open-vocabulary neural machine translation that encodes rare and unknown words as sequences of subword units, and shows empirically that this makes open-vocabulary translation possible.
  • It adapts byte pair encoding (BPE), a compression algorithm, to word segmentation, so that a fixed-size vocabulary of variable-length character sequences can represent an open vocabulary.
  • It finds that this approach is simpler and more effective than using large vocabularies and back-off dictionaries.
  • On the WMT 15 translation tasks, subword models improve over a back-off dictionary baseline by up to 1.1 BLEU for English→German and 1.3 BLEU for English→Russian.
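The BPE adaptation named in the contributions can be illustrated with a minimal sketch. It assumes the usual formulation: start from characters plus an end-of-word marker, then repeatedly merge the most frequent adjacent symbol pair. The function name `learn_bpe`, the `</w>` marker, and the toy word counts are choices made for this illustration, not the authors' released code.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} dict.

    Illustrative sketch: names and the "</w>" end-of-word marker are assumptions.
    """
    # Each word starts as a tuple of characters followed by the marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Ties go to the pair that was counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, replacing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


# Toy corpus counts, chosen only for illustration.
toy_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges, vocab = learn_bpe(toy_counts, num_merges=5)
print(merges)
print(vocab)
```

On this toy input the frequent suffix "est" plus the end-of-word marker is merged into one symbol within the first three operations, while rarer character sequences stay split. The number of merges is the single knob that sets the final vocabulary size.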

Abstract

Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character n-gram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 BLEU, respectively.
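To make "encoding rare and unknown words as sequences of subword units" concrete, here is a minimal sketch of segmenting a word with an already-learned list of BPE merges, applied in the order they were learned. The function name `segment`, the `</w>` end-of-word marker, and the merge list are hypothetical values for illustration; they are not taken from the paper's experiments.

```python
def segment(word, merges):
    """Split `word` into subword units by replaying `merges` in order.

    Illustrative sketch: `merges` is a list of (left, right) symbol pairs.
    """
    # Start from single characters plus an end-of-word marker.
    symbols = list(word) + ["</w>"]
    for left, right in merges:
        out, i = [], 0
        while i < len(symbols):
            # Merge each adjacent occurrence of the current pair.
            if i + 1 < len(symbols) and symbols[i] == left and symbols[i + 1] == right:
                out.append(left + right)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


# Hypothetical merge list, as if learned from a small training corpus.
example_merges = [("e", "s"), ("es", "t"), ("est", "</w>"), ("l", "o"), ("lo", "w")]

print(segment("lowest", example_merges))  # ['low', 'est</w>']
print(segment("slow", example_merges))    # ['s', 'low', '</w>']
```

Neither word needs to appear in the training data: "lowest" is composed from two known units, and any character the merges do not cover falls back to a single-character symbol, so no word is mapped to an unknown token as long as its characters are in the vocabulary.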
