2016

Multilingual Language Processing From Bytes

Amarnag Subramanya

Cite Score

14

AI summary

This paper introduces Byte-to-Span (BTS), an LSTM-based model that reads text as bytes and outputs span annotations, so a single model can analyze text in many languages. Because the byte vocabulary is small, the multilingual models are compact, yet they produce results similar to or better than the state of the art in Part-of-Speech tagging and Named Entity Recognition among systems that use only the provided training datasets.

Main Contributions

  • The paper introduces Byte-to-Span (BTS), an LSTM-based model that reads text as bytes and outputs span annotations.
  • The model can analyze text in many languages with a single model.
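  • The multilingual models are very compact because the vocabulary is small.
  • The model produces results similar to or better than the state of the art in Part-of-Speech tagging and Named Entity Recognition among systems that use only the provided training datasets (no external data sources).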
  • The model learns language-independent representations at deeper levels.

Abstract

We describe an LSTM-based model which we call Byte-to-Span (BTS) that reads text as bytes and outputs span annotations of the form [start, length, label] where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on Unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, but produce results similar to or better than state-of-the-art systems in Part-of-Speech tagging and Named Entity Recognition that use only the provided training datasets (no external data sources). Our models are learning "from scratch" in that they do not rely on any elements of the standard pipeline in Natural Language Processing (including tokenization), and thus can run in standalone fashion on raw text.
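To make the input and output format concrete, here is a minimal Python sketch of the data representation only, not of the LSTM model itself and not the authors' code. It assumes UTF-8 as the byte encoding and measures start and length in bytes; the helper names, the "S<n>" and "L<n>" symbol spellings, the example sentence, and the PER/LOC labels are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (start, length, label): start and length are measured in UTF-8 bytes.
Span = Tuple[int, int, str]


def to_byte_ids(text: str) -> List[int]:
    """Encode text as a sequence of integers in the range 0..255."""
    return list(text.encode("utf-8"))


def span_text(text: str, span: Span) -> str:
    """Recover the substring that a byte-offset span covers."""
    start, length, _label = span
    return text.encode("utf-8")[start:start + length].decode("utf-8")


def spans_to_symbols(spans: List[Span]) -> List[str]:
    """Flatten spans into one output sequence of start, length, label symbols.

    The "S<n>" / "L<n>" spellings are hypothetical, chosen for this sketch.
    """
    symbols: List[str] = []
    for start, length, label in spans:
        symbols.extend([f"S{start}", f"L{length}", label])
    return symbols


# "Óscar lives in Málaga", written with escapes so the byte counts are unambiguous.
text = "\u00d3scar lives in M\u00e1laga"
# Hand-written annotations for illustration, not model output.
spans: List[Span] = [(0, 6, "PER"), (16, 7, "LOC")]

byte_ids = to_byte_ids(text)
print(len(text), len(byte_ids))             # 21 characters, 23 bytes
print([span_text(text, s) for s in spans])  # the two annotated names
print(spans_to_symbols(spans))              # S0 L6 PER S16 L7 LOC
```

Because every input symbol is a byte, the input side needs at most 256 distinct values regardless of language, which is the sense in which the vocabulary is small. The accented letters each occupy two bytes, so byte offsets and character offsets diverge after the first character.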


