Cite Score
14
AI summary
This paper introduces Byte-to-Span (BTS), an LSTM-based model that reads text as bytes and outputs span annotations, and is able to analyze text in many languages with a single model. The multilingual models are compact, but produce results similar to or better than the state-of-the-art in Part-of-Speech tagging and Named Entity Recognition.
Abstract
We describe an LSTM-based model, which we call Byte-to-Span (BTS), that reads text as bytes and outputs span annotations of the form [start, length, label], where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on Unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, yet they produce results similar to or better than state-of-the-art Part-of-Speech tagging and Named Entity Recognition systems that use only the provided training datasets (no external data sources). Our models learn "from scratch" in that they do not rely on any elements of the standard Natural Language Processing pipeline (including tokenization), and thus can run in a standalone fashion on raw text.
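The [start, length, label] output format can be made concrete with a small sketch. The code below only illustrates the input and output representation described in the abstract, not the LSTM itself. The helper names (`to_bytes`, `spans_to_tokens`, `span_text`), the token spellings (`S0`, `L13`), the example sentence, and the `PER`/`LOC` labels are all assumptions chosen for illustration; the abstract does not specify them.

```python
from typing import List, Tuple

# (start byte offset, length in bytes, label); hypothetical type for illustration
Span = Tuple[int, int, str]

def to_bytes(text: str) -> List[int]:
    """Model input: the UTF-8 bytes of the text, one integer in 0..255 per byte."""
    return list(text.encode("utf-8"))

def spans_to_tokens(spans: List[Span]) -> List[str]:
    """Flatten spans into one output sequence of start, length, and label tokens.

    Starts, lengths, and labels are distinct vocabulary entries, so they get
    distinct (assumed) spellings: "S<n>", "L<n>", and the bare label.
    """
    tokens: List[str] = []
    for start, length, label in sorted(spans):
        tokens += [f"S{start}", f"L{length}", label]
    return tokens

def span_text(text: str, span: Span) -> str:
    """Recover the annotated substring by slicing bytes, not characters."""
    start, length, _ = span
    return text.encode("utf-8")[start:start + length].decode("utf-8")

text = "Óscar Romero was born in El Salvador."
# "Ó" occupies two bytes in UTF-8, so "Óscar Romero" spans 13 bytes, not 12.
spans = [(0, 13, "PER"), (26, 11, "LOC")]

print(len(to_bytes(text)))        # number of input bytes
print(spans_to_tokens(spans))     # flattened output sequence
print([span_text(text, s) for s in spans])
```

Because offsets and lengths count bytes, the same small input vocabulary (256 byte values) and the same span tokens apply to any language that can be written in UTF-8, which is what lets a single model cover many languages.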