2018
Cite Score
23
AI summary
This paper introduces a character-level language model built on a deep (64-layer) transformer with self-attention. It achieves state-of-the-art results on the text8 and enwik8 datasets (1.13 and 1.06 bits per character, respectively) by adding auxiliary losses at intermediate layers and intermediate sequence positions, demonstrating that deep transformers are effective for character-level language modeling.
Abstract
LSTMs and other RNN variants have shown strong performance on character-level language modeling. These models are typically trained using truncated backpropagation through time, and it is common to assume that their success stems from their ability to remember long-term contexts. In this paper, we show that a deep (64-layer) transformer model (Vaswani et al. 2017) with fixed context outperforms RNN variants by a large margin, achieving state of the art on two popular benchmarks: 1.13 bits per character on text8 and 1.06 on enwik8. To get good results at this depth, we show that it is important to add auxiliary losses, both at intermediate network layers and intermediate sequence positions.
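The two quantities the abstract relies on, bits per character and a loss that sums auxiliary terms over intermediate layers and positions, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the nested-list data layout, and the fixed `aux_weight` are assumptions made for illustration only, and the abstract does not state how the auxiliary terms are weighted.

```python
import math

def bits_per_character(position_probs, targets):
    """Average negative log2-probability of the target characters.

    position_probs: list over sequence positions; each entry is a list of
                    probabilities over the character vocabulary.
    targets:        list of target character indices, one per position.
    Averaging over every position (not only the last one) is what
    "losses at intermediate sequence positions" amounts to in this sketch.
    """
    losses = [-math.log2(probs[t]) for probs, t in zip(position_probs, targets)]
    return sum(losses) / len(losses)

def total_loss(layer_predictions, targets, aux_weight=0.5):
    """Final-layer loss plus weighted losses from intermediate layers.

    layer_predictions: list over layers (last entry is the final layer);
                       each entry has the shape of `position_probs` above.
    aux_weight:        hypothetical constant weight for intermediate layers;
                       chosen for illustration, not taken from the paper.
    """
    *intermediate, final = layer_predictions
    loss = bits_per_character(final, targets)
    for preds in intermediate:
        loss += aux_weight * bits_per_character(preds, targets)
    return loss

# Toy usage: a 4-character vocabulary, 3 positions, 2 layers.
uniform = [[0.25, 0.25, 0.25, 0.25]] * 3
targets = [0, 1, 2]
final_bpc = bits_per_character(uniform, targets)
combined = total_loss([uniform, uniform], targets, aux_weight=0.5)
```

A uniform prediction over four characters costs exactly 2 bits per character, so reported figures such as 1.13 or 1.06 can be read on the same scale: lower means the model assigns more probability to the correct next character.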
Citation Graph
References [46]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017
47 papers in library cite
Sepp Hochreiter, Jürgen Schmidhuber - 1997
94 papers in library cite
S. Ioffe, Christian Szegedy - 2015
18 papers in library cite
Kyunghyun Cho, B. V. Merrienboer, C. G. Gulcehre, D. Bahdanau, F. Bougares, Holger Schwenk, Yoshua Bengio - 2014
38 papers in library cite
Jimmy Lei Ba, R. Kiros, Geoffrey E. Hinton - 2016
14 papers in library cite
Yoshua Bengio, R. Ducharme, Pascal Vincent - 2001
62 papers in library cite
Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010
36 papers in library cite
P. Werbos - 1990
9 papers in library cite
Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals - 2014
22 papers in library cite
Noam Shazeer, Azalia Mirhoseini, K. Maziarz, A. Davis, Quoc Le, Geoffrey Hinton, Jeffrey Dean - 2017
9 papers in library cite
S. Sukhbaatar, A. Szlam, Jason Weston, Rob Fergus - 2015
18 papers in library cite
M. Sundermeyer, R. Schluter, Hermann Ney - 2010
7 papers in library cite
Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, Jürgen Schmidhuber - 2001
16 papers in library cite
Jason Weston, S. Chopra, Antoine Bordes - 2015
18 papers in library cite
Yarin Gal - 2015
9 papers in library cite
Tomas Mikolov, S. Kombrink, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2011
16 papers in library cite
R. Jozefowicz, Oriol Vinyals, M. Schuster, Noam Shazeer, Yonghui Wu - 2016
20 papers in library cite
C. Chelba, Tomas Mikolov, M. Schuster, Q. Ge, T. Brants, P. Koehn, Tony Robinson - 2013
13 papers in library cite
Alec Radford, R. Jozefowicz, Ilya Sutskever - 2017
8 papers in library cite
U. Khandelwal, He He, P. Qi, Dan Jurafsky - 2018
2 papers in library cite
E. Grave, Armand Joulin, Nicolas Usunier - 2016
7 papers in library cite
Tomas Mikolov, Ilya Sutskever, A. Deoras, H. S. Le, S. Kombrink, Jan Cernocky - 2012
7 papers in library cite
X. Zhang, J. Zhao, Yann Lecun - 2015
7 papers in library cite
Yann N. Dauphin, A. Fan, Michael Auli, D. Grangier - 2016
8 papers in library cite
S. Merity, Nitish Shirish Keskar, Richard Socher - 2017
6 papers in library cite
J. Chung, C. G. Gulcehre, Kyunghyun Cho, Yoshua Bengio - 2015
3 papers in library cite
N. Kalchbrenner, L. Espeholt, K. Simonyan, A. V. D. Oord, Alex Graves, Koray Kavukcuoglu - 2016
5 papers in library cite
J. G. Zilly, R. K. Srivastava, J. Koutnik, Jürgen Schmidhuber - 2016
6 papers in library cite
T. Cooijmans, Nicolas Ballas, C. Laurent, Aaron Courville - 2016
3 papers in library cite
David Krueger, T. Maharaj, J. Kramar, M. Pezeshki, Nicolas Ballas, N. R. Ke, A. G. A. P. Goyal, Yoshua Bengio, Hugo Larochelle, Aaron Courville - 2016
3 papers in library cite
B. Krause, E. Kahembwe, I. Murray, S. Renals - 2017
3 papers in library cite
B. Krause, L. Lu, I. Murray, S. Renals - 2016
3 papers in library cite
A. Mujika, F. Meier, A. Steger - 2017
2 papers in library cite
J. Chung, S. Ahn, Yoshua Bengio - 2016
2 papers in library cite
Shanda Li, Wentao Li, C. Cook, C. Zhu, Y. Gao - 2018
2 papers in library cite
S. Zhang, Yonghui Wu, T. Che, Zongyu Lin, R. Memisevic, Ruslan R. Salakhutdinov, Yoshua Bengio - 2016
1 paper in library cites
T. Kenter, Llion Jones, D. Hewlett - 2018
1 paper in library cites
M. Daniluk, Tim Rocktaschel, J. Welbl, Sebastian Riedel - 2017
1 paper in library cites
T. Salimans, Haowei Zhang, Alec Radford, D. N. Metaxas - 2018
1 paper in library cites
M. Mahoney - 2009
1 paper in library cites
N. R. Ke, A. G. A. P. Goyal, O. Bilaniuk, J. Binas, L. Charlin, C. Pal, Yoshua Bengio - 2017
1 paper in library cites
K. M. Rocki - 2016
1 paper in library cites
C. Tallec, Y. Ollivier - 2017
1 paper in library cites
M. Arjovsky, A. Shah, Yoshua Bengio - 2015
1 paper in library cites
Alexis Conneau, Holger Schwenk, L. Barrault, Yann Lecun - 2016
1 paper in library cites
Cited by
6
papers in your library
Cites
30
papers in your library
Read
on November 16, 2025