2017

One Model to Learn Them All

Lukasz Kaiser, Aidan N. Gomez, Noam Shazeer, Ashish Vaswani, Niki Parmar, Llion Jones, Jakob Uszkoreit


AI summary

This paper introduces a MultiModel architecture, a single deep learning model that can simultaneously learn multiple tasks from various domains by incorporating convolutional layers, attention mechanisms, and sparsely-gated layers. It is trained concurrently on ImageNet, multiple translation tasks, COCO, a speech recognition corpus, and an English parsing task.

Main Contributions

  • Introduces MultiModel, a single deep learning model that learns multiple tasks from different domains at the same time.
  • Builds the architecture from blocks that originate in different domains: convolutional layers, an attention mechanism, and sparsely-gated layers.
  • Observes that adding a computational block never hurt performance in the reported experiments, even on tasks for which the block is not crucial, and in most cases improved it.
  • Shows that tasks with less data benefit substantially from joint training with other tasks.
  • Shows that performance on large tasks degrades only slightly, if at all.
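
To make the "sparsely-gated layers" contribution more concrete, here is a minimal sketch of top-k gating over a set of experts. It is an illustration under stated assumptions, not the authors' implementation: the names `top_k_gate` and `sparse_layer`, the toy scalar experts, and the fixed gate scores are all hypothetical, and the noise term and load-balancing losses used in practical sparsely-gated layers are left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gate(gate_scores, k):
    """Keep the k highest-scoring experts and renormalise their weights.

    Returns {expert_index: weight}. Experts outside the top k get no
    weight and are never evaluated, which is what makes the layer sparse.
    """
    order = sorted(range(len(gate_scores)), key=lambda i: gate_scores[i], reverse=True)
    chosen = order[:k]
    weights = softmax([gate_scores[i] for i in chosen])
    return dict(zip(chosen, weights))

def sparse_layer(x, experts, gate_scores, k=2):
    """Weighted sum of the outputs of the k selected experts only."""
    gates = top_k_gate(gate_scores, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Hypothetical toy experts: each one is just a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
# Assumption: in a real model these scores come from a learned gating network.
gate_scores = [0.1, 2.0, -1.0, 2.0]

gates = top_k_gate(gate_scores, k=2)
output = sparse_layer(3.0, experts, gate_scores, k=2)
```

With these scores only experts 1 and 3 are evaluated, so the cost of the layer depends on k rather than on the total number of experts.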

Abstract

Deep learning yields great results across many fields, from speech recognition, image classification, to translation. But for each problem, getting a deep model to work well involves research into the architecture and a long period of tuning. We present a single model that yields good results on a number of problems spanning multiple domains. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task. Our model architecture incorporates building blocks from multiple domains. It contains convolutional layers, an attention mechanism, and sparsely-gated layers. Each of these computational blocks is crucial for a subset of the tasks we train on. Interestingly, even if a block is not crucial for a task, we observe that adding it never hurts performance and in most cases improves it on all tasks. We also show that tasks with less data benefit largely from joint training with other tasks, while performance on large tasks degrades only slightly if at all.
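
The abstract says the model is trained concurrently on several tasks but does not say how batches from the different tasks are interleaved. The sketch below shows one simple possibility, a round-robin schedule, purely as an assumption for illustration. The function `round_robin_batches`, the task names, and the string "batches" are hypothetical stand-ins for real data pipelines.

```python
from itertools import cycle, islice

def round_robin_batches(task_batches):
    """Yield (task_name, batch) pairs, visiting the tasks in turn.

    Each task's batch list is cycled, so a small dataset is simply
    revisited more often than a large one.
    """
    streams = {name: cycle(batches) for name, batches in task_batches.items()}
    for name in cycle(task_batches):
        yield name, next(streams[name])

# Hypothetical stand-ins for real datasets of very different sizes.
task_batches = {
    "image_classification": ["img0", "img1", "img2"],
    "translation": ["mt0", "mt1"],
    "parsing": ["parse0"],
}

# Take the first seven training steps of the (infinite) schedule.
schedule = list(islice(round_robin_batches(task_batches), 7))
for task, batch in schedule:
    print(task, batch)
```

In a training loop, each yielded pair would drive one update of the shared model on that task's batch. Under this particular schedule the smallest task is seen as often as the largest, which is one way a low-data task can draw on parameters shaped by the others.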
