2013

How to Construct Deep Recurrent Neural Networks

Razvan Pascanu, C. G. Gulcehre, Kyunghyun Cho, Yoshua Bengio

AI summary

This paper explores how to extend RNNs to deep RNNs by identifying three places where depth can be added: the input-to-hidden function, the hidden-to-hidden transition, and the hidden-to-output function. It introduces two novel deep RNN architectures and a framework based on neural operators for interpreting them. The architectures are evaluated on polyphonic music prediction and language modeling, where they outperform conventional shallow RNNs.

Main Contributions

  • Identified three points in RNN architecture that can be deepened: input-to-hidden, hidden-to-hidden, and hidden-to-output functions.
  • Proposed two novel deep RNN architectures orthogonal to stacking recurrent layers.
  • Introduced a novel framework based on neural operators for interpreting deep RNNs.
  • Empirically evaluated the proposed deep RNNs on polyphonic music prediction and language modeling tasks.
  • Demonstrated that proposed deep RNNs benefit from depth and outperform conventional shallow RNNs.
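
As a rough illustration of the three points listed above, a conventional RNN can be written with a single-layer transition and a single-layer output function. The symbols here ($U$, $W$, $V$, $\phi$, and the layer count $L$) are notation assumed for this sketch and are not taken from this page.

```latex
\begin{align}
h_t &= \phi_h\!\left(W^\top h_{t-1} + U^\top x_t\right), \\
y_t &= \phi_o\!\left(V^\top h_t\right).
\end{align}

% One way to deepen the hidden-to-hidden transition: replace the single
% layer with an L-layer feedforward network (assumed form, for illustration).
\begin{equation}
h_t = \phi_L\!\left(W_L^\top \phi_{L-1}\!\left(\cdots
      \phi_1\!\left(W_1^\top h_{t-1} + U^\top x_t\right)\right)\right)
\end{equation}
```

The input-to-hidden and hidden-to-output functions can be deepened in the same way, by replacing $U^\top x_t$ or $\phi_o(V^\top h_t)$ with a multilayer network. Stacking recurrent layers is a separate design choice, since each stacked layer keeps its own recurrent state.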

Abstract

In this paper, we explore different ways to extend a recurrent neural network (RNN) to a deep RNN. We start by arguing that the concept of depth in an RNN is not as clear as it is in feedforward neural networks. By carefully analyzing the architecture of an RNN, however, we find three points of an RNN which may be made deeper: (1) the input-to-hidden function, (2) the hidden-to-hidden transition, and (3) the hidden-to-output function. Based on this observation, we propose two novel deep RNN architectures which are orthogonal to an earlier attempt at building a deep RNN by stacking multiple recurrent layers (Schmidhuber, 1992; El Hihi and Bengio, 1996). We provide an alternative interpretation of these deep RNNs using a novel framework based on neural operators. The proposed deep RNNs are empirically evaluated on the tasks of polyphonic music prediction and language modeling. The experimental results support our claim that the proposed deep RNNs benefit from depth and outperform conventional, shallow RNNs.
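
The three points named in the abstract can be made concrete with a small forward-pass sketch. This is not the authors' implementation and does not correspond exactly to either of the two proposed architectures: it simply gives each of the three functions one intermediate layer. All names (`dense`, `mlp`, `make_stack`, `run_rnn`), the layer sizes, and the initialisation are hypothetical, and no training is shown.

```python
import math
import random


def dense(weights, bias, vec, act=math.tanh):
    """One fully connected layer: act(W @ vec + b), with W stored as a list of rows."""
    return [act(sum(w * v for w, v in zip(row, vec)) + b)
            for row, b in zip(weights, bias)]


def make_stack(sizes, rng):
    """Build layers for the given sizes, e.g. [9, 6, 5] gives two layers.

    Small uniform weights and zero biases are an arbitrary choice for this sketch.
    """
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def mlp(layers, vec, final_act=math.tanh):
    """Apply a stack of dense layers; a one-layer stack is the shallow case."""
    for i, (weights, bias) in enumerate(layers):
        act = final_act if i == len(layers) - 1 else math.tanh
        vec = dense(weights, bias, vec, act)
    return vec


def run_rnn(inputs, in_layers, trans_layers, out_layers, n_hidden):
    """Unroll over a sequence; returns one output vector per step and the last state."""
    h = [0.0] * n_hidden
    outputs = []
    for x in inputs:
        z = mlp(in_layers, x)          # (1) input-to-hidden function
        h = mlp(trans_layers, z + h)   # (2) hidden-to-hidden transition on [z; h]
        y = mlp(out_layers, h, final_act=lambda a: a)  # (3) hidden-to-output, linear top
        outputs.append(y)
    return outputs, h


rng = random.Random(0)
n_in, n_feat, n_hidden, n_out = 3, 4, 5, 2

# Each stack has one intermediate layer, so each of the three functions is "deep".
in_layers = make_stack([n_in, 4, n_feat], rng)
trans_layers = make_stack([n_feat + n_hidden, 6, n_hidden], rng)
out_layers = make_stack([n_hidden, 4, n_out], rng)

sequence = [[rng.random() for _ in range(n_in)] for _ in range(7)]
outputs, h_final = run_rnn(sequence, in_layers, trans_layers, out_layers, n_hidden)
```

Passing a two-element size list to `make_stack` (for example `[n_feat + n_hidden, n_hidden]`) recovers a conventional single-layer function at that point, which makes it easy to deepen one of the three functions at a time.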

References [46]

Tomas Mikolov, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

Tomas Mikolov, Ilya Sutskever, K. Chen, G. S. Corrado, Jeffrey Dean - 2013

D. E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams - 1986

Kurt Hornik, Maxwell Stinchcombe, Halbert White - 1989

Geoffrey Hinton, Ruslan Salakhutdinov - 2006

Geoffrey Hinton - 2012

Yoshua Bengio, Patrice Simard, Paolo Frasconi - 1994

Xavier Glorot, Antoine Bordes, Yoshua Bengio - 2011

Yoshua Bengio - 2009

Geoffrey Hinton - 2013

Geoffrey E. Hinton, N. Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan R. Salakhutdinov - 2012

M. P. Marcus, B. Santorini, Mary Ann Marcinkiewicz - 1993

Razvan Pascanu, Tomas Mikolov, Yoshua Bengio - 2013

Tomas Mikolov, M. Karafiat, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2010

Ilya Sutskever, James Martens, G. Dahl, Geoffrey Hinton - 2013

Alex Graves - 2013

Yoshua Bengio - 2013

Xavier Glorot, Antoine Bordes, Yoshua Bengio - 2011

Ilya Sutskever, James Martens, Geoffrey E. Hinton - 2011

Tomas Mikolov, S. Kombrink, Lukas Burget, Jan Cernocky, Sanjeev Khudanpur - 2011

F. Bastien, P. Lamblin, Razvan Pascanu, James Bergstra, I. Goodfellow, A. Bergeron, A. Bouchard, N. Nicolas, Yoshua Bengio - 2012

Tapani Raiko, Harri Valpola, Yann Lecun - 2012

James Martens - 2010

James Martens, Ilya Sutskever - 2011

I. Goodfellow, Quoc Le, A. Saxe, A. Ng - 2009

Tomas Mikolov, Ilya Sutskever, A. Deoras, H. S. Le, S. Kombrink, Jan Cernocky - 2012

James Bergstra, O. Breuleux, F. Bastien, P. Lamblin, Razvan Pascanu, G. Desjardins, J. Turian, D. W. Farley, Yoshua Bengio - 2010

Tomas Mikolov - 2012

Jürgen Schmidhuber - 1992

Alex Graves - 2011

S. El Hihi, Yoshua Bengio - 1996

Alex Graves, M. Liwicki, Santiago Fernandez, R. Bertolami, H. Bunke, Jürgen Schmidhuber - 2009

Hugo Larochelle, I. Murray - 2011

Yoshua Bengio, G. Mesnil, Yann Dauphin, S. Rifai - 2013

J. Bayer - 2013

Razvan Pascanu, Yoshua Bengio - 2013

C. G. Gulcehre, Kyunghyun Cho, Razvan Pascanu, Yoshua Bengio - 2013

P. Pinheiro, Ronan Collobert - 2014

O. Delalleau, Yoshua Bengio - 2011

M. Hermans, B. Schrauwen - 2013

Jixuan Chen, L. Deng - 2013

Harri Valpola, J. Karhunen - 2002

Herbert Jaeger - 2007

J. Ko, F. Dieter - 2009

Razvan Pascanu, G. Montufar, Yoshua Bengio - 2013
