Papperoni

2019

Energy and Policy Considerations for Deep Learning in NLP

E. Strubell, A. Ganesh, Andrew McCallum

Cite Score: 76

AI summary

This paper analyzes the financial and environmental costs of training various NLP models, finding that tuning a model for a new dataset can be extremely expensive. It recommends reporting training time and sensitivity to hyperparameters, as well as prioritizing computationally efficient hardware and algorithms.

Main Contributions

Quantifies the financial and environmental costs of training and developing various NLP models.
Analyzes the energy consumption of different NLP models and provides a comparison with familiar consumption metrics.
Proposes actionable recommendations to reduce costs and improve equity in NLP research and practice.
Highlights the need for reporting training time and sensitivity to hyperparameters for NLP models.
Emphasizes the importance of equitable access to computational resources for academic researchers.

Abstract

Recent progress in hardware and methodology for training neural networks has ushered in a new generation of large networks trained on abundant data. These models have obtained notable gains in accuracy across many NLP tasks. However, these accuracy improvements depend on the availability of exceptionally large computational resources that necessitate similarly substantial energy consumption. As a result these models are costly to train and develop, both financially, due to the cost of hardware and electricity or cloud compute time, and environmentally, due to the carbon footprint required to fuel modern tensor processing hardware. In this paper we bring this issue to the attention of NLP researchers by quantifying the approximate financial and environmental costs of training a variety of recently successful neural network models for NLP. Based on these findings, we propose actionable recommendations to reduce costs and improve equity in NLP research and practice.

References [19]

[1] Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin - 2017

47 papers in library cite

I mean... it introduced Transformers!

[2] BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, M. W. Chang, K. Lee, Kristina Toutanova - 2018

39 papers in library cite

Simply amazing. It's very impressive how they make a leap vs. existing stuff (you can see from the references, pretty much no one is doing what they are doing, other than GPT)

[3] Neural Machine Translation by Jointly Learning to Align and Translate

D. Bahdanau, Kyunghyun Cho, Yoshua Bengio - 2014

59 papers in library cite

Introduces the attention mechanism - amazing overall

[4] Language Models Are Unsupervised Multitask Learners

Alec Radford, Jeffrey Wu, Rewon Child, D. Luan, Dario Amodei, Ilya Sutskever - 2019

27 papers in library cite

Amazing! Tons of important contributions. I think they could have explained the models a bit better, and I think this is where OpenAI starts to become evil (and not open)

[5] Deep Contextualized Word Representations

M. E. Peters, M. Neumann, M. Iyyer, Matt Gardner, C. Clark, K. Lee, L. S. Zettlemoyer - 2018

27 papers in library cite

I didn't really like the approach. Seems a bit derivative TBH. BERT seems more elegant.

[6] Random Search for Hyper-Parameter Optimization

James Bergstra, Yoshua Bengio - 2012

7 papers in library cite

It seems crazy that it was only in 2012 that they found out that random search was good! Still, kudos to them for noticing, and the paper is just so easy to follow and enjoyable!

[7] Effective Approaches to Attention-Based Neural Machine Translation

T. Luong, H. Pham, Christopher D. Manning - 2015

15 papers in library cite

Good paper, but very derivative. Attention methods start getting very complicated... I understand why Transformers took over TBH

[8] Practical Bayesian Optimization of Machine Learning Algorithms

J. Snoek, Hugo Larochelle, R. P. Adams - 2012

9 papers in library cite

[9] Algorithms for Hyper-Parameter Optimization

J. S. Bergstra, R. Bardenet, Yoshua Bengio, B. Kegl - 2011

3 papers in library cite

[10] An Analysis of Deep Neural Network Models for Practical Applications

A. Canziani, A. Paszke, E. Culurciello - 2017

2 papers in library cite

[11] Deep Biaffine Attention for Neural Dependency Parsing

T. Dozat, Christopher D. Manning - 2017

2 papers in library cite

[12] Linguistically-Informed Self-Attention for Semantic Role Labeling

E. Strubell, P. Verga, D. Andor, D. Weiss, Andrew McCallum - 2018

2 papers in library cite

[13] BERT Meets GPUs

C. Forster, T. Johnsen, S. Mandava, S. T. Sreenivas, D. Fu, J. Bernauer, A. Gray, S. Chetlur, Raul Puri - 2019

1 paper in library cites

[14] Clicking Clean: Who Is Winning the Race to Build a Green Internet?

G. Cook, Jaehoon Lee, T. Tsai, A. Kongn, J. Deans, B. Johnson, E. Jardim, B. Johnson - 2017

1 paper in library cites

[15] Emissions & Generation Resource Integrated Database (eGRID)

EPA - 2018

1 paper in library cites

[16] Evaluating the Energy Efficiency of Deep Convolutional Neural Networks on CPUs and GPUs

Dustin Li, X. Chen, M. Becchi, Z. Zong - 2016

1 paper in library cites

[17] Net Public Electricity Generation in Germany in 2018

B. Burger - 2019

1 paper in library cites

[18] The Evolved Transformer

D. R. So, C. Liang, Quoc V. Le - 2019

1 paper in library cites

[19] Uptime Institute Global Data Center Survey

R. Ascierto - 2018

1 paper in library cites

Cited by 3 papers in your library

Cites 7 papers in your library

Read on November 23, 2025

Very nice push and maybe the first one to actually raise a flag - also, very contemporary and relevant discussion
