Deep Learning Scaling is Predictable, Empirically

Hestness, Joel; Narang, Sharan; Ardalani, Newsha; Diamos, Gregory; Jun, Heewoo; Kianinejad, Hassan; Patwary, Md. Mostofa Ali; Yang, Yang; Zhou, Yanqi

doi:10.48550/arXiv.1712.00409

[Submitted on 1 Dec 2017]

Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state of the art.
This paper presents a large-scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, with power-law exponents (the "steepness" of the learning curve) that are yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
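
As an illustration of what a power-law learning curve means in practice, the following is a minimal sketch, not the authors' methodology. It assumes the simple form error ≈ alpha * m^beta, where m is the training set size, and fits it by least squares in log-log space. The function names, the synthetic data points, and the omission of any irreducible-error term are assumptions made for this example.

```python
import numpy as np


def fit_power_law(train_sizes, errors):
    """Fit errors ~ alpha * train_sizes**beta via a straight line in log-log space.

    Assumed model for illustration only; returns (alpha, beta).
    """
    log_m = np.log(np.asarray(train_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    # In log-log space the power law is linear: log e = beta * log m + log alpha
    beta, log_alpha = np.polyfit(log_m, log_e, 1)
    return float(np.exp(log_alpha)), float(beta)


def predict_error(alpha, beta, train_size):
    """Extrapolate the fitted curve to a new training set size."""
    return alpha * float(train_size) ** beta


# Synthetic data generated from a known power law (hypothetical values,
# not results from the paper).
sizes = np.array([1e4, 1e5, 1e6, 1e7])
errors = 5.0 * sizes ** -0.3

alpha, beta = fit_power_law(sizes, errors)

# How much would the error change if the data set grew 10x past the largest point?
ratio = predict_error(alpha, beta, 1e8) / predict_error(alpha, beta, 1e7)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, error ratio for 10x data={ratio:.3f}")
```

Under this assumed form, the exponent beta is the slope of the learning curve on a log-log plot, and a change that multiplies alpha without changing beta shifts the line vertically without altering its steepness. Fitting measured errors would additionally require handling noise and any error floor, which this sketch ignores.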

Comments:	19 pages, 11 figures
Subjects:	Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1712.00409 [cs.LG]
	(or arXiv:1712.00409v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.1712.00409

Submission history

From: Joel Hestness
[v1] Fri, 1 Dec 2017 17:13:14 UTC (227 KB)
