[Submitted on 1 Dec 2017]

Title: Deep Learning Scaling is Predictable, Empirically

Authors: Joel Hestness, Sharan Narang, Newsha Ardalani, Gregory Diamos, Heewoo Jun, Hassan Kianinejad, Md. Mostofa Ali Patwary, Yang Yang, Yanqi Zhou
Abstract: Deep learning (DL) creates impactful advances following a virtuous recipe: model architecture search, creating large training data sets, and scaling computation. It is widely believed that growing training sets and models should improve accuracy and result in better products. As DL application domains grow, we would like a deeper understanding of the relationships between training set size, computational scale, and model accuracy improvements to advance the state-of-the-art.
This paper presents a large-scale empirical characterization of generalization error and model size growth as training sets grow. We introduce a methodology for this measurement and test four machine learning domains: machine translation, language modeling, image processing, and speech recognition. Our empirical results show power-law generalization error scaling across a breadth of factors, resulting in power-law exponents (the "steepness" of the learning curve) yet to be explained by theoretical work. Further, model improvements only shift the error but do not appear to affect the power-law exponent. We also show that model size scales sublinearly with data size. These scaling relationships have significant implications for deep learning research, practice, and systems. They can assist model debugging, setting accuracy targets, and decisions about data set growth. They can also guide computing system design and underscore the importance of continued computational scaling.
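To make the power-law claim concrete, here is a minimal sketch of how a learning-curve exponent could be estimated. The abstract does not state a functional form, so the sketch assumes error ≈ alpha * m**beta, where m is the training set size and beta is negative. The function names and all numbers are hypothetical, and the data is synthetic rather than taken from the paper. Two curves that share an exponent but differ in alpha illustrate the statement that model improvements shift the error without changing the exponent.

```python
import math


def fit_power_law(train_sizes, errors):
    """Fit errors ~= alpha * train_sizes**beta by least squares in log-log space.

    Assumed form (not taken from the abstract): a pure power law with no
    additive offset, so log(error) is linear in log(training set size).
    """
    xs = [math.log(m) for m in train_sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept of log(error) vs log(size).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = math.exp(mean_y - beta * mean_x)
    return alpha, beta


def predict_error(alpha, beta, train_size):
    """Extrapolate the fitted curve to a new training set size."""
    return alpha * train_size ** beta


# Synthetic, noise-free learning curves (illustrative values only).
sizes = [1e4, 1e5, 1e6, 1e7]
baseline = [5.0 * m ** -0.3 for m in sizes]  # hypothetical baseline model
improved = [3.0 * m ** -0.3 for m in sizes]  # hypothetical better model

alpha_base, beta_base = fit_power_law(sizes, baseline)
alpha_impr, beta_impr = fit_power_law(sizes, improved)

print(f"baseline: alpha={alpha_base:.3f}, beta={beta_base:.3f}")
print(f"improved: alpha={alpha_impr:.3f}, beta={beta_impr:.3f}")
print(f"projected baseline error at 1e8 samples: "
      f"{predict_error(alpha_base, beta_base, 1e8):.4f}")
```

On real measurements the fit would be noisy, and a pure power law may only hold over part of the data range, so the fitted exponent should be read as an estimate for that range rather than a constant of the task.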
Comments: 19 pages, 11 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as: arXiv:1712.00409 [cs.LG]
  (or arXiv:1712.00409v1 [cs.LG] for this version)
  https://doi.org/10.48550/arXiv.1712.00409
arXiv-issued DOI via DataCite

Submission history

From: Joel Hestness
[v1] Fri, 1 Dec 2017 17:13:14 UTC (227 KB)