
Statistics > Machine Learning

[Submitted on 22 May 2018 (v1), last revised 21 Apr 2019 (this version, v5)]

Title: Deep learning generalizes because the parameter-function map is biased towards simple functions

Authors: Guillermo Valle-Pérez, Chico Q. Camargo, Ard A. Louis
Abstract: Deep neural networks (DNNs) generalize remarkably well without explicit regularization, even in the strongly over-parametrized regime where classical learning theory predicts they should severely overfit. While many forms of implicit regularization have been proposed to rationalise this success, there is no consensus on the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very general probability-complexity bound recently derived from algorithmic information theory (AIT), we argue that the parameter-function map of many DNNs should be exponentially biased towards simple functions. We then provide clear evidence for this strong simplicity bias in a model DNN for Boolean functions, as well as in much larger fully connected and convolutional networks applied to CIFAR10 and MNIST. Because the target functions in many real problems are expected to be highly structured, this intrinsic simplicity bias helps explain why deep networks generalize well on real-world problems. This picture also facilitates a novel PAC-Bayes approach in which the prior is taken over the DNN's input-output function space rather than the more conventional prior over parameter space. If we assume that the training algorithm samples parameters close to uniformly within the zero-error region, then the PAC-Bayes theorem guarantees good expected generalization for target functions that produce high-likelihood training sets. By exploiting recently discovered connections between DNNs and Gaussian processes to estimate the marginal likelihood, we produce relatively tight PAC-Bayes generalization error bounds that correlate well with the true error on realistic datasets such as MNIST and CIFAR10, for architectures including convolutional and fully connected networks.
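
A sketch of the two quantitative ingredients mentioned in the abstract, with illustrative notation; the exact constants and conditions in the paper may differ. The AIT probability-complexity bound states that, for a sufficiently generic parameter-function map, the probability P(f) of obtaining a function f from randomly sampled parameters decays exponentially with an approximate Kolmogorov complexity \tilde{K}(f):

\[ P(f) \;\le\; 2^{-a\,\tilde{K}(f) - b}, \]

where a and b are constants that depend on the map but not on f. The realizable PAC-Bayes bound then controls the expected generalization error \epsilon in terms of the marginal likelihood P(S), the prior probability mass on functions consistent with a training set S of m examples:

\[ \ln\frac{1}{1-\epsilon} \;\le\; \frac{\ln\frac{1}{P(S)} + \ln\frac{2m}{\delta}}{m-1}, \]

which holds with probability at least 1-\delta over the draw of S. A simplicity-biased prior assigns large P(S) to highly structured training sets, which is what yields the relatively tight error bounds reported in the abstract.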
Comments: Published as a conference paper at ICLR 2019
Subjects: Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
Cite as: arXiv:1805.08522 [stat.ML]
  (or arXiv:1805.08522v5 [stat.ML] for this version)
  https://doi.org/10.48550/arXiv.1805.08522
arXiv-issued DOI via DataCite

Submission history

From: Guillermo Valle-Pérez
[v1] Tue, 22 May 2018 11:51:36 UTC (3,068 KB)
[v2] Wed, 23 May 2018 10:55:36 UTC (3,067 KB)
[v3] Fri, 28 Sep 2018 18:22:18 UTC (3,285 KB)
[v4] Wed, 27 Feb 2019 23:40:35 UTC (7,761 KB)
[v5] Sun, 21 Apr 2019 10:16:54 UTC (7,676 KB)