[Submitted on 11 Nov 2021 (v1), last revised 19 Dec 2021 (this version, v3)]

Title: Masked Autoencoders Are Scalable Vision Learners

Authors: Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick
Abstract: This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core designs. First, we develop an asymmetric encoder-decoder architecture, with an encoder that operates only on the visible subset of patches (without mask tokens), along with a lightweight decoder that reconstructs the original image from the latent representation and mask tokens. Second, we find that masking a high proportion of the input image, e.g., 75%, yields a nontrivial and meaningful self-supervisory task. Coupling these two designs enables us to train large models efficiently and effectively: we accelerate training (by 3x or more) and improve accuracy. Our scalable approach allows for learning high-capacity models that generalize well: e.g., a vanilla ViT-Huge model achieves the best accuracy (87.8%) among methods that use only ImageNet-1K data. Transfer performance in downstream tasks outperforms supervised pre-training and shows promising scaling behavior.
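The random masking and mask-token steps described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the 224×224 input, the 16×16 patch size, the function names (`patchify`, `random_masking`, `unshuffle_with_mask_tokens`, `masked_mse`), and the identity stand-ins for the encoder and decoder are all hypothetical choices for this sketch. Computing the loss only over masked patches is also an assumption here, chosen to match "reconstruct the missing pixels".

```python
import numpy as np


def patchify(img, p):
    """Split an (H, W, C) image into non-overlapping p x p patches, shape (N, p*p*C)."""
    h, w, c = img.shape
    assert h % p == 0 and w % p == 0, "image size must be divisible by the patch size"
    gh, gw = h // p, w // p
    x = img.reshape(gh, p, gw, p, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, p * p * c)


def random_masking(patches, mask_ratio, rng):
    """Keep a uniformly random (1 - mask_ratio) subset of patches.

    Returns the visible patches (the only input the encoder sees), a boolean
    mask (True = masked), and the inverse permutation used to restore order.
    """
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    ids_shuffle = rng.permutation(n)       # random order of patch indices
    ids_restore = np.argsort(ids_shuffle)  # inverse of ids_shuffle
    ids_keep = ids_shuffle[:n_keep]
    mask = np.ones(n, dtype=bool)
    mask[ids_keep] = False
    return patches[ids_keep], mask, ids_restore


def unshuffle_with_mask_tokens(latent, mask_token, ids_restore):
    """Append one shared mask token per dropped patch, then undo the shuffle."""
    n_masked = ids_restore.shape[0] - latent.shape[0]
    full = np.concatenate([latent, np.tile(mask_token, (n_masked, 1))], axis=0)
    return full[ids_restore]


def masked_mse(pred, target, mask):
    """Mean squared error over masked patches only (an assumption of this sketch)."""
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return per_patch[mask].sum() / mask.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((224, 224, 3))
    patches = patchify(img, 16)
    visible, mask, ids_restore = random_masking(patches, 0.75, rng)
    print(patches.shape, visible.shape, int(mask.sum()))  # (196, 768) (49, 768) 147

    latent = visible  # hypothetical stand-in: a real encoder would be a ViT over `visible`
    mask_token = np.zeros(latent.shape[1])  # learned in practice; zeros here
    decoder_in = unshuffle_with_mask_tokens(latent, mask_token, ids_restore)
    pred = decoder_in  # hypothetical stand-in for the lightweight decoder
    print(masked_mse(pred, patches, mask))
```

With a 75% mask ratio the encoder in this setup processes only 49 of the 196 patch tokens, which is where most of the compute savings come from; the end-to-end speedup also depends on decoder size and hardware. Sampling a random permutation and restoring order with its inverse (`argsort`) keeps both the masking and the re-insertion of mask tokens to simple index operations.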
Comments: Tech report. arXiv v2: add more transfer learning results; v3: add robustness evaluation
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2111.06377 [cs.CV]
  (or arXiv:2111.06377v3 [cs.CV] for this version)
  https://doi.org/10.48550/arXiv.2111.06377 (arXiv-issued DOI via DataCite)

Submission history

From: Kaiming He
[v1] Thu, 11 Nov 2021 18:46:40 UTC (6,839 KB)
[v2] Thu, 2 Dec 2021 18:30:33 UTC (6,840 KB)
[v3] Sun, 19 Dec 2021 19:23:25 UTC (6,841 KB)