<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0">
<channel>
<title>vision on Kyle Roth</title>
<link>https://kylrth.com/tags/vision/</link>
<description>Recent content in vision on Kyle Roth</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
<lastBuildDate>Fri, 11 Feb 2022 14:18:30 -0500</lastBuildDate>
<atom:link href="https://kylrth.com/tags/vision/index.xml" rel="self" type="application/rss+xml"/>
<item>
<title>Masked autoencoders are scalable vision learners</title>
<link>https://kylrth.com/paper/masked-autoencoders-are-scalable-vision-learners/</link>
<pubDate>Fri, 11 Feb 2022 14:18:30 -0500</pubDate>
<guid>https://kylrth.com/paper/masked-autoencoders-are-scalable-vision-learners/</guid>
<description>This post was created as an assignment in Irina Rish&rsquo;s neural scaling laws course (IFT6167) in winter 2022. The post contains no summarization, only questions and thoughts. In this paper they mention that the mask vector is learned, and it sounds like the positional embeddings are also learned. I remember in Attention is all you need they found that cosine positional embeddings worked better than learned ones, especially for longer sequences.</description>
</item>
</channel>
</rss>