<title>vision on Kyle Roth</title>

<link>https://kylrth.com/tags/vision/</link>

<description>Recent content in vision on Kyle Roth</description>

<generator>Hugo -- gohugo.io</generator>

<language>en-us</language>

<lastBuildDate>Fri, 11 Feb 2022 14:18:30 -0500</lastBuildDate>

<atom:link href="https://kylrth.com/tags/vision/index.xml" rel="self" type="application/rss+xml"/>

<item>

<title>Masked autoencoders are scalable vision learners</title>

<link>https://kylrth.com/paper/masked-autoencoders-are-scalable-vision-learners/</link>

<pubDate>Fri, 11 Feb 2022 14:18:30 -0500</pubDate>

<guid>https://kylrth.com/paper/masked-autoencoders-are-scalable-vision-learners/</guid>

<description>This post was created as an assignment in Irina Rish&rsquo;s neural scaling laws course (IFT6167) in winter 2022. The post contains no summarization, only questions and thoughts. In this paper they mention that the mask vector is learned, and it sounds like the positional embeddings are also learned. I remember that in Attention is all you need they found that cosine positional embeddings worked better than learned ones, especially for longer sequences.</description>

...

</item>

...