<title>embedding on Kyle Roth</title>

<link>https://kylrth.com/tags/embedding/</link>

<description>Recent content in embedding on Kyle Roth</description>

<generator>Hugo -- gohugo.io</generator>

<language>en-us</language>

<lastBuildDate>Fri, 11 Dec 2020 06:30:43 -0700</lastBuildDate>

<atom:link href="https://kylrth.com/tags/embedding/index.xml" rel="self" type="application/rss+xml"/>

<item>

<title>Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing</title>

<link>https://kylrth.com/paper/cross-lingual-alignment-contextual/</link>

<pubDate>Fri, 11 Dec 2020 06:30:43 -0700</pubDate>

<guid>https://kylrth.com/paper/cross-lingual-alignment-contextual/</guid>

<description>Recent contextual word embeddings (e.g. ELMo) have been shown to be much better than &ldquo;static&rdquo; embeddings (where there&rsquo;s a one-to-one mapping from token to representation). This paper is exciting because the authors were able to create a multi-lingual embedding space built from contextual word embeddings. Each token has a &ldquo;point cloud&rdquo; of embedding values, one point for each context containing the token. They define the embedding anchor as the average of all those points for a particular token; a minimal sketch of that averaging follows this item.</description>

...

</item>
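The anchor described above is just the mean of a token's contextual embeddings over every context in which it appears. Below is a minimal sketch, not the paper's code: the corpus format and the `contextual_embed` callable (one vector per token in a sentence) are assumptions for illustration.

# Minimal sketch (assumed API, not the authors' implementation): compute an "anchor"
# embedding per token as the mean of its contextual embeddings across all contexts.
from collections import defaultdict

import numpy as np


def compute_anchors(sentences, contextual_embed):
    """sentences: list of token lists; contextual_embed(tokens) -> one vector per token."""
    sums = defaultdict(lambda: 0.0)
    counts = defaultdict(int)
    for tokens in sentences:
        vectors = contextual_embed(tokens)  # assumed shape: (len(tokens), dim)
        for token, vec in zip(tokens, vectors):
            sums[token] = sums[token] + np.asarray(vec)
            counts[token] += 1
    # The anchor is the average over the token's "point cloud" of contextual embeddings.
    return {token: sums[token] / counts[token] for token in sums}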

<item>

<title>Deep contextualized word representations</title>

<link>https://kylrth.com/paper/deep-contextualized-word-representations/</link>

<pubDate>Thu, 03 Dec 2020 12:01:43 -0700</pubDate>

<guid>https://kylrth.com/paper/deep-contextualized-word-representations/</guid>

<description>This is the original paper introducing Embeddings from Language Models (ELMo). Unlike most widely used word embeddings, ELMo word representations are functions of the entire input sentence. That&rsquo;s what makes ELMo great: they&rsquo;re contextualized word representations, meaning they can express multiple possible senses of the same word. Specifically, each ELMo representation is a learned linear combination of all layers of an LSTM encoding. The LSTM undergoes general semi-supervised pretraining, but the linear combination is learned specifically for the downstream task; a minimal sketch of that layer mixing follows this item.</description>

...

</item>
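The task-specific combination described above is a softmax-normalized weighted sum of the biLM layer outputs, scaled by a learned scalar. Here is a minimal sketch in PyTorch (an assumption; it is not the paper's released code), where the per-layer outputs are assumed to be precomputed tensors.

# Minimal sketch (assumed PyTorch, not the authors' implementation): ELMo-style
# task-specific mixing of biLM layers via softmax weights and a learned scale gamma.
import torch
import torch.nn as nn


class ScalarMix(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        self.scalars = nn.Parameter(torch.zeros(num_layers))  # s_j before the softmax
        self.gamma = nn.Parameter(torch.ones(1))               # task-specific scale

    def forward(self, layer_outputs):
        # layer_outputs: list of (batch, seq_len, dim) tensors, one per biLM layer.
        weights = torch.softmax(self.scalars, dim=0)
        mixed = sum(w * h for w, h in zip(weights, layer_outputs))
        return self.gamma * mixed

Both the softmax weights and gamma are trained with the downstream task's objective, while the LSTM layers themselves come from pretraining, which matches the split described in the summary above.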

...