These are my notes from research papers I read. Each page’s title is also a link to the abstract or PDF.

This was mentioned in the paper on Google’s Multilingual Neural Machine Translation System. It’s regarded as the original paper to use the word-piece model, which is the focus of my notes here. the WordPieceModel Here’s the WordPieceModel algorithm: func WordPieceModel(D, chars, n, threshold) -> inventory: # D: training data # n: user-specified number of word units (often 200k) # chars: unicode characters used in the language (e.g. Kanji, Hiragana, Katakana, ASCII for Japanese) # threshold: stopping criterion for likelihood increase # inventory: the set of word units created by the model inventory := chars likelihood := +INF while len(inventory) < n && likelihood >= threshold: lm := LM(inventory, D) inventory += argmax_{combined word unit}(lm.

Read more

They used a technique called CCA to combine hand-made features with NN representations. It didn’t do great on typological feature prediction, but it did do well with predicting a phylogenetic tree for Indo-European languages.

These guys made sure to model allophones. They had an encoder that produced a universal phone set, and then language-specific decoders. This meant they could use data from various languages to train the system. The decoder has an allophone layer, followed by other dense trainable layers. The allophone layer is a single trainable dense layer, but was initialized by a bunch of linguists who sat down and described the phone sets belonging to each phoneme in each language present in the training set.

Read more