Training a Universal Word Embedding

April 13, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Eszter Iklódi will introduce her M.A. thesis project on training universal word embeddings. It is well known that different languages carve up the world differently: for example, Hungarian fa denotes both the material (German Holz, English wood) and the plant (German Baum), while English wood also denotes the area populated by the plant (Hungarian erdő, German Wald). A key question of multilingual language technology is how to design a system that reflects these differences in a way that furthers semantic analysis, e.g. for the Semantic Web.

We train a vector-space model of language-independent concepts, based on an extension of the method presented in Artetxe et al. (2016), using distributional models of 40+ languages. For evaluation and comparison, we build a (clustered) graph of words from 40+ languages, following the method of Youn et al. (2016), using data extracted from Wiktionary and other large-scale open-domain lexical databases.
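The announcement does not give implementation details, but the core of the Artetxe et al. (2016) family of methods is learning an orthogonality-constrained linear mapping between two monolingual embedding spaces from a seed dictionary, solved in closed form via SVD (the orthogonal Procrustes problem). A minimal sketch with toy data standing in for real pretrained vectors:

```python
import numpy as np

def learn_orthogonal_mapping(X, Y):
    """Solve the orthogonal Procrustes problem: find the orthogonal W
    minimizing ||X @ W - Y||_F. X and Y hold embeddings of seed-dictionary
    word pairs, one pair per row."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy data (hypothetical; a real run would use pretrained monolingual
# vectors and a bilingual seed lexicon).
rng = np.random.default_rng(0)
W_true = np.linalg.qr(rng.normal(size=(5, 5)))[0]  # a random orthogonal map
X = rng.normal(size=(100, 5))                      # "source" embeddings
Y = X @ W_true                                     # "target" embeddings

W = learn_orthogonal_mapping(X, Y)
print(np.allclose(W, W_true))  # the mapping is recovered exactly
```

With noise-free synthetic data the true mapping is recovered exactly; with real embeddings the learned W only approximates the target space, and the extension to 40+ languages requires mapping all spaces into one shared space rather than pairwise.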
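For the evaluation graph, the announcement says only that words from many languages are linked using Wiktionary and similar databases. One illustrative way to realize such a word graph (the data and clustering step here are hypothetical, not from the talk) is to connect words that appear as translations of each other and read off clusters as candidate concepts:

```python
from collections import defaultdict

# Hypothetical translation pairs extracted from a lexical database such
# as Wiktionary: each entry is ((language, word), (language, word)).
pairs = [
    (("hu", "fa"), ("en", "wood")),
    (("hu", "fa"), ("de", "Holz")),
    (("hu", "fa"), ("de", "Baum")),
    (("en", "wood"), ("hu", "erdő")),
    (("en", "wood"), ("de", "Wald")),
    (("hu", "erdő"), ("de", "Wald")),
]

def concept_clusters(pairs):
    """Cluster words into connected components of the translation graph.
    (A real pipeline would prune or weight edges before clustering:
    polysemous words like 'fa' or 'wood' otherwise merge distinct
    concepts into one component, as this toy example shows.)"""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    seen, clusters = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

print(concept_clusters(pairs))
```

On this toy data all six words fall into a single component, precisely because fa and wood are polysemous; this is the kind of structure that a clustered word graph in the style of Youn et al. (2016) must tease apart.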