SZTAKI HLT | Do multi-sense word embeddings learn more senses?

Márton Makrai, Veronika Lipp
In K + K = 120, 2017

Slides (PDF)

Proceedings (Festschrift)

Multi-sense embeddings for Hungarian in 600 dimensions, trained on the deglutinized version of the Hungarian National Corpus


Note that the English and the Hungarian variants of this page are not strictly parallel.

At last year's ACL, we (Borbély et al. 2016) proposed a method for measuring the sense resolution of multi-sense embeddings (MSEs), based on linear translation (Mikolov et al. 2013) from the MSE to a plain embedding. This talk develops that method further.
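As a rough illustration of linear translation between embedding spaces, the sketch below fits a matrix W that maps source vectors onto their target counterparts and then retrieves the nearest target vector by cosine similarity. It is a minimal sketch under assumptions: the function names `fit_translation` and `translate` are hypothetical, the data is random toy data, and a closed-form least-squares fit stands in for whatever optimizer the experiments actually used.

```python
import numpy as np

def fit_translation(X, Z):
    """Least-squares matrix W minimizing ||X W - Z|| over seed pairs.

    X: (n, d_src) source vectors, Z: (n, d_tgt) vectors of their translations.
    """
    W, *_ = np.linalg.lstsq(X, Z, rcond=None)
    return W

def translate(vec, W, target_vectors, k=1):
    """Indices of the k target vectors closest (by cosine) to the mapped vec."""
    mapped = vec @ W
    mapped = mapped / np.linalg.norm(mapped)
    targets = target_vectors / np.linalg.norm(target_vectors, axis=1, keepdims=True)
    sims = targets @ mapped
    return np.argsort(-sims)[:k]

# Toy data (invented for illustration): 20 seed pairs, 5-dim source, 4-dim target.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
Z = X @ rng.normal(size=(5, 4))
W = fit_translation(X, Z)
best = translate(X[3], W, Z, k=1)  # index of the nearest target vector
```

In the multi-sense setting, each sense vector of a source word would be passed through `translate` separately, so one word can receive several candidate translations.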

The experiments reported in this talk use two measures. Meta-parameters (e.g. the target embedding) were chosen based on a metric called lax: at least one sense vector of a word should have a good translation. This measure does not penalize different senses that share the same translation.
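The lax criterion can be sketched as follows. This is an illustrative reading, not the evaluation code behind the reported numbers: the function name `lax_score`, the data layout (one list of candidate translations per sense vector) and the toy dictionary are all assumptions.

```python
def lax_score(predictions, gold):
    """Share of source words with at least one well-translated sense vector.

    predictions: dict word -> list of candidate-translation lists,
                 one inner list per sense vector (assumed layout).
    gold: dict word -> set of gold translations.
    """
    words = [w for w in predictions if w in gold]
    if not words:
        return 0.0
    hits = sum(
        any(set(candidates) & gold[w] for candidates in predictions[w])
        for w in words
    )
    return hits / len(words)

# Toy data, invented for illustration only.
predictions = {
    "nap": [["sun"], ["day"]],    # two senses, two different good translations
    "kutya": [["dog"], ["dog"]],  # two sense vectors, same translation: still a hit
    "fa": [["car"]],              # no good translation
}
gold = {"nap": {"sun", "day"}, "kutya": {"dog"}, "fa": {"tree", "wood"}}
score = lax_score(predictions, gold)
```

Note that "kutya" counts as a hit even though both of its sense vectors translate to the same word, which is exactly the leniency the measure is named for.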

To measure whether different sense vectors really correspond to different senses, we use a slightly stricter measure (disamb): the sets of good (gold) translations of the different sense vectors should differ. We compute the proportion of such source forms among the words predicted to be ambiguous.
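The stricter measure described above might be computed along these lines. The sketch makes two assumptions that the text does not spell out: a word is "predicted to be ambiguous" when it has more than one sense vector, and a word passes when at least two of its sense vectors have non-empty, distinct sets of good translations. The function name `disamb_score` and the toy data are hypothetical.

```python
def disamb_score(predictions, gold):
    """Share of ambiguous words whose sense vectors get different good translations.

    predictions: dict word -> list of candidate-translation lists,
                 one inner list per sense vector (assumed layout).
    gold: dict word -> set of gold translations.
    """
    # Assumption: "predicted ambiguous" means more than one sense vector.
    ambiguous = [w for w, senses in predictions.items()
                 if len(senses) > 1 and w in gold]
    if not ambiguous:
        return 0.0

    def passes(word):
        good_sets = {frozenset(c) & gold[word] for c in predictions[word]}
        good_sets.discard(frozenset())  # ignore senses with no good translation
        return len(good_sets) > 1       # at least two distinct non-empty sets

    return sum(passes(w) for w in ambiguous) / len(ambiguous)

# Toy data, invented for illustration only.
predictions = {
    "nap": [["sun"], ["day"]],    # distinct good translations: passes
    "kutya": [["dog"], ["dog"]],  # same translation twice: fails
    "fa": [["tree"]],             # single sense vector: not counted
}
gold = {"nap": {"sun", "day"}, "kutya": {"dog"}, "fa": {"tree", "wood"}}
score = disamb_score(predictions, gold)
```

Under this reading, a word whose sense vectors all translate to the same target word is a lax hit but a disamb miss.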

Model                      lax      disamb
AdaGram                    73.3%    18.53%
mutli “sense vectors”      71.0%    19.46%
mutli “context vectors”    69.9%    20.76%

We found a trade-off between the two measures. One interpretation is that the more specific a vector is, the easier it is to translate, but if the vectors are too specific, their translations may coincide.

Future work: an analysis of the number of word senses plotted against word frequency, and word ambiguity as a Dirichlet process.