Do multi-sense word embeddings learn more senses?
Note that the English and the Hungarian variants of this page are not a strictly parallel corpus.
At last year's ACL, we (Borbély et al. 2016) proposed a method for measuring the sense resolution of multi-sense embeddings (MSEs) based on linear translation (Mikolov et al. 2013) from the MSE to a plain embedding. This talk develops that method further.
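Linear translation between embedding spaces amounts to fitting a matrix by least squares on a seed dictionary of translation pairs, then mapping new vectors through it. A minimal sketch with toy random data (the dimensions, the synthetic vectors, and the `translate` helper are illustrative assumptions, not the talk's actual setup):

```python
# Sketch of linear translation between embedding spaces (Mikolov et al. 2013).
# All data here is synthetic; the real method uses trained word embeddings
# and a bilingual seed dictionary.
import numpy as np

rng = np.random.default_rng(0)

d_src, d_tgt, n_pairs = 4, 3, 50
X = rng.normal(size=(n_pairs, d_src))      # source-side vectors of the seed pairs
W_true = rng.normal(size=(d_src, d_tgt))   # hidden "true" mapping for this toy demo
Z = X @ W_true                             # target-side vectors of the same pairs

# Learn W minimizing ||X W - Z||^2 (ordinary least squares).
W, *_ = np.linalg.lstsq(X, Z, rcond=None)

def translate(v, W, target_matrix):
    """Map a source vector and return the index of the nearest target
    vector by cosine similarity."""
    t = v @ W
    sims = target_matrix @ t / (
        np.linalg.norm(target_matrix, axis=1) * np.linalg.norm(t)
    )
    return int(np.argmax(sims))
```

In the multi-sense setting, each sense vector of a source word is mapped through `W` separately, and each mapped vector is judged by its nearest neighbors on the target side.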
The experiments reported in this talk use two measures. Meta-parameters (e.g. the choice of target embedding) were chosen based on the metric we call lax: a word counts as a success if at least one of its sense vectors has a good translation. This measure does not penalize different senses that share the same translation.
To measure whether the different sense vectors really correspond to different senses, we use a slightly stricter measure, disamb: the sets of good (gold) translations of the different sense vectors should differ. The proportion of such source word forms is computed among the words predicted to be ambiguous.
| | lax | disamb |
| --- | --- | --- |
| AdaGram | 73.3% | 18.53% |
| mutli “sense vectors” | 71.0% | 19.46% |
| mutli “context vectors” | 69.9% | 20.76% |
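The two measures can be sketched as follows. This is a hedged toy version, assuming that for each ambiguous source word we already know, per sense vector, which gold translations its translated vector hits (the nearest-neighbor retrieval step from the previous section is elided); the dictionary structure and the German example words are illustrative assumptions:

```python
# Toy sketch of the lax and disamb measures. Input per word:
# {sense_id: set of gold translations hit by that sense's translated vector}.

def lax(word_senses):
    # Success if at least one sense vector has a good translation.
    return any(hits for hits in word_senses.values())

def disamb(word_senses):
    # Success if the sets of good translations of different senses differ.
    hit_sets = [frozenset(h) for h in word_senses.values() if h]
    return len(set(hit_sets)) > 1

# Illustrative words predicted ambiguous: "bank" translates differently per
# sense, while both senses of "river" hit the same gold translation.
bank = {"bank_1": {"Bank"}, "bank_2": {"Ufer"}}
river = {"river_1": {"Fluss"}, "river_2": {"Fluss"}}

ambiguous_words = [bank, river]
lax_score = sum(lax(w) for w in ambiguous_words) / len(ambiguous_words)
disamb_score = sum(disamb(w) for w in ambiguous_words) / len(ambiguous_words)
```

Here `lax_score` is 1.0 (both words have at least one good translation) while `disamb_score` is 0.5 (only "bank" has distinguishable senses), mirroring how lax can stay high while disamb stays low.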
We found a trade-off between the two measures. One interpretation: the more specific a sense vector is, the easier it is to translate; but if the vectors are too specific, the translations of different senses may coincide.
Future work: analyzing the number of word senses plotted against word frequency, and modeling word ambiguity as a Dirichlet process.
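The Dirichlet-process view predicts a particular sense-count-vs-frequency curve: under its Chinese restaurant process representation with concentration `alpha`, the expected number of distinct senses of a word grows roughly like `alpha * log(n)` in the word's token frequency `n`. A purely illustrative simulation (the parameter values and the simulation itself are assumptions, not results from the talk):

```python
# Toy Chinese restaurant process: senses are "tables"; each new token of a
# word opens a new sense with probability alpha / (i + alpha) or joins an
# existing sense proportionally to how many tokens it already has.
import random

def crp_num_tables(n_tokens, alpha, rng):
    counts = []  # occupancy of each sense/table
    for i in range(n_tokens):
        if rng.random() < alpha / (i + alpha):
            counts.append(1)          # open a new sense
        else:
            r = rng.random() * i      # join an existing sense, size-biased
            acc = 0
            for t, c in enumerate(counts):
                acc += c
                if r < acc:
                    counts[t] += 1
                    break
    return len(counts)

rng = random.Random(0)
alpha = 1.0
# Average sense counts at two frequencies; expect roughly log(100) vs log(10000).
small = sum(crp_num_tables(100, alpha, rng) for _ in range(200)) / 200
large = sum(crp_num_tables(10000, alpha, rng) for _ in range(200)) / 200
```

With `alpha = 1.0`, `small` comes out near `log(100) ≈ 4.6` and `large` near `log(10000) ≈ 9.2`, i.e. sense counts grow slowly (logarithmically) with frequency, which is the kind of curve the planned frequency analysis could be checked against.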