Evaluating multilingual language models
Contextualized language models such as BERT have changed the NLP landscape in the last few years. BERT and some of its contemporaries have massively multilingual versions with support for over 100 languages. Probing is a simple but popular method for evaluating the linguistic content of such models. We perform a systematic evaluation in dozens of languages across multiple probing tasks.
The first part of this talk describes a large-scale morphological evaluation of multilingual BERT in 40 languages. Beyond the raw evaluation, we perturb the input in ways that remove part of the information and analyze how BERT's linguistic behavior changes. We show that linguistic typology can be recovered to some degree through these methods.
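To make the idea of input perturbation concrete, here is a minimal Python sketch. The abstract does not say which perturbations are used, so the three functions below (masking the target word, masking its context, and shuffling its context) are hypothetical illustrations, and all function names are assumptions. The only detail taken from BERT itself is the `[MASK]` token.

```python
import random
from typing import List

MASK = "[MASK]"  # BERT's mask token


def mask_target(tokens: List[str], target_idx: int) -> List[str]:
    """Hide the probed word itself, so only the context carries information."""
    out = list(tokens)
    out[target_idx] = MASK
    return out


def mask_context(tokens: List[str], target_idx: int) -> List[str]:
    """Hide every word except the probed one, removing contextual cues."""
    return [tok if i == target_idx else MASK for i, tok in enumerate(tokens)]


def shuffle_context(tokens: List[str], target_idx: int, seed: int = 0) -> List[str]:
    """Keep the probed word in place and shuffle the rest, removing word order."""
    rng = random.Random(seed)
    context = [tok for i, tok in enumerate(tokens) if i != target_idx]
    rng.shuffle(context)
    context.insert(target_idx, tokens[target_idx])
    return context


# Hypothetical example sentence; the probed word is "dogs" at index 1.
sentence = ["the", "dogs", "are", "barking"]
print(mask_target(sentence, 1))
print(mask_context(sentence, 1))
print(shuffle_context(sentence, 1))
```

Comparing probe accuracy on the original and the perturbed inputs would then indicate how much the model relies on the removed information.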
The second part deals with the tokenization of these contextualized language models. All models use some kind of subword tokenization with a fixed subword vocabulary. Token-level applications of such models, such as named entity recognition, require a way of pooling the multiple subwords that correspond to a single token. We show that the choice of subword pooling method often makes a large difference and that there is no one-size-fits-all solution when it comes to subword pooling.
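Subword pooling can be sketched in a few lines of Python. This is an illustration only: the four strategies shown (first, last, mean, max) are common choices and not necessarily the ones compared in the talk, the function names are assumptions, and the toy two-dimensional vectors stand in for real model outputs.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def pool_first(vectors: Sequence[Vector]) -> Vector:
    """Represent the word by its first subword."""
    return list(vectors[0])


def pool_last(vectors: Sequence[Vector]) -> Vector:
    """Represent the word by its last subword."""
    return list(vectors[-1])


def pool_mean(vectors: Sequence[Vector]) -> Vector:
    """Average the subword vectors dimension by dimension."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pool_max(vectors: Sequence[Vector]) -> Vector:
    """Take the maximum over subwords in each dimension."""
    return [max(column) for column in zip(*vectors)]


POOLERS: Dict[str, Callable[[Sequence[Vector]], Vector]] = {
    "first": pool_first,
    "last": pool_last,
    "mean": pool_mean,
    "max": pool_max,
}


def pool_subwords(
    subword_vectors: Sequence[Vector],
    word_lengths: Sequence[int],
    method: str = "mean",
) -> List[Vector]:
    """Collapse subword vectors into one vector per word.

    word_lengths[i] is the number of subwords that word i was split into.
    """
    pool = POOLERS[method]
    pooled, start = [], 0
    for length in word_lengths:
        pooled.append(pool(subword_vectors[start:start + length]))
        start += length
    if start != len(subword_vectors):
        raise ValueError("word_lengths does not cover all subword vectors")
    return pooled


# Toy input: two words, split into 3 subwords and 1 subword respectively.
vectors = [[1.0, 0.0], [3.0, 2.0], [2.0, 4.0], [5.0, 5.0]]
for name in POOLERS:
    print(name, pool_subwords(vectors, [3, 1], method=name))
```

Even on this toy input the four strategies give different vectors for the first word, which is why the choice can matter for a downstream tagger.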
The third part of this talk focuses on the use of contextualized models for Hungarian. We compare four multilingual models against two Hungarian models, HuBERT and HILBERT, on three Hungarian tasks: morphological probing, POS tagging, and NER.