BME HLT | emLam – a Hungarian Language Modeling baseline
Dávid Márk Nemeskey
emLam – a Hungarian Language Modeling baseline
In XIII. Magyar Számítógépes Nyelvészeti Konferencia (MSZNY 2017), 2017


This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora, and the reported perplexity values are comparable to those of models trained on similar-sized English corpora. A new, freely downloadable Hungarian benchmark corpus is also introduced.
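Perplexity, the metric the abstract reports, is the exponentiated average negative log-probability that a model assigns to the tokens of held-out text; lower is better. The sketch below illustrates this standard definition, assuming the model's per-token probabilities are already available. The function name `perplexity` is hypothetical, and this is not the evaluation code used in the paper.

```python
import math
from typing import Sequence


def perplexity(token_probs: Sequence[float]) -> float:
    """Perplexity of a token sequence (hypothetical helper, for illustration).

    token_probs[i] is assumed to be the probability the model assigned to
    the i-th token given its history. The result is
    exp(-(1/N) * sum(log p_i)), i.e. the inverse geometric mean probability.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    if any(p <= 0.0 or p > 1.0 for p in token_probs):
        raise ValueError("probabilities must be in (0, 1]")
    # Average negative natural-log probability per token.
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


# A model assigning probability 1/4 to every token is, on average, as
# uncertain as a uniform choice among four options (perplexity about 4).
print(perplexity([0.25, 0.25, 0.25]))
```

Because perplexity is normalized per token, values are only directly comparable between models evaluated with the same tokenization and vocabulary.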

Citation
@inproceedings{Nemeskey:2017,
  author = {Nemeskey, Dávid Márk},
  title = {\texttt{emLam} -- a {Hungarian} Language Modeling baseline},
  booktitle = {{XIII}.\ Magyar Sz{\'a}m{\'\i}t{\'o}g{\'e}pes Nyelv{\'e}szeti
    Konferencia ({MSZNY} 2017)},
  year = 2017,
  pages = {91--102},
  address = {Szeged},
}