BME HLT | emLam – a Hungarian Language Modeling baseline

2017

Paper (PDF)

This paper aims to make up for the lack of documented baselines for Hungarian language modeling. Various approaches are evaluated on three publicly available Hungarian corpora. Perplexity values comparable to models of similar-sized English corpora are reported. A new, freely downloadable Hungarian benchmark corpus is introduced.

Citation
@inproceedings{Nemeskey:2017,
  author = {Nemeskey, Dávid Márk},
  title = {\texttt{emLam} -- a Hungarian Language Modeling baseline},
  booktitle = {{XIII}.\ Magyar Sz{\'a}m{\'\i}t{\'o}g{\'e}pes Nyelv{\'e}szeti
    Konferencia ({MSZNY}2017)},
  year = 2017,
  pages = {91--102},
  address = {Szeged},
}