SZTAKI HLT | Classifying the Hungarian Web

Classifying the Hungarian Web

András Kornai, Marc Krellenstein, Michael Mulligan, David Twomey, Fruzsina Veress, Alec Wysoker
In Proceedings of the EACL, 2003

In this paper we present some lessons learned from building VIZSLA, the keyword search and topic classification system used on the largest Hungarian portal, [origo.hu]. Based on a simple statistical language model and large-scale supporting evidence from VIZSLA, we argue that in topic classification only positive evidence matters.

Citation
@inproceedings{Kornai:2003,
    author = {Andr\'as Kornai and Marc Krellenstein and Michael Mulligan and David Twomey and Fruzsina Veress and Alec Wysoker},
    editor = {Ann Copestake and Jan Hajic},
    title = {Classifying the {H}ungarian {W}eb},
    booktitle = {Proceedings of the EACL},
    pages = {203--210},
    year = {2003}
}