SZTAKI HLT | HunTag

HunTag


A sequential tagger for NLP that combines the linear classifier Liblinear with Hidden Markov Models. Given training data, HunTag can perform any kind of sequential sentence tagging; it has been used for NP chunking and Named Entity Recognition.
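To illustrate the general idea of combining per-token classifier scores with tag-transition probabilities, here is a minimal sketch of Viterbi decoding. It is not HunTag's actual code: the function name `viterbi`, the tag set, and all probabilities are made up for illustration, and the per-token dictionaries merely stand in for the output of a linear classifier such as Liblinear.

```python
import math

def viterbi(emission_probs, transition_probs, start_probs):
    """Return the most probable tag sequence for one sentence.

    emission_probs: one dict per token, mapping tag -> probability
        (a stand-in for per-token classifier output; hypothetical values).
    transition_probs: dict mapping (prev_tag, tag) -> probability.
    start_probs: dict mapping tag -> probability of starting a sentence.
    """
    if not emission_probs:
        return []
    tags = list(start_probs)

    def log(p):
        # Floor at a tiny value so that zero probabilities do not break log().
        return math.log(max(p, 1e-12))

    # best[i][t]: best log-score of any tag path ending in tag t at token i.
    best = [{t: log(start_probs[t]) + log(emission_probs[0].get(t, 0.0))
             for t in tags}]
    back = []  # back[i][t]: best previous tag for tag t at token i + 1
    for probs in emission_probs[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p]
                       + log(transition_probs.get((p, t), 0.0)))
            scores[t] = (best[-1][prev]
                         + log(transition_probs.get((prev, t), 0.0))
                         + log(probs.get(t, 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

# Hypothetical scores for the four tokens of "a big dog barks".
emissions = [
    {"B-NP": 0.8, "I-NP": 0.1, "O": 0.1},
    {"B-NP": 0.3, "I-NP": 0.3, "O": 0.4},  # the classifier alone prefers O
    {"B-NP": 0.2, "I-NP": 0.6, "O": 0.2},
    {"B-NP": 0.1, "I-NP": 0.1, "O": 0.8},
]
transitions = {
    ("B-NP", "B-NP"): 0.1, ("B-NP", "I-NP"): 0.7, ("B-NP", "O"): 0.2,
    ("I-NP", "B-NP"): 0.1, ("I-NP", "I-NP"): 0.5, ("I-NP", "O"): 0.4,
    ("O", "B-NP"): 0.4, ("O", "I-NP"): 0.0, ("O", "O"): 0.6,
}
start = {"B-NP": 0.5, "I-NP": 0.0, "O": 0.5}

print(viterbi(emissions, transitions, start))
# ['B-NP', 'I-NP', 'I-NP', 'O']
```

Picking the highest-scoring tag for each token independently would yield the invalid sequence B-NP, O, I-NP, O. The transition model forbids I-NP directly after O, so the decoder keeps the second token inside the noun phrase.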

Download and documentation

HunTag is hosted on GitHub. Documentation is included in the repository.

Models

Pre-trained models are available for Hungarian NP chunking and Named Entity Recognition. Both were trained on the Szeged Treebank. Download them from the links below:

Hungarian NP chunker (tgz, 181M)

Hungarian NER tagger (tgz, 131M)

Authors

HunTag was created by Gábor Recski and Dániel Varga. It is a reimplementation and generalization of a Named Entity Recognizer built by Dániel Varga and Eszter Simon.

License

HunTag is made available under the GNU Lesser General Public License v3.0. If you received HunTag in a package that also contains the Hungarian training corpora for the Named Entity Recognition and chunking tasks, please note that these corpora are derivative works based on the Szeged Treebank, and they are made available under the same restrictions that apply to the original Szeged Treebank.

Reference

If you use the tool, please cite the following paper:

Gábor Recski, Dániel Varga (2009): A Hungarian NP-chunker. In: The Odd Yearbook

If you use a version specialized for Hungarian, please also cite the following paper:

Dóra Csendes, János Csirik, Tibor Gyimóthy, András Kocsor (2005): The Szeged Treebank. In: Text, Speech and Dialogue, Lecture Notes in Computer Science, Volume 3658
