SZTAKI HLT | The Gutenberg Dialog Data Set for Neural Dialogue Models

Richárd Csáky
Oct. 14, 2019, 9:30
SZTAKI, Kende Street, Great Council Hall

The dialogues are extracted from the online books of Project Gutenberg, an approach that can also yield multilingual data. I will present a detailed analysis of the data and its errors, as well as some results. Afterwards, I would like to brainstorm about how to improve data quality and make effective use of the multilingual data.