BME HLT | Seminar

Upcoming seminars

Past seminars

Understanding and visualizing recurrent neural networks
March 2, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Gábor Borbély presents a paper on recurrent networks: paper here

March 16, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Fiona Heaney and Matt Zielonko (Worcester Polytechnic Institute) present two of the following four papers:

Interpreted Regular Tree Grammars for semantic parsing
March 30, 2017, 8:15
Venue: MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Gábor Recski will talk about an ongoing project on parsing various semantic graph structures. The Alto library implements IRTGs, which map CFG rules to operations in an arbitrary algebra. Alto comes with many algebra implementations; one of them supports s-graph grammars, which are functionally equivalent to Hyperedge Replacement Grammars. Alto and s-graph grammars have previously been used for semantic parsing. We shall explore how they can be used to parse both dependency graphs and 4lang graphs, how one can be transformed into the other, and how both 4lang-style definitions and basic 4lang operations such as expansion can be implemented, all within the scope of s-graph grammars.
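The core idea of an IRTG, one derivation tree interpreted in several algebras, can be illustrated with a toy sketch. It does not use Alto or its grammar format: the rule names, the tiny sentence, and the "graph algebra" (a value is a root node plus a set of labeled edges) are all assumptions made up for illustration, and the homomorphism and the algebra evaluation are collapsed into a single step for brevity.

```python
# Toy derivation tree: (rule_label, child, child, ...).
# All rule labels are hypothetical and chosen for illustration only.
DERIVATION = ("S", ("NP_john",), ("VP", ("V_sleeps",)))

# Interpretation 1: a string algebra whose only operation is concatenation.
STRING_INTERP = {
    "S": lambda np, vp: np + " " + vp,
    "VP": lambda v: v,
    "NP_john": lambda: "John",
    "V_sleeps": lambda: "sleeps",
}

# Interpretation 2: a toy graph algebra (NOT the real s-graph algebra).
# A value is a pair (root_node, frozenset_of_edges).
GRAPH_INTERP = {
    "S": lambda np, vp: (vp[0], vp[1] | np[1] | {(vp[0], "subj", np[0])}),
    "VP": lambda v: v,
    "NP_john": lambda: ("John", frozenset()),
    "V_sleeps": lambda: ("sleep", frozenset()),
}


def evaluate(tree, interp):
    """Evaluate a derivation tree bottom-up under one interpretation."""
    label, *children = tree
    return interp[label](*(evaluate(child, interp) for child in children))


sentence = evaluate(DERIVATION, STRING_INTERP)
graph = evaluate(DERIVATION, GRAPH_INTERP)
```

Because both interpretations share the same derivation tree, parsing a string into a derivation and then evaluating that derivation in the graph interpretation yields a string-to-graph mapping; the same scheme would in principle relate two graph formalisms.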

Training a Universal Word Embedding
April 13, 2017, 8:15
Venue: MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Eszter Iklódi will introduce her M.A. thesis project on training universal word embeddings. It is well known that the world is cut into different pieces by different languages: for example Hungarian fa means both the material (German Holz, English wood) and the plant (German Baum), while wood in English also means the area populated by the plant (Hungarian erdő, German Wald). A key question of multilingual information technology is to design a system that reflects these differences in a way that furthers semantic analysis e.g. for the Semantic Web. We train a vector-space model of language-independent concepts, based on an extension of the method presented in (Artetxe et al. 2016), using distributional models of 40+ languages. For evaluation and comparison, we build a (clustered) graph of words from 40+ languages, based on the method presented in (Youn et al. 2016) using data extracted from Wiktionary and other large-scale open-domain lexical databases.

April 27, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Dávid Nemeskey gave a talk entitled "Beyond RNN: multi-dimensional RNN, RNN transducers, RNN grammars". The presentation slides are available here.

Autoencoder experiments on Hungarian words
May 11, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Judit Ács will present her experiments on Hungarian words using autoencoders.

Autoencoders are widely used for dimension reduction and compression. However, in NLP they are mostly applied to word-level features, and character-level features are rarely exploited. We present a series of autoencoder and variational autoencoder experiments on Hungarian words using character unigrams. We extract character unigram features and add Gaussian noise in the case of variational autoencoders. We also add ‘realistic’ noise by randomly editing words up to an edit distance of one, in the hope that the autoencoder will learn to perform similarly to a spell checker. Our manual error analysis gives insight into common Hungarian morphological phenomena that could be exploited for text compression. Our results suggest that Hungarian words can be dramatically compressed with little loss in accuracy. Our methods can be applied to other languages with relatively small alphabets.
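As a rough illustration of the two preprocessing steps mentioned in the abstract, the sketch below builds character unigram count vectors and applies at most one random edit to a word. The alphabet, the function names, the bag-of-characters encoding, and the uniform choice among edit operations are assumptions made for this example; the abstract does not specify how the speaker implemented them.

```python
import random
from collections import Counter

# Hypothetical lowercase Hungarian alphabet, used for illustration only.
ALPHABET = "aábcdeéfghiíjklmnoóöőpqrstuúüűvwxyz"


def char_unigram_features(word, alphabet=ALPHABET):
    """Return a count vector with one slot per character of the alphabet."""
    counts = Counter(word)
    return [counts.get(ch, 0) for ch in alphabet]


def random_edit(word, rng, alphabet=ALPHABET):
    """Apply at most one random edit: insert, delete, or substitute."""
    op = rng.choice(["keep", "insert", "delete", "substitute"])
    if op == "keep" or (not word and op != "insert"):
        return word
    if op == "insert":
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(alphabet) + word[pos:]
    pos = rng.randrange(len(word))
    if op == "delete":
        return word[:pos] + word[pos + 1:]
    # Substitution may pick the original character, leaving the word unchanged.
    return word[:pos] + rng.choice(alphabet) + word[pos + 1:]


rng = random.Random(0)
clean = char_unigram_features("alma")
noisy_word = random_edit("alma", rng)
noisy = char_unigram_features(noisy_word)
```

A denoising setup would feed `noisy` to the autoencoder and train it to reconstruct `clean`, which is what would make the model behave somewhat like a spell checker.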

Wasserstein GAN
June 1, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Coming soon: Gábor Borbély will talk about Wasserstein GANs.

See also:

June 15, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest), Room 306 or 506

Márton Makrai presents a paper by Gábor Berend (2016 TACL): Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling.

From the abstract:

  • (near) state-of-the-art performance for
    • both part-of-speech tagging and named entity recognition
    • for a variety of languages
      • reasonable results for more than 40 treebanks for POS tagging
  • the model relies only on a few thousand sparse coding-derived features,
    • without applying any modification of the word representations employed for the different tasks
  • the proposed model has favorable generalization properties

September 7, 2017, 8:15
MTA SZTAKI, Lágymányosi u. 11, Room 306 or 506

On September 7, Richárd Csáky (BME) will give a talk on current approaches to building conversational agents (chatbots).

Besides a short abstract, see also Ricsi's longer summary of the topic and his summary of 70+ recent papers on chatbots and the seq2seq architecture.

Stanford’s Graph-based Neural Dependency Parser
September 14, 2017, 8:15
MTA SZTAKI, Lágymányosi u. 11, Room 306 or 506

On the 14th of September, Gábor Recski will require our help to fully understand the system that won the CoNLL-2017 shared task on Universal Dependency parsing. The system is presented in (Dozat et al. 2017b) and is mostly based on the system described in (Dozat et al. 2017a), which is in turn an extension of a system presented in the TACL paper by (Kiperwasser & Goldberg 2016). All help is appreciated!

Recurrent dropout
September 21, 2017, 8:15
MTA SZTAKI, Lágymányosi u. 11, Room 306 or 506

Dropout has been the most popular choice of regularization in RNN language models since Zaremba et al. (2014). However, early attempts at applying it to the recurrent connections (as opposed to the connections between layers) were unsuccessful. In the seminar, we are going to review three recent papers that managed to crack the nut: Gal and Ghahramani (2016), Semeniuta et al. (2016) and Moon et al. (2015). We shall also discuss Zoneout, a related technique introduced in Krueger et al. (2017).
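To make the distinction concrete, here is a minimal plain-Python sketch of two of the ideas on a toy tanh RNN with scalar inputs. It is not the formulation of any of the cited papers, which work with LSTMs and differ in where exactly the mask is applied: the first function only borrows the idea of sampling one dropout mask per sequence and reusing it on the recurrent connection at every time step (as in Gal and Ghahramani), and the second borrows the Zoneout idea of letting each hidden unit keep its previous value with some probability. All function and parameter names are assumptions.

```python
import math
import random


def sample_mask(size, drop_prob, rng):
    """Inverted-dropout mask: 0 with drop_prob, else 1/(1 - drop_prob).

    Requires drop_prob < 1.
    """
    keep = 1.0 - drop_prob
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(size)]


def rnn_step(x, h, w_in, w_rec):
    """One tanh step: h_new[i] = tanh(w_in[i] * x + sum_j w_rec[i][j] * h[j])."""
    return [
        math.tanh(w_in[i] * x + sum(w * hj for w, hj in zip(w_rec[i], h)))
        for i in range(len(h))
    ]


def run_with_recurrent_dropout(xs, w_in, w_rec, drop_prob, rng):
    """Apply one per-sequence dropout mask to the recurrent connection."""
    h = [0.0] * len(w_in)
    mask = sample_mask(len(h), drop_prob, rng)  # sampled once, reused each step
    for x in xs:
        h = rnn_step(x, [m * hj for m, hj in zip(mask, h)], w_in, w_rec)
    return h


def run_with_zoneout(xs, w_in, w_rec, zone_prob, rng):
    """Zoneout-style update: each unit keeps its old value with zone_prob."""
    h = [0.0] * len(w_in)
    for x in xs:
        h_new = rnn_step(x, h, w_in, w_rec)
        h = [old if rng.random() < zone_prob else new
             for old, new in zip(h, h_new)]
    return h


rng = random.Random(0)
w_in = [0.5, -0.3]                  # hypothetical input weights
w_rec = [[0.1, 0.2], [0.0, -0.4]]   # hypothetical recurrent weights
xs = [1.0, 0.5, -1.0]
h_dropout = run_with_recurrent_dropout(xs, w_in, w_rec, 0.5, rng)
h_zoneout = run_with_zoneout(xs, w_in, w_rec, 0.5, rng)
```

Resampling the mask inside the loop instead would give the naive per-step recurrent dropout that the early attempts tried; both functions are training-time behavior only, and at test time one would run the network without masks.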