BME HLT | Seminar

Upcoming seminars

Past seminars

Visualizing and Understanding Recurrent Networks
March 2, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Gábor Borbély will present the paper Visualizing and Understanding Recurrent Networks by Andrej Karpathy, Justin Johnson, and Li Fei-Fei.

See also:

https://arxiv.org/pdf/1506.02078v2.pdf

Two recent papers on Deep Learning
March 16, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Fiona Heaney and Matt Zielonko (Worcester Polytechnic Institute) will present two of the following four papers:

Interpreted Regular Tree Grammars for semantic parsing
March 30, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Gábor Recski will talk about an ongoing project on parsing various semantic graph structures. The Alto library implements Interpreted Regular Tree Grammars (IRTGs), which map CFG rules to operations in arbitrary algebras. Alto ships with many algebra implementations; one of them is the s-graph algebra, functionally equivalent to Hyperedge Replacement Grammars. Alto and s-graph grammars have previously been used for semantic parsing. We shall explore how they can be used to parse both dependency graphs and 4lang graphs, how one can be transformed into the other, and how both 4lang-style definitions and basic 4lang operations such as expansion can be performed, all within the scope of s-graph grammars.
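
To make the IRTG idea concrete, here is a toy Python sketch (hypothetical illustration, not the Alto API): a single grammar rule S -> r1(NP, VP) is interpreted simultaneously in a string algebra (concatenation) and in a simplified graph algebra (edge-set union, a crude stand-in for the s-graph merge operation, which also unifies shared source nodes).

    # Toy IRTG sketch: one rule, two interpretations in two algebras.

    def string_concat(*parts):
        # String algebra operation: concatenate the yields of the subtrees.
        return " ".join(parts)

    def graph_merge(*graphs):
        # Simplified graph algebra operation: union of edge sets.
        edges = set()
        for g in graphs:
            edges |= g
        return edges

    # One IRTG-style rule, S -> r1(NP, VP), with one interpretation per algebra.
    RULE_R1 = {
        "string": string_concat,
        "graph": graph_merge,
    }

    # Interpreting the same derivation tree in both algebras:
    np_string, np_graph = "John", {("john", "instance", "person")}
    vp_string, vp_graph = "sleeps", {("e", "instance", "sleep"), ("e", "ARG0", "john")}

    print(RULE_R1["string"](np_string, vp_string))        # -> "John sleeps"
    print(sorted(RULE_R1["graph"](np_graph, vp_graph)))   # -> merged edge set

The point of the design is that a single derivation tree yields a string and a graph in lockstep, which is what lets the same grammar drive both parsing and graph construction.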

Training a Universal Word Embedding
April 13, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Eszter Iklódi will introduce her M.A. thesis project on training universal word embeddings. It is well known that the world is cut into different pieces by different languages: for example, Hungarian fa means both the material (German Holz, English wood) and the plant (German Baum), while English wood also means the area populated by the plant (Hungarian erdő, German Wald). A key question of multilingual information technology is how to design a system that reflects these differences in a way that furthers semantic analysis, e.g. for the Semantic Web. We train a vector-space model of language-independent concepts, based on an extension of the method presented in Artetxe et al. (2016), using distributional models of 40+ languages. For evaluation and comparison, we build a (clustered) graph of words from 40+ languages, based on the method presented in Youn et al. (2016), using data extracted from Wiktionary and other large-scale open-domain lexical databases.
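
For orientation, the core step of the Artetxe et al. (2016) method that the project extends is an orthogonal mapping between two embedding spaces, solved in closed form via SVD (the orthogonal Procrustes problem). A minimal numpy sketch, with random matrices standing in for real embeddings and an assumed seed dictionary of row-aligned word pairs:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 300                            # embedding dimension (illustrative)
    n = 5000                           # seed dictionary size (illustrative)
    X = rng.standard_normal((n, d))    # source-language vectors, one dictionary entry per row
    Y = rng.standard_normal((n, d))    # target-language vectors, aligned row by row

    # Length-normalize, then solve the orthogonal Procrustes problem:
    # W = U V^T, where U S V^T is the SVD of X^T Y.
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    Y /= np.linalg.norm(Y, axis=1, keepdims=True)
    U, _, Vt = np.linalg.svd(X.T @ Y)
    W = U @ Vt                         # orthogonal map: preserves monolingual distances

    mapped = X @ W                     # source vectors expressed in the target space

The orthogonality constraint is what "preserves monolingual invariance": distances within the source space are unchanged by the mapping.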

Beyond RNN: multi-dimensional RNN, RNN transducers, RNN grammars
April 27, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Dávid Nemeskey gave a talk entitled "Beyond RNN: multi-dimensional RNN, RNN transducers, RNN grammars". The presentation slides are available here.

Autoencoder experiments on Hungarian words
May 11, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Judit Ács will present her experiments on Hungarian words using autoencoders.

Autoencoders are widely used for dimensionality reduction and compression. In NLP, however, they are mostly applied to word-level features; character-level features are rarely exploited. We present a series of autoencoder and variational autoencoder experiments on Hungarian words using character unigrams. We extract character unigram features and, in the case of variational autoencoders, add Gaussian noise. We also add ‘realistic’ noise by randomly editing words within an edit distance of one, in the hope that the autoencoder will learn to behave similarly to a spell checker. Our manual error analysis gives insight into common Hungarian morphological phenomena that could be exploited for text compression. Our results suggest that Hungarian words can be dramatically compressed with little loss in accuracy. Our methods can be applied to other languages with relatively small alphabets.
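
A minimal PyTorch sketch of the general setup described above (an illustrative reconstruction, not the presented system): a small autoencoder over character unigram count vectors, trained as a denoising autoencoder with the one-edit noise from the abstract. The alphabet, toy word list, and bottleneck size are assumptions.

    import random
    import torch
    import torch.nn as nn

    ALPHABET = "abcdefghijklmnopqrstuvwxyzáéíóöőúüű"   # assumed reduced Hungarian alphabet
    CHAR2IDX = {c: i for i, c in enumerate(ALPHABET)}

    def one_edit(word):
        # One random edit (substitute, delete, or insert) -> edit distance <= 1.
        i = random.randrange(len(word))
        op = random.choice(("sub", "del", "ins"))
        c = random.choice(ALPHABET)
        if op == "sub":
            return word[:i] + c + word[i + 1:]
        if op == "del":
            return word[:i] + word[i + 1:]
        return word[:i] + c + word[i:]

    def unigram_vector(word):
        # Character unigram (bag-of-characters) count vector.
        v = torch.zeros(len(ALPHABET))
        for ch in word:
            if ch in CHAR2IDX:
                v[CHAR2IDX[ch]] += 1
        return v

    class Autoencoder(nn.Module):
        def __init__(self, dim, bottleneck=8):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(dim, bottleneck), nn.ReLU())
            self.dec = nn.Linear(bottleneck, dim)

        def forward(self, x):
            return self.dec(self.enc(x))

    model = Autoencoder(len(ALPHABET))
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    words = ["alma", "körte", "szilva", "barack"]      # toy corpus
    for step in range(1000):
        word = random.choice(words)
        noisy = unigram_vector(one_edit(word))          # corrupted input
        clean = unigram_vector(word)                    # reconstruction target
        opt.zero_grad()
        loss = loss_fn(model(noisy), clean)
        loss.backward()
        opt.step()

Training the model to reconstruct the clean vector from the corrupted one is what pushes it toward spell-checker-like behavior.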

Wasserstein GAN
June 1, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Coming soon: Gábor Borbély will present Wasserstein GANs.

See also:

https://arxiv.org/pdf/1701.07875v1.pdf

https://twitter.com/soumithchintala/status/827402236363812864
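
For orientation before the talk, here is a minimal PyTorch sketch of the core WGAN training loop from the paper linked above: the critic maximizes the difference between its mean scores on real and generated samples, and its weights are clipped to keep it approximately Lipschitz. The network sizes and the toy 'real' distribution are illustrative stand-ins.

    import torch
    import torch.nn as nn

    critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
    generator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 2))
    opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)    # RMSProp, as in the paper
    opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)
    clip, n_critic = 0.01, 5

    for step in range(200):
        for _ in range(n_critic):                      # several critic steps per generator step
            real = torch.randn(64, 2) + 3.0            # toy 'real' data
            fake = generator(torch.randn(64, 8)).detach()
            loss_c = -(critic(real).mean() - critic(fake).mean())
            opt_c.zero_grad()
            loss_c.backward()
            opt_c.step()
            for p in critic.parameters():              # weight clipping (the Lipschitz hack)
                p.data.clamp_(-clip, clip)
        fake = generator(torch.randn(64, 8))
        loss_g = -critic(fake).mean()                  # generator ascends the critic score
        opt_g.zero_grad()
        loss_g.backward()
        opt_g.step()

The critic's score difference approximates the Wasserstein distance between the real and generated distributions, which is why it can serve as a meaningful training curve.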

Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling
June 15, 2017, 8:15
MTA SZTAKI (Lágymányosi u. 11, Budapest) Room 306 or 506

Márton Makrai will present Gábor Berend's paper Sparse Coding of Neural Word Embeddings for Multilingual Sequence Labeling (TACL, 2016).

From the abstract:

  • (near) state-of-the-art performance for both part-of-speech tagging and named entity recognition, for a variety of languages
    • reasonable results for more than 40 treebanks for POS tagging
  • the model relies only on a few thousand sparse coding-derived features, without any modification of the word representations employed for the different tasks
  • the proposed model has favorable generalization properties
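
A minimal sketch of the sparse coding step the paper builds on, under assumed settings and using scikit-learn rather than the author's implementation: learn an overcomplete dictionary for the embedding matrix with an l1 penalty, then use the indices of each word's nonzero coefficients as its discrete features.

    import numpy as np
    from sklearn.decomposition import DictionaryLearning

    rng = np.random.default_rng(0)
    E = rng.standard_normal((1000, 50))   # stand-in for a (vocab x dim) embedding matrix

    # Overcomplete dictionary (200 atoms for 50 dimensions) with an l1 sparsity penalty.
    dl = DictionaryLearning(n_components=200, alpha=0.5, max_iter=20,
                            transform_algorithm="lasso_lars", random_state=0)
    codes = dl.fit_transform(E)           # sparse coefficient matrix, (vocab x 200)

    # Features for word i: the indices of the dictionary atoms active for its vector.
    features = [np.flatnonzero(codes[i]) for i in range(codes.shape[0])]
    print(features[0])                    # e.g. a handful of atom indices

Because each word activates only a few atoms, these indices can be fed to a linear sequence labeler as a compact, discrete feature set, which is what keeps the model at "a few thousand" features.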