Decomposition of a normalized PPMI tensor for transitive verb structure analysis
Tensors are generalizations of matrices: just as matrices contain numbers arranged along two axes (rows and columns), tensors can have more than two axes (or modes). Tensor decompositions generalize the matrix singular value decomposition and serve similar purposes: latent meaning modeling, noise reduction, modeling higher-order co-occurrences (i.e., when two words appear in similar contexts), and data sparsity reduction. In our experiments, we decompose tensors populated by different measures of association between subjects, verbs, and objects. Among the tested association measures, normalized pointwise mutual information, which to our knowledge has not yet been used in the three-variable case, performs best. We first model the similarity of subject-verb-object triplets, and then investigate the latent dimensions of the non-negative Tucker decomposition to discover semantosyntactic verb classes (Levin 1993).
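To make the association measure concrete, the following is a minimal sketch of how a normalized positive PMI score could be computed for subject-verb-object triples. The abstract does not state the formula, so everything here is an assumption: the three-variable PMI is taken to be log p(s,v,o) / (p(s) p(v) p(o)), the normalizer is taken to be -2 log p(s,v,o) (the value this PMI reaches when the three words always occur together), and the function name `normalized_ppmi` is hypothetical.

```python
import math
from collections import Counter


def normalized_ppmi(triple_counts):
    """Score each (subject, verb, object) triple with a normalized positive PMI.

    Assumed definition (not taken from the paper):
        pmi   = log( p(s,v,o) / (p(s) * p(v) * p(o)) )
        score = max(pmi, 0) / (-2 * log p(s,v,o))
    """
    total = sum(triple_counts.values())

    # Marginal counts of each word in its own slot.
    subj, verb, obj = Counter(), Counter(), Counter()
    for (s, v, o), n in triple_counts.items():
        subj[s] += n
        verb[v] += n
        obj[o] += n

    scores = {}
    for (s, v, o), n in triple_counts.items():
        p_svo = n / total
        # p(s,v,o) / (p(s) p(v) p(o)) rewritten in terms of raw counts.
        pmi = math.log(n * total ** 2 / (subj[s] * verb[v] * obj[o]))
        normalizer = -2.0 * math.log(p_svo)
        if normalizer == 0.0:
            # Only one distinct triple in the data: PMI is 0 as well.
            scores[(s, v, o)] = 0.0
        else:
            # Clip negative PMI to zero (the "positive" part), then normalize.
            scores[(s, v, o)] = max(pmi, 0.0) / normalizer
    return scores


if __name__ == "__main__":
    toy_counts = {
        ("dog", "chase", "cat"): 2,
        ("cat", "chase", "mouse"): 2,
    }
    for triple, score in normalized_ppmi(toy_counts).items():
        print(triple, round(score, 3))
```

Under these assumptions the scores lie between 0 and 1, and the resulting dictionary can be read as the non-zero cells of a sparse third-order tensor whose modes are subjects, verbs, and objects.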