Comparing quantitative morphological features of languages: a study on annotated multi-parallel texts

Date
Speaker
  1. Vojtěch John
Abstract

Research on morphological diversity in typology and contrastive linguistics has traditionally focused on discrete, predominantly inflectional features. However, corpus-based approaches can provide complementary insights into the quantitative and dynamic aspects of morphological systems. While multiple languages have both morphological resources and large parallel corpora, sizeable corpora with detailed morphological annotation (including morphological segmentation and morpheme classification) remain very scarce. As part of a broader effort to address this gap, we present our current work on the detailed automatic annotation of part of the multi-parallel corpus Europarl, comprising over 10 million tokens in each of six languages: Czech, English, French, German, Hungarian, and Slovak. The presentation reports preliminary results on quantitative morphological features extracted from these data and their potential to inform further cross-linguistic research. In particular, we discuss observed cross-linguistic regularities in morpheme frequency distributions, relationships among morpheme classes, and their possible connection to word-formation strategies.