
Multi-Dimensional Analysis of Czech

What is multi-dimensional analysis?

Multi-dimensional analysis (MDA) is a method for the empirical study of text variation, developed by the corpus linguist Douglas Biber. It aims to capture variation in terms of the functions that variable linguistic features perform in texts. Unlike earlier approaches, MDA does not start from an a priori identification of the linguistic features typical of a particular communication domain; instead, it takes the co-occurrence of linguistic features as the starting point for interpretation. From the features that frequently co-occur in texts, it is then possible to infer what function they jointly fulfill.

What is the procedure of MDA?

MDA has been used to model register variation in many languages. The research procedure consists of the following steps:

  • corpus compilation,
  • feature selection and retrieval from the corpus (operationalization),
  • statistical evaluation using factor analysis,
  • interpretation of results.
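The steps above can be illustrated with a minimal sketch in Python. It is not the project's implementation: the corpus, the feature names and the counts are invented, and the first principal axis of the feature correlation matrix (found by power iteration) stands in for a full factor analysis with several rotated factors, which would normally be run in a statistics package. The resulting weights play the role that factor loadings play in MDA.

```python
import math

# Invented toy data (not from Koditex): raw counts of three placeholder
# features per text, stored as (text length in tokens, [counts]).
FEATURES = ["verbs", "nouns", "pronouns_2p"]
CORPUS = {
    "chat_01": (500, [95, 90, 20]),
    "chat_02": (400, [80, 70, 14]),
    "news_01": (1000, [120, 330, 1]),
    "paper_01": (2000, [200, 700, 0]),
}

def per_thousand(tokens, counts):
    """Normalize raw counts to frequencies per 1,000 tokens."""
    return [1000.0 * c / tokens for c in counts]

def zscores(column):
    """Standardize one feature across texts (sample standard deviation)."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (len(column) - 1))
    return [(x - mean) / sd for x in column]

def dominant_axis(matrix, iterations=100):
    """Power iteration: unit eigenvector belonging to the largest eigenvalue."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Operationalization: normalized, standardized feature frequencies.
text_ids = list(CORPUS)
rates = [per_thousand(*CORPUS[t]) for t in text_ids]
z = [zscores([row[j] for row in rates]) for j in range(len(FEATURES))]

# Statistical evaluation: feature-by-feature correlation matrix, which
# summarizes how the features co-occur across texts, and its dominant axis.
n = len(text_ids)
corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
         for j in range(len(FEATURES))] for i in range(len(FEATURES))]
axis = dominant_axis(corr)
if axis[0] < 0:  # the sign is arbitrary; make the first feature positive
    axis = [-x for x in axis]

# Inputs for interpretation: feature weights and the position of each text.
weights = dict(zip(FEATURES, axis))
scores = {t: sum(z[j][i] * axis[j] for j in range(len(FEATURES)))
          for i, t in enumerate(text_ids)}

for t in sorted(scores, key=scores.get, reverse=True):
    print(f"{t:10s} {scores[t]:+.2f}")
```

In this toy setting the features with same-signed weights are those that tend to occur together, and the texts are ordered along the axis accordingly; interpreting what that ordering means functionally remains a manual step.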

In addition to describing language variation, MDA results can be used to determine the main registers in a given language (see register classification, which functions as a complement to txtype/genre classification).

Multi-dimensional model of Czech

Based on the analysis of the Koditex corpus, a model with 8 dimensions was created:

  1. dynamic (+) vs. static (-),
  2. spontaneous (+) vs. prepared (-),
  3. higher (+) vs. lower (-) level of cohesion,
  4. polythematic (+) vs. monothematic (-),
  5. higher (+) vs. lower (-) amount of addressee coding,
  6. general/intension (+) vs. particular/extension (-),
  7. prospective (+) vs. retrospective (-),
  8. attitudinal (+) vs. factual (-).

The dimensions are named primarily according to which linguistic features contribute most to each of them (see the inventory of prominent features) and according to the position of texts along a given dimension (see the MDAvis tool).
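As an illustration of this kind of interpretation, the following Python sketch lists the most prominent features and the most extreme texts at each pole of a single dimension. All values are invented, the helper `poles` is hypothetical, and the salience cutoff of 0.3 is an assumption made for the example, not a figure taken from the Czech model.

```python
def poles(values, threshold=0.0, top_n=3):
    """Return the strongest positive and negative items of a name -> value map."""
    ranked = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    positive = [name for name, v in ranked if v > threshold][:top_n]
    negative = [name for name, v in reversed(ranked) if v < -threshold][:top_n]
    return positive, negative

# Invented loadings and text scores for ONE hypothetical dimension.
LOADINGS = {
    "verbs": 0.71,
    "pronouns_2p": 0.55,
    "adverbs": 0.12,
    "adjectives": -0.41,
    "nouns": -0.68,
}
SCORES = {"chat_01": 1.9, "chat_02": 1.4, "news_01": -0.8, "paper_01": -2.1}

# Features below the (assumed) salience cutoff are ignored when naming.
pos_features, neg_features = poles(LOADINGS, threshold=0.3)
pos_texts, neg_texts = poles(SCORES)

print("(+) features:", pos_features, "texts:", pos_texts)
print("(-) features:", neg_features, "texts:", neg_texts)
```

Reading the two poles side by side (which features dominate, and which texts sit there) is what would motivate a label such as "dynamic (+) vs. static (-)".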

Team members

Václav Cvrček
Zuzana Laubeová
David Lukeš
Petra Poukarová
Anna Řehořková
Adrian Jan Zasina

Key publications of the project (description of the Czech MDA)

Learn more about the data

Tool for viewing MDA results

Koditex Corpus Description

Data

  • Cvrček, V. et al., 2018, Multi-Dimensional Analysis of Czech (Original data for a general-purpose multi-dimensional analysis model of register variation in Czech). https://doi.org/10.18710/QAJKZW, The Tromsø Repository of Language and Linguistics (TROLLing).
  • Lukeš, D., 2018, Tidiness: A measure based on information theory to help with selecting an appropriate number of dimensions to extract in MDA. Available online at https://github.com/czcorpus/mda.

Publications based on the project results

Grant support

The Czech MDA was conducted at Charles University by researchers from the Institute of the Czech National Corpus; it was supported by the ERDF project Language Variation in the CNC, no. CZ.02.1.01/0.0/0.0/16_013/0001758.

EU, MŠMT