Metka Bezlaj
University of Zadar / University of Zagreb, Croatia
Gorana Bikić-Carić
University of Zagreb, Croatia
Bojana Mikelenić
University of Zagreb, Croatia

The RomCro parallel corpus v.2.0 and its application in the contrastive analysis of infinitival and finite complement constructions

Keywords: parallel corpus; Romance languages; Croatian; collostructional analysis; complement constructions

This presentation introduces RomCro v.2.0, the latest version of a multilingual, multidirectional parallel corpus of contemporary literary texts in six Romance languages and Croatian. The corpus is available on the Sketch Engine and HR-CLARIN platforms. RomCro v.1.0 (Bikić-Carić et al., 2023) included Spanish, French, Italian, Portuguese, Romanian, and Croatian. The updated version builds on it by adding further texts and a new language, Catalan, which brings the corpus to 19.4 million words. RomCro has been presented on various occasions and has proven to be a valuable resource for different types of contrastive analysis. Examples include studies of noun determination (Bikić-Carić, 2020) and of the use of articles in Romance languages (Bikić-Carić & Bezlaj, 2023). Students have also successfully applied it in seminar papers and theses.

In this study, we employ the Construction Grammar framework (Boas, 2013) and corpus methodology to examine postverbal, non-prepositional infinitival and finite complements, using data from RomCro. Because of space limitations and the time-intensive nature of manual annotation, our analysis focuses on Croatian and its comparison with French and Spanish. The study proceeds in three steps. First, we conduct a quantitative analysis of Croatian verbs that potentially alternate between infinitival and finite complements. This analysis addresses two questions: to what extent do the same Croatian verbs exhibit attraction toward either construction, and how can the strength of this association be quantified? Verb–complement pairings are treated as schematized constructions and analyzed using distinctive collexeme analysis, which identifies the verbs most strongly associated with each complement type (Stefanowitsch & Gries, 2003; Gries & Stefanowitsch, 2004). Second, a qualitative analysis explores how morphosyntactic and semantic factors affect construction choice. These factors include the semantic class of the matrix and embedded verbs, verbal and lexical aspect, temporal reference, and subject coreferentiality, and they have been shown to play a crucial role in similar alternations (e.g., Yoon & Wulff, 2016; Kaleta, 2023). Third, a contrastive analysis uses French and Spanish as metalanguages for describing the Croatian data. Here we ask how differences in construction choice are reflected in the translation equivalents found in the corpus. We also ask which strategies Croatian uses when it lacks constructional options typical of the Romance languages, and vice versa.
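The core computation of distinctive collexeme analysis can be illustrated with a short sketch. For each verb, a 2×2 table crosses the verb (this verb vs. all other candidate verbs) with the complement type (infinitive vs. finite da/kako clause). A Fisher–Yates exact test then assesses whether the verb occurs in one construction more often than expected. Following common practice, the one-tailed p-value is converted into a collostruction strength of -log10(p). This is a minimal, standard-library-only illustration under those assumptions, not the authors' actual implementation. The function names, the labels VERB_A to VERB_C, and their frequencies are hypothetical.

```python
from math import comb, log10


def hypergeom_pmf(k: int, row1: int, col1: int, n: int) -> float:
    """P(top-left cell == k) for a 2x2 table with fixed margins."""
    return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)


def fisher_one_tailed(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher-Yates exact p-value in the observed direction.

    Table layout (hypothetical labels):
                     infinitive   da/kako clause
        this verb        a              b
        other verbs      c              d
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    expected = row1 * col1 / n
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum over tables at least as extreme as the observed one, in the
    # direction of the observed deviation from the expected frequency.
    ks = range(a, hi + 1) if a >= expected else range(lo, a + 1)
    return sum(hypergeom_pmf(k, row1, col1, n) for k in ks)


def distinctive_collexemes(freqs: dict[str, tuple[int, int]]):
    """freqs maps verb -> (n_infinitive, n_finite).

    Returns (verb, preferred_construction, strength) tuples sorted by
    descending strength, where strength = -log10(p).
    """
    total_inf = sum(inf for inf, _ in freqs.values())
    total_fin = sum(fin for _, fin in freqs.values())
    results = []
    for verb, (inf, fin) in freqs.items():
        a, b = inf, fin
        c, d = total_inf - inf, total_fin - fin  # all other verbs
        p = fisher_one_tailed(a, b, c, d)
        expected = (a + b) * (a + c) / (a + b + c + d)
        if a > expected:
            preferred = "infinitive"
        elif a < expected:
            preferred = "finite"
        else:
            preferred = "none"
        strength = -log10(p) if p > 0 else float("inf")
        results.append((verb, preferred, strength))
    return sorted(results, key=lambda r: r[2], reverse=True)


if __name__ == "__main__":
    # Invented toy counts, for illustration only (not RomCro data).
    toy = {"VERB_A": (30, 2), "VERB_B": (3, 28), "VERB_C": (10, 10)}
    for verb, preferred, strength in distinctive_collexemes(toy):
        print(f"{verb}\t{preferred}\t{strength:.2f}")
```

In this sketch, a high strength value marks a distinctive collexeme of the preferred construction. A value near zero indicates no clear preference for either complement type.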

We examine Croatian verbs that allow both complement types, as illustrated in examples (1) and (2). Target constructions were extracted from a 212,666-word Croatian subcorpus of original literary texts. Since no prior corpus-based research on this topic exists for Croatian, we did not preselect verbs but queried all main verbs using the CQL pattern tag=Vm.*. The results were filtered to retain concordances in which the verb is followed, with up to two intervening elements, either by the complementizer da or kako (lemma=da|kako) or by an infinitive (tag=Vmn.*). Searches were performed in Sketch Engine. The results were sorted by lemma, manually cleaned to remove false positives, and annotated for morphosyntactic and semantic features. We then retrieved the aligned French and Spanish translations of the validated Croatian examples. The translation equivalents were annotated for type of correspondence, which yielded translation paradigms, defined as the set of equivalent constructions in the target texts (Johansson, 2007: 23). These paradigms served as the comparative basis for describing the Croatian constructions.
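The filtering step can be approximated outside Sketch Engine as follows. The sketch assumes that each sentence is available as a list of (word, lemma, tag) triples whose tags follow the conventions implied by the patterns above, with main verbs matching Vm.* and infinitives matching Vmn.*. A CQL query along the lines of [tag="Vm.*"] []{0,2} [lemma="da|kako" | tag="Vmn.*"] would express the same idea, but that exact query is our reconstruction. The function find_candidates, the tags on the toy tokens, and the choice to keep only the nearest complement marker after each verb are illustrative assumptions. As in the original procedure, the hits would still require manual cleaning, since the pattern cannot by itself distinguish complement clauses from other uses of da or kako.

```python
import re

# Patterns taken from the query described above. CQL regexes match the whole
# attribute value, so fullmatch() is used here.
MATRIX_TAG = re.compile(r"Vm.*")       # any main verb
INFINITIVE_TAG = re.compile(r"Vmn.*")  # infinitive
COMPLEMENTIZERS = {"da", "kako"}       # lemma=da|kako


def find_candidates(sentence, max_gap=2):
    """Return (verb_index, marker_index, type) for each main verb followed,
    after at most `max_gap` intervening tokens, by da/kako or an infinitive.

    `sentence` is a list of (word, lemma, tag) triples (hypothetical format).
    Only the nearest complement marker after each verb is kept.
    """
    hits = []
    for i, (_, _, tag) in enumerate(sentence):
        if not MATRIX_TAG.fullmatch(tag):
            continue
        # j covers i+1 .. i+1+max_gap, i.e. 0..max_gap intervening tokens
        for j in range(i + 1, min(i + 2 + max_gap, len(sentence))):
            _, lemma, next_tag = sentence[j]
            if lemma in COMPLEMENTIZERS:
                hits.append((i, j, "finite"))
                break
            if INFINITIVE_TAG.fullmatch(next_tag):
                hits.append((i, j, "infinitive"))
                break
    return hits


if __name__ == "__main__":
    # Fragments of examples (1) and (2); tags are illustrative only.
    ex1 = [("želi", "željeti", "Vmr3s"), ("da", "da", "Cs"),
           ("odmah", "odmah", "Rgp"), ("pripali", "pripaliti", "Vmr3s"),
           ("cigaru", "cigara", "Ncfsa")]
    ex2 = [("ne", "ne", "Qz"), ("želim", "željeti", "Vmr1s"),
           ("naučiti", "naučiti", "Vmn"), ("više", "više", "Rgc")]
    print(find_candidates(ex1))  # [(0, 1, 'finite')]
    print(find_candidates(ex2))  # [(1, 2, 'infinitive')]
```

Note that an infinitive also matches Vm.* and is therefore itself checked as a potential matrix verb, which mirrors the broad main-verb query.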

Based on our preliminary research, we expect verbs that allow both complement types to show probabilistic distributions shaped by the factors listed above. The results are expected to highlight underdescribed syntactic and semantic patterns in Croatian as a Slavic language, while also offering insights into French and Spanish. The parallel corpus is particularly valuable for identifying and interpreting these cross-linguistic patterns.

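  1. Cr. […] ako baš tko od poglavite gospode želi da odmah pripali cigaru […].
     '[…] if any of the distinguished gentlemen wants to light a cigar right away […].'
  2. Cr. U ovom trenutku kao da ne želim naučiti više.
     'At this moment, it is as if I do not want to learn any more.'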

References

Bikić-Carić, G. (2020). Quelques particularités dans l’expression de la détermination du nom. Comparaison entre cinq langues romanes. Studia Universitatis Babeș-Bolyai Philologia, 65(4), 39–54.

Bikić-Carić, G., & Bezlaj, M. (2023). Neke specifičnosti upotrebe određenog člana u romanskim jezicima (s posebnim naglaskom na francuski i španjolski). In E. Spahić, I. Radeljković & L. Osmanović (Eds.), 70 godina Odsjeka za romanistiku Univerziteta u Sarajevu (pp. 15–27). Univerzitet u Sarajevu – Filozofski fakultet.

Bikić-Carić, G., Mikelenić, B., & Bezlaj, M. (2023). Construcción del RomCro, un corpus paralelo multilingüe. Procesamiento del Lenguaje Natural, 70, 99–110.

Boas, H. C. (2013). Cognitive Construction Grammar. In Th. Hoffmann & G. Trousdale (Eds.), The Oxford Handbook of Construction Grammar (pp. 233–252). Oxford University Press.

Gries, S. Th., & Stefanowitsch, A. (2004). Extending collostructional analysis: A corpus-based perspective on ‘alternations’. International Journal of Corpus Linguistics, 9(1), 97–129.

Johansson, S. (2007). Seeing through Multilingual Corpora. John Benjamins Publishing Company. 

Kaleta, A. (2023). The semantics of clausal complementation: Evidence from Polish. Journal of Slavic Linguistics, 31(1), 99–132.

Stefanowitsch, A., & Gries, S. Th. (2003). Collostructions: Investigating the interaction of words and constructions. International Journal of Corpus Linguistics, 8(2), 209–243.

Yoon, J., & Wulff, S. (2016). A corpus-based study of infinitival and sentential complement constructions in Spanish. In J. Yoon & S. Th. Gries (Eds.), Corpus-Based Approaches to Construction Grammar (pp. 145–164). John Benjamins.