Czech and Polish verbal nouns (hereinafter VNs; Czech čtení, Polish czytanie ‘reading’) form a highly productive and regularly formed group of nouns (Daneš 1987; Kolářová 2010; Pchelintseva 2016; Łaziński 2020; Kocková 2022; Shcherbii et al. 2024; on some irregularities in vowel quantity in Czech, see Zasina 2022). Since the derivational model «verb stem + –ije» is of Proto-Slavic origin, this type of deverbal noun is present in all Slavic languages, including their dialects. However, a comparative perspective reveals significant differences between the VNs of individual languages in terms of productivity and retention of verbal properties (e.g., the lexical meaning of the verb, aspectual characteristics). Studies of Slavic verbal nouns show that the productivity of VNs, as well as their retention of verbal properties, increases considerably "from east to west" (Dickey 2000; Pchelintseva 2016; Kocková 2022: 144). This paper presents a comparative corpus-based study of the productivity of VNs and explores the possibility of using a semi-automatic method to compare nominal and verbal lemmas obtained from language corpora.
In Czech and Polish, VNs are highly productive and regular in their formation: a VN can be formed from almost every verb and from both aspects of the verb. At the same time, some discrepancies can be observed between the frequency of verbs and that of the corresponding VNs (cf. cs nutkání ‘compulsion’ – nutkat ‘to compel’, bání se ‘fearing’ – bát se ‘to fear’). Many VNs exist only as potential or occasional units, and another deverbal noun is preferred instead (cf. in Czech the low-frequency VN pracování ‘working’, with 8 occurrences in the corpus syn2020 CNC, versus the high-frequency deverbal noun práce ‘work’, with 84 959 occurrences; cf. also Karlík 2004). Some authors suggest that VNs are not formed from causative, modal, frequentative and stative verbs in Czech (Havránek/Jedlička 1930; Křížková 1968: 134; Karlík 2004: 77), or from verbs with relational, modal or impersonal meaning in Polish (Puzynina 1969; Marzniakowa 1993, etc.). However, the corpus data do not confirm these restrictions on productivity (for Czech, cf. Giger/Kocková 2024).
In order to compare the frequency of verbs and VNs, we have developed a procedure for the semi-automatic comparison of verb stems and VN stems, which allows larger amounts of data to be processed. In this way, we have obtained a list of verbs that lack a corresponding verbal noun and vice versa. The paper relies on data from the Czech balanced corpus syn2020 CNC and the Polish corpus NKJP, from which lists of the most frequent verbal lemmas and VN lemmas were selected. The NKJP corpus was used because its tools enabled the extraction of the required frequency lists of verbs and corresponding verbal nouns. Stems were then extracted from both lists by automatically removing the suffixes. Based on the regular changes identified in the formation of VNs (e.g., changes in vowel quantity and quality), automatic stem adjustments were made in each relevant group of VNs: e.g., in Czech, a noun stem ending in -á (očekává-ní ‘expectation’) was changed to -a (očekáva-). The subsequent comparison of the lists of lemma stems (e.g., očekáva-t ‘to expect’) shows a high degree of agreement, yet also reveals a group of verbs whose corresponding verbal nouns are either unattested (pl poszumieć ‘to make noise’ – ?, cs připadat ‘to seem’ – ?) or occur with low frequency (pl potrzymać ‘to hold’ – potrzymanie ‘holding’, cs smát se ‘to laugh’ – smání ‘laughing’), and a group of nouns that are synchronically not associated with a corresponding verb (cs ponětí, pl pojęcie ‘concept, notion’). The lemmas obtained will also be examined in terms of aspect, semantics (incl. lexicalization) and other factors.
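The stem comparison described above can be illustrated with a minimal Python sketch. It is not the procedure actually used in the study: the function names, the toy word lists and the simplified suffix rules (infinitive -t; VN suffixes -ní/-tí; shortening of stem-final -á to -a) are assumptions made for illustration, and they cover only the Czech pattern mentioned in the text.

```python
import unicodedata


def nfc(text):
    """Normalise to NFC so that accented letters compare reliably."""
    return unicodedata.normalize("NFC", text)


def verb_stem(lemma):
    """Strip the infinitive ending -t (simplified, assumed rule)."""
    lemma = nfc(lemma)
    return lemma[:-1] if lemma.endswith("t") else lemma


def noun_stem(lemma):
    """Strip the VN suffix -ní/-tí and shorten a stem-final -á to -a."""
    lemma = nfc(lemma)
    for suffix in ("ní", "tí"):
        if lemma.endswith(suffix):
            stem = lemma[: -len(suffix)]
            # Regular quantity change: očekává-ní -> očekáva-
            return stem[:-1] + "a" if stem.endswith("á") else stem
    return lemma


def compare_stems(verbs, nouns):
    """Return matched (verb, noun) pairs, verbs without a VN, nouns without a verb."""
    by_verb_stem = {verb_stem(v): v for v in verbs}
    by_noun_stem = {noun_stem(n): n for n in nouns}
    shared = by_verb_stem.keys() & by_noun_stem.keys()
    matched = sorted((by_verb_stem[s], by_noun_stem[s]) for s in shared)
    verbs_only = sorted(by_verb_stem[s] for s in by_verb_stem.keys() - shared)
    nouns_only = sorted(by_noun_stem[s] for s in by_noun_stem.keys() - shared)
    return matched, verbs_only, nouns_only


# Hypothetical toy lists; real input would be corpus frequency lists.
VERBS = ["očekávat", "připadat"]
NOUNS = ["očekávání", "ponětí"]

matched, verbs_only, nouns_only = compare_stems(VERBS, NOUNS)
print(matched, verbs_only, nouns_only)
```

On these toy lists, očekávat and očekávání share a stem, while připadat and ponětí remain unmatched, mirroring the two residual groups described above. Reflexive lemmas (smát se), irregular stems and the Polish suffixes would each need additional rules, which is where the manual, "semi-automatic" part of the procedure comes in.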
Daneš, F. (1987). Mluvnice češtiny 3. Syntax. Academia.
Dickey, S. M. (2000). Parameters of Slavic aspect: A cognitive approach. CSLI Publications.
Giger, M., & Kocková, J. (2024). Grenzüberschreitungen an der Peripherie: Aspektuelle Funktionen von Aktivpartizipien und Verbalsubstantiven im Tschechischen. Zeitschrift für Slawistik, 69(1), 1–26.
Havránek, B., & Jedlička, A. (1930). Česká mluvnice. SPN.
Institute of the Czech National Corpus. (2020). SYN2020: Representative corpus of written Czech. Faculty of Arts, Charles University. http://www.korpus.cz
Institute of the Czech National Corpus. (2021). SYN version 9. Faculty of Arts, Charles University. https://www.korpus.cz
Karlík, P. (2004). Mikrosyntax českých deverbálních jmen. Sborník prací Filozofické fakulty brněnské univerzity. A, Řada jazykovědná, 52(1), 71–81.
Kocková, J. (2022). Neurčité tvary slovesné v češtině, ruštině a němčině a jejich vzájemná ekvivalence. Academia; Institute of Slavonic Studies of the Czech Academy of Sciences.
Kolářová, V. (2010). Valence deverbativních substantiv v češtině (na materiálu substantiv s dativní valencí). Karolinum.
Křížková, H. (1968). Substantiva s dějovým významem v ruštině a v češtině. In Kapitoly ze srovnávací mluvnice ruské a české III. O ruském slovese (pp. 81–152). Academia.
Łaziński, M. (2020). Wykłady o aspekcie polskiego czasownika. Warsaw University Press.
Marzniakowa, I. (1993). Gramatyka konfrontatywna rosyjsko-polska. PWN.
National Corpus of Polish. (2008). NKJP. https://nkjp.pl/
Pchelintseva, E. (2016). Ot glagola k imeni: aspektualnost’ v russkyx, ukraynskyx i pol’skyx imenax dejstvija. Nauka.
Puzynina, J. (1969). Nazwy czynności we współczesnym języku polskim (słowotwórstwo, semantyka, składnia). PWN SA.
Shcherbii, N., Vorobets, O., Mytsan, D., Korpalo, O., & Kuravska, N. (2024). The debate on the nature of verbal nouns in Slavic languages: Nominal or verbal? Eduweb, 18(2), 238–250.
Zasina, A. J. (2022). Dlouhá nebo krátká? Korpusová analýza verbálních substantiv na -ání a -aní. Korpus-gramatika-axiologie, (25), 72–87.