In this talk, I will present the data and methodology used for my ongoing Ph.D. thesis on evaluated representations of different groups of people in the Czech news press. Instead of searching through the interface(s) of korpus.cz, I have extracted co-occurrences of nouns and adjectives, based partly on previous research and partly on research design decisions. The data and methods for my thesis rely on work by, inter alia, ČNK members Václav Cvrček, Michal Křen, Jana Šindlerová and Adrian Jan Zasina, and the co-occurrences are extracted from the Journalistic subcorpus of SYN release 8.
The resulting work shows what can be done by combining lists and lexica created for specific purposes, but also which caveats must be taken into account when using other researchers’ data. I will also explain my research design and the calculations used to create the different analyses.
References
Cvrček, V. (2014). “Proximita slov a možnosti jejího měření.” In Kvantitativní analýza kontextu (pp. 35–43). Nakladatelství Lidové noviny/Ústav Českého národního korpusu.
Piao, S., Rayson, P., Archer, D., Bianchi, F., Dayrell, C., El-Haj, M., Jiménez, R.-M., Knight, D., Křen, M., Löfberg, L., Nawab, R. M. A., Shafi, J., Teh, P. L., & Mudraya, O. (2016). “Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages.” In N. Calzolari, K. Choukri, T. Declerck, S. Goggi, M. Grobelnik, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016) (pp. 23–28). European Language Resources Association (ELRA).
Veselovská, K., Hajič, J., & Šindlerová, J. (2014). “Subjectivity Lexicon for Czech: Implementation and Improvements.” Journal for Language Technology and Computational Linguistics, 29(1).
Zasina, A. J. (2019). “Gender-Specific Adjectives in Czech Newspapers and Magazines.” Jazykovedný časopis, 70(2).