
Czech CLARIN Knowledge Centre for Corpus Linguistics


CLARIN is a digital research infrastructure for language resources and technology that supports research, mostly in the humanities and social sciences. It is a network of many participating centres from across Europe and beyond that offer language data, tools and knowledge of how to work with them. CLARIN is committed to open science and the FAIR data principles and provides multifaceted support for data sharing, reuse and interoperability. K-centres (knowledge centres), which cover various areas of expertise and are ready to share their knowledge with users, are one of the cornerstones of the CLARIN infrastructure.

About the K-centre

The Czech CLARIN Knowledge Centre for Corpus Linguistics is based at the Institute of the Czech National Corpus, Faculty of Arts, Charles University, Prague. The Czech National Corpus (CNC) is an academic project founded in 1994 with the main aim of continuously mapping the Czech language by building and annotating a variety of large general-purpose corpora. Since 2012, CNC has been funded by the Ministry of Education, Youth and Sports within the framework of the Large Research Infrastructures programme. CNC is an associated member of the CLARIN-CZ consortium, which is led by LINDAT/CLARIAH-CZ.

Apart from mapping the language, CNC develops specialized web-based applications that provide user-friendly access to the corpora, and it offers wide-ranging user support. The central access point to all the corpora and web applications is the CNC research portal. In addition to this service-oriented line of work, CNC is also a research centre that promotes an empirical approach to language and runs a PhD programme in Corpus Linguistics.

K-centre services

The K-centre provides advice on all topics related to corpus linguistics or the Czech language. Our corpus linguistics expertise includes data formats, annotation, metadata encoding, corpus querying, corpus linguistics methodology and statistical methods. We can also point you to other centres regarding any aspect of the Czech language, including language resources and natural language processing.

We offer the following on-line services:

In addition, the following services can be arranged on demand via the helpdesk:

  • workshops and training events on various topics
  • provision of linguistic data in the form of corpus-derived packages, subject to the limitations that result from agreements with text providers, copyright law and other regulations
  • corpus hosting that includes technical processing, quality checks, and public access to the hosted corpus with related services

Please do not hesitate to contact us. We are ready to help you with your language-related requests!