
What is a corpus?

A language corpus is an electronic collection of authentic texts (written or spoken) that can be easily searched for various language phenomena (especially words and collocations), which are then displayed in their natural context.

The CNC corpora include written contemporary Czech (more than 4 billion tokens), spontaneous spoken language (more than 7 million tokens), a diachronic corpus of historical texts, and the parallel corpus InterCorp, which contains translations from or into more than 30 languages.

Applications

  1. KonText

    The KonText application is a basic query interface for working with corpora. It allows evaluation of simple and complex queries, displaying their results as concordance lines, computing frequency distribution, calculating association measures for collocations and further work with language data. All functions are clearly described in the manual.

  2. SyD

    The SyD application is designed for versatile exploration of variants from both the synchronic (contemporary language) and the diachronic perspective. Based on the CNC corpora, it summarizes how frequently each particular variant is used in the present as well as in the past. Try it out! Simply enter two or more competing variants of a single linguistic phenomenon, e.g. téměř × skoro.

  3. Morfio

    The Morfio application is designed for searching for word-formation relations between corpus units, e.g. lovit - úlovek. It makes it possible to find all word pairs formed in the same way and to evaluate the morphological productivity of their formation. The application is based on large corpora of written language that cover a wide variety of word-formation possibilities in contemporary Czech.

  4. KWords

    The KWords application provides a fundamental basis for the empirical interpretation of texts. It analyzes the words in a given text and compares their frequencies with those in a reference corpus. The result is the identification of keywords, i.e. units occurring significantly more often in the analyzed text than in the reference corpus, which represents neutral language use.

  5. Treq

    Treq is an easy-to-use application for looking up translation equivalents in bidirectional Czech-foreign language dictionaries automatically extracted from parallel texts in the InterCorp corpus.
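
The keyword identification described under KWords can be illustrated with a minimal sketch. The statistic that KWords actually uses is not stated here; the example assumes the log-likelihood keyness measure, which is common in corpus linguistics, and a cut-off of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom). The function names log_likelihood and keywords are hypothetical and are not part of any CNC tool.

```python
import math
from collections import Counter


def log_likelihood(a, b, n_text, n_ref):
    """Log-likelihood keyness of one word (an assumed measure, not KWords' own).

    a, b          -- frequency of the word in the analyzed text / reference corpus
    n_text, n_ref -- total number of tokens in the text / reference corpus
    """
    # Expected frequencies if the word were equally common in both samples.
    e1 = n_text * (a + b) / (n_text + n_ref)
    e2 = n_ref * (a + b) / (n_text + n_ref)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(text_tokens, ref_tokens, threshold=3.84):
    """Return (word, score) pairs for words overrepresented in the text."""
    text_freq = Counter(text_tokens)
    ref_freq = Counter(ref_tokens)
    n_text, n_ref = len(text_tokens), len(ref_tokens)
    result = []
    for word, a in text_freq.items():
        b = ref_freq.get(word, 0)
        # Skip words that are not relatively more frequent in the text.
        if a / n_text <= b / n_ref:
            continue
        score = log_likelihood(a, b, n_text, n_ref)
        if score >= threshold:
            result.append((word, score))
    # Strongest keywords first.
    return sorted(result, key=lambda pair: pair[1], reverse=True)
```

Called with a tokenized text and a tokenized reference sample, keywords returns the words whose relative frequency in the text is significantly higher than in the reference, ordered by score. A real reference corpus would be far larger than anything held in a Python list, so in practice the reference frequencies would come from a precomputed frequency list.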

Who are we?


The Czech National Corpus is an academic project founded in 1994 at the Faculty of Arts, Charles University (CU FA), and administered by the Institute of the Czech National Corpus. The aim of the project is the systematic mapping of Czech and of other languages in comparison with Czech. After free registration, the CNC corpora are accessible to anyone interested in studying the language.

Support and information resources

  1. Wiki

    The CNC web manual, which takes the form of a wiki, is a comprehensive corpus linguistics knowledge base. It also contains useful information about the CNC tools and resources, and an online tutorial in seven lessons aimed at both beginners and advanced users (Czech only).

  2. Support

    The support centre is a virtual platform accessible to all registered users. It features an advisory centre (with Q&A) and application-related issue tracking for bug reports and feature requests.

  3. Biblio

    Biblio is a repository of CNC-based research papers, books and theses. The repository is publicly available to all visitors of this portal and, at the same time, it serves as a continuously updated corpus linguistics bibliography. Would you like to know more?

  4. Advisory Board

    The Advisory Board is a permanent body of the Czech National Corpus research infrastructure. It monitors the scientific quality of the project, provides feedback concerning short-term and long-term strategy decisions and evaluates the project results.

  5. Language data

    Is access via the query interface insufficient for your research objectives? CNC also provides linguistic data in packages derived from the published corpora while respecting the limitations that result from agreements with text providers, copyright law and other regulations.

  6. For schools

    We are introducing a new repository of corpus-based exercises for language teaching at primary and secondary schools. This regularly updated webpage offers both a variety of worksheets ready to be printed out and handed to students, and tips for the hands-on use of corpora in a language learning environment (Czech only).

  7. CLARIN K-centre

    The CNC-based K-centre provides information, consulting and technical assistance in the area of corpus linguistics, specializing in empirical research on Czech. It is one of the K-centres of CLARIN, an ESFRI infrastructure focusing on digital language resources and tools for the Humanities and Social Sciences.