This application provides a fast, basic overview of how a given word is used. It contains all of the information that can be obtained from the CNC corpora using the available tools, and presents an accessible profile of the word from various perspectives.
The results displayed in the application are obtained from CNC resources. Information on written language originates from the SYN2015 corpus, which reflects contemporary written Czech in its various forms (fiction, journalism, scientific literature and popular science). Information on spoken Czech is derived from the ORAL version 1 corpus, which comprises transcriptions of recordings of informal spontaneous speech.
Apart from the word profile for Czech words (the Search for word module), the application also offers information on a word's possible translations into other languages (the Search in two languages module). This tool uses the parallel multilingual corpus InterCorp, which contains texts and their translations from or into more than 30 languages.
The results of the analysis for a given word are organized into individual tiles. Each tile contains information on the source from which the data originates, and a link leading to one of the tools used for working with CNC corpora, where you have access to more detailed information.
Each tile contains some basic tips and advice in its header. In some cases, the data in the tile can also be viewed as a table, and the results can be adjusted using additional parameters.
All of the displayed data is obtained via automatic analysis – the accuracy and reliability of the frequency information depends on the reliability of the annotation in the corpora. In this respect, the key role is played by lemmatization (i.e. the allocation of a basic word form) and morphological tagging, whose accuracy cannot reach 100% even with the utilization of the most modern tools. In order to obtain a detailed and reliable evaluation of the results, it is necessary to verify the results in the sources, especially with regard to annotation adequacy.
Please note that this is a development version of the application, which means that some functions and visuals may change without prior notice.
Also, short outages may occur from time to time due to code/configuration upgrades.
Word at a Glance serves as a portal into the world of language corpora. At present, these collections of texts are the primary source of information about natural language and are used in all fields of linguistic research. Further information on language corpora and corpus tools can be found at www.korpus.cz.
To gain full access to these tools and resources, users must first complete the free registration.