Corpus-based analyses of variation in English: Why both size and structure matter (registration)

  1. Mark Davies

Time and venue: Monday, 12 April 2016, 6 p.m., room 104 at FF UK

English corpus linguistics has a tradition of using small (1-5 million word) corpora to look at variation in high-frequency phenomena. Within the last 5-10 years, however, very large web-based corpora (like those from Sketch Engine) have also become available. While both of these types of corpora certainly have their advantages, I argue that both have serious weaknesses when it comes to looking at many types of variation in English.

I will present many examples of lexical, morphological, syntactic, and semantic variation in English that can only be studied using corpora that are both large and structured in a way that lends itself to looking at variation (rather than being just a “blob” of billions of words of web pages).

These examples of genre-based, historical, and dialectal variation in English will come from the 520-million-word Corpus of Contemporary American English (COCA), the 400-million-word Corpus of Historical American English (COHA), and the 1.9-billion-word Corpus of Global Web-based English (GloWbE). All of these corpora are much larger than comparable corpora of English, and their unique structure allows them to provide insight into variation in English that cannot be obtained from any other source.