Can the corpus tell us how to periodize the history of a language?
How do we know when, say, the Early Modern period of a given language ends and the Late Modern period begins? Typically, coarse-grained periodizations are based on changes in the grammatical system, whereas fine-grained ones draw their evidence from sociolinguistic or philological arguments. Instead, we propose a corpus-driven approach. Using text categorisation methods, we divide a diachronic corpus, in a stepwise fashion, into two subcorpora that are as different from each other as possible (Eder & Górski 2016). This allows us to identify quantitatively distinct stages in the development of a language. The underlying assumption is that effective categorisation is possible only if two requirements are satisfied: there is a true difference (be it lexical or grammatical) between older and newer texts, and each of the two subcorpora is internally homogeneous.
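The idea of moving a breakpoint through time and asking how well a classifier separates "older" from "newer" texts can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the procedure of Eder & Górski (2016): the function names are hypothetical, texts are represented by relative frequencies of the most frequent words, the classifier is a simple nearest-centroid rule with Manhattan distance, and separability is measured as leave-one-out balanced accuracy. The candidate year with the highest score is taken as the most plausible period boundary.

```python
from collections import Counter


def relative_freqs(text, vocab):
    """Relative frequencies of the vocabulary words in one text (assumed whitespace tokenisation)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in vocab]


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def loo_balanced_accuracy(vectors, labels):
    """Leave-one-out nearest-centroid classification (an assumed classifier choice).

    Returns the mean per-class recall, so that lopsided splits are not
    rewarded simply for predicting the majority class.
    """
    hits = Counter()
    totals = Counter(labels)
    for i, (vec, lab) in enumerate(zip(vectors, labels)):
        cents = {}
        for cls in (0, 1):
            # Centroid of each class, excluding the held-out text itself.
            members = [v for j, (v, l) in enumerate(zip(vectors, labels))
                       if l == cls and j != i]
            cents[cls] = centroid(members)
        pred = min(cents, key=lambda c: manhattan(vec, cents[c]))
        if pred == lab:
            hits[lab] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


def best_breakpoint(corpus, vocab_size=50, min_per_side=2):
    """corpus: list of (year, text). Returns (split_year, score) for the best-separating split.

    Texts dated before split_year form the 'older' subcorpus (label 0),
    the rest the 'newer' one (label 1). min_per_side must be >= 2 so that
    both centroids stay defined during leave-one-out.
    """
    corpus = sorted(corpus)
    all_tokens = Counter(t for _, text in corpus for t in text.lower().split())
    vocab = [w for w, _ in all_tokens.most_common(vocab_size)]
    vectors = [relative_freqs(text, vocab) for _, text in corpus]
    years = sorted({y for y, _ in corpus})
    best = None
    for split in years[1:]:  # move the candidate breakpoint step by step
        labels = [0 if y < split else 1 for y, _ in corpus]
        if min(labels.count(0), labels.count(1)) < min_per_side:
            continue
        score = loo_balanced_accuracy(vectors, labels)
        if best is None or score > best[1]:
            best = (split, score)
    return best


if __name__ == "__main__":
    # Toy, invented data: older texts use thou/hath/doth, newer ones you/has/does.
    toy = [
        (1600, "thou hath the book and thou doth read"),
        (1620, "thou hath a horse and doth ride"),
        (1640, "thou doth speak and hath spoken"),
        (1720, "you has the book and you does read"),
        (1740, "you has a horse and does ride"),
        (1760, "you does speak and has spoken"),
    ]
    print(best_breakpoint(toy))
```

On this invented toy corpus the sketch returns `(1720, 1.0)`: only the split between 1640 and 1720 lets every held-out text be assigned to the correct side. Balanced accuracy is used rather than plain accuracy because, near the edges of the timeline, one subcorpus is much larger than the other and plain accuracy would favour such lopsided splits.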