
How does the word length of Chinese words change? A diachronic analysis based on Google Ngrams

Xinying Chen

For a long time, diachronic studies of language, widely known as historical linguistics or language evolution studies, have focused mainly on two aspects: constructing models of language evolution and testing hypotheses on small language samples. Because collecting and analyzing authentic diachronic data was so difficult, quantitative investigations and big-data-based hypothesis testing were largely absent. This situation has changed only recently, thanks to advances in technologies such as OCR, computer memory, and text mining. It is now possible, though still difficult, to carry out a diachronic study by analyzing authentic language data. Our analysis focuses on Chinese: by analyzing the Google 1-gram data, we aim to describe how the length of Chinese words changed between 1900 and 1999.
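As a rough illustration of the kind of measurement involved, the sketch below computes a token-weighted mean word length, in characters, for each year from 1900 to 1999. It is a minimal sketch under stated assumptions, not the author's method. It assumes each input line follows the tab-separated layout of the Google Books Ngram 1-gram files (ngram, year, match_count, volume_count). It also assumes that any part-of-speech suffix such as `_NOUN` should be stripped. The function name `mean_word_length_by_year` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable


def mean_word_length_by_year(
    lines: Iterable[str], start: int = 1900, end: int = 1999
) -> Dict[int, float]:
    """Token-weighted mean word length (in characters) per year.

    Assumption: each line is tab-separated as
    ngram, year, match_count, volume_count (Google Books Ngram 1-gram layout).
    """
    total_chars = defaultdict(int)   # sum of len(word) * match_count per year
    total_tokens = defaultdict(int)  # sum of match_count per year
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        word, year, count = fields[0], int(fields[1]), int(fields[2])
        if not start <= year <= end or count <= 0:
            continue  # keep only the period of interest
        # Assumption: drop a part-of-speech suffix such as "_NOUN" if present.
        word = word.split("_", 1)[0]
        if not word:
            continue
        total_chars[year] += len(word) * count
        total_tokens[year] += count
    return {y: total_chars[y] / total_tokens[y] for y in sorted(total_tokens)}


if __name__ == "__main__":
    sample = [
        "中国\t1900\t3\t2",
        "的\t1900\t5\t4",
        "计算机\t1990\t2\t1",
        "中国\t1990\t2\t2",
        "中国\t1850\t9\t9",  # outside 1900-1999, ignored
    ]
    print(mean_word_length_by_year(sample))  # {1900: 1.375, 1990: 2.5}
```

Weighting each word by its match count measures the average length of running text. Weighting each distinct word once would instead describe the vocabulary. Either is a plausible reading of "word length", and the choice can change the trend observed.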