Information theory has shown that the distribution of forms is critical to the design of efficient communication systems. In particular, geometric (and, in the continuous case, exponential) distributions are especially useful in such systems, both because they are optimal for coding purposes and because they are memoryless.
In the first part of this talk, I will describe some recent findings showing that Sinosphere family names are exponentially distributed, and reveal that, historically, the corresponding English name distributions were also exponential, so that the distributional structure of names was, at least at one point, universal across the world's major languages. I will then describe how these name distributions appear to have optimized meaningful communication about individuals. I will show that although the aggregated name distributions of modern English-speaking countries are Zipf-distributed, the empirical name distributions that speakers actually encounter in these communities also have an exponential form. I will further show how the growth in information in the distribution of names in these communities closely reflects the communicative constraints upon them, suggesting that name systems are far from random or arbitrary and instead appear to form self-organizing communication systems.
In the second half of the talk, I will describe a set of analyses showing that the empirical distributions of the other classes of lexical forms that speakers engage with in moment-to-moment communication in English are also exponential. This result suggests that the Zipfian distributions long thought to play a functional role in language are actually an artifact of mixing empirical distributions. Finally, I will describe how these structures serve to facilitate the discriminative processes of human communication.
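The mixing argument can be illustrated with a minimal sketch. It assumes, purely for illustration, that each underlying distribution is geometric over ranks; the parameters below are hypothetical, not values from the talk. A single geometric distribution has a log-probability that is linear in rank. Pooling several geometric distributions with different decay rates yields a log-convex curve instead, meaning a tail that decays more slowly than any single exponential. That is the kind of departure that makes aggregated data look closer to a power law. The sketch only demonstrates this qualitative effect. It is not the analysis presented in the talk, and it does not by itself produce an exact Zipf law.

```python
import math


def geometric_pmf(p, n_ranks):
    """P(rank = k) = p * (1 - p)**k for k = 0..n_ranks-1 (untruncated geometric)."""
    return [p * (1 - p) ** k for k in range(n_ranks)]


def mixture_pmf(params, weights, n_ranks):
    """Weighted mixture of geometric pmfs, evaluated rank by rank."""
    total = [0.0] * n_ranks
    for p, w in zip(params, weights):
        for k, prob in enumerate(geometric_pmf(p, n_ranks)):
            total[k] += w * prob
    return total


def second_differences(values):
    """Discrete second differences: ~0 for a linear sequence, > 0 for a convex one."""
    return [values[k + 1] - 2 * values[k] + values[k - 1]
            for k in range(1, len(values) - 1)]


# Hypothetical example: three component distributions with different decay rates,
# standing in for the separate empirical distributions that get pooled together.
params = [0.5, 0.1, 0.01]
weights = [1 / 3, 1 / 3, 1 / 3]
n_ranks = 200

single = [math.log(x) for x in geometric_pmf(0.1, n_ranks)]
mixed = [math.log(x) for x in mixture_pmf(params, weights, n_ranks)]

# Single geometric: log-probability is linear in rank (second differences ~ 0).
print(max(abs(d) for d in second_differences(single)) < 1e-9)
# Mixture: log-probability is strictly convex in rank (all second differences > 0),
# so its tail is heavier than that of any single exponential component.
print(min(second_differences(mixed)) > 0)
```

As a sanity check, reducing `params` to a single value makes the mixture's second differences collapse to (numerically) zero, recovering the purely exponential case.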