Explaining how speakers align their expectations is central to understanding human vocal communication. Spontaneous articulation involves dynamic and kinematic processes that guarantee considerable deviation in the way speech sounds and sequences are produced, such that many of the gestures speakers produce in spontaneous conversation are actually unintelligible in isolation (Pollack and Pickett, 1963; Bard and Anderson, 1983; Ernestus et al., 2002). In spite of the mess of noisy physical properties that speech signals comprise, people invariably experience them as speech sounds, words and phrases. This mismatch between the measurable attributes of the signal and its appearance to a receiver poses a deep puzzle: How do we bring order to this apparent chaos?
In my work, I approach this question as a probabilistic puzzle rooted in information theory (Shannon, 1948), seeking to resolve an apparent contradiction: on the one hand, the formal definition of information and the specific kind of periodic structure its implementations require; on the other, the apparent absence of this structure in the signals speakers actually produce. Crucially, applying the notion of information to speech embraces the idea that communication is a predictive process, made possible because the speech stream contains regularities that are predictable across speakers. However, an obvious difference between speech (and natural language) and other information-theoretic codes is that the former has to be learned, which means that speakers' experience of codes -- and their internal models of them -- will inevitably vary. This makes the whole idea of predictable regularities in speech signals seem problematic: How do speakers ever manage to learn to align their individual expectations about what spoken signals should sound like, and thus make any regularities in them mutually predictable and informative in the first place?
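To make this point concrete, the minimal sketch below (in Python; the reduced word form and its probabilities are invented for the example) computes the surprisal, s(x) = -log2 p(x), that one and the same signal carries for two listeners whose learned models diverge:

    import math

    # The information a signal carries is defined relative to a receiver's
    # probability model: s(x) = -log2 p(x) (Shannon, 1948).
    def surprisal_bits(p):
        return -math.log2(p)

    # Invented probabilities: two listeners assign different likelihoods
    # to hearing a strongly reduced form of "yesterday" in this context,
    # because their experience of the code differs.
    listeners = {"listener A": 0.20, "listener B": 0.02}

    for name, p in listeners.items():
        print(f"{name}: p = {p:.2f} -> {surprisal_bits(p):.2f} bits")
    # listener A: p = 0.20 -> 2.32 bits
    # listener B: p = 0.02 -> 5.64 bits

Until the two models converge through learning, the same physical signal is not equally informative to both parties, which is exactly why the regularities have to become mutually predictable before they can serve communication.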
In this talk I will address the sources of information (signals) that allow speakers to coordinate their expectations and communicate successfully. I show that regular patterns of co-occurrence between speech cues at various levels of abstraction -- speech sounds, words and phrases -- serve as context to structure the systemic uncertainty of communication while gradually increasing the rate at which differences in articulation are perceived. I argue that this leads to a predictable ebb and flow of uncertainty that allows for the maintenance of shared expectations about the rates at which informative changes occur in speech signals. I discuss how speakers' experience affects the structure of the discrete and continuous time-interval distributions that segment speech sequences into phones, words and phrases.
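To suggest what such an ebb and flow of uncertainty looks like in these terms, here is a rough sketch (in Python; the toy corpus and its boundary marker are invented for the example) that estimates, from bigram statistics, the entropy of the next-symbol distribution at each point in a sequence:

    import math
    from collections import Counter, defaultdict

    # Toy corpus of 'phone' sequences; '#' marks word boundaries.
    corpus = "#kat##kan##kit##dog##dot#"

    # Estimate bigram statistics: counts of each symbol following each context.
    follows = defaultdict(Counter)
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1

    def next_entropy(context):
        """Shannon entropy (bits) of the next-symbol distribution after context."""
        counts = follows[context]
        total = sum(counts.values())
        probs = [c / total for c in counts.values()]
        return -sum(p * math.log2(p) for p in probs) + 0.0  # +0.0 avoids -0.00

    # Walk a sequence and print how uncertainty rises and falls.
    for sym in "#kat#":
        print(f"after '{sym}': H = {next_entropy(sym):.2f} bits")

On this toy data, uncertainty is highest after the boundary marker, narrows as a word unfolds, and collapses to zero just before the next boundary: a systematic rise and fall of the kind the argument appeals to, here over discrete symbols rather than the continuous time intervals the talk also considers.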