Miller's Monkey Updated: Communicative Efficiency and the Statistics of Words in Natural Language
Spencer Caplan, Jordan Kodner, Charles Yang
June 2019
 

Is language designed for communicative and functional efficiency? G. K. Zipf famously argued that shorter words are more frequent because they are easier to use, thereby resulting in the statistical law that bears his name. Yet, G. A. Miller showed that even a monkey randomly typing at a keyboard, and intermittently striking the space bar, would generate “words” that follow the same statistical distribution. Recent quantitative analysis of human language lexicons, with special focus on the phonological and semantic ambiguities of words (Piantadosi, Tily, & Gibson, 2012), has revived Zipf’s functional hypothesis. In this study, we first report our replication effort, including the identification of a spurious result in that study which undercuts the communicative efficiency hypothesis. Second, an update to Miller’s thought experiment that incorporates the phonotactic structure of language shows that lexicons generated without recourse to functional considerations in fact exhibit the statistical properties of words attributed to communicative efficiency. Finally, the statistical distribution of the English words that emerged since 1900 shows that the attested process of lexicon formation is consistent with the updated monkey model but does not support the claim of communicative efficiency. We conclude by arguing for the need to go beyond correlational statistics and to seek direct evidence for the mechanisms that underlie principles of language design. (Spencer Caplan and Jordan Kodner are co-first authors and listed alphabetically)
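Miller's thought experiment is easy to simulate: a "monkey" types letters uniformly at random, hitting the space bar with some fixed probability, and the resulting character stream is split into "words". The sketch below, with an illustrative space probability of 0.2 (not a parameter from the paper), shows that the shortest "words" dominate the top of the rank-frequency list even though no functional pressure is at work.

```python
import random
from collections import Counter

def monkey_text(n_chars=200_000, alphabet="abcdefghijklmnopqrstuvwxyz",
                p_space=0.2, seed=42):
    """Random 'monkey typing': uniform letters, space with probability p_space."""
    rng = random.Random(seed)
    return "".join(
        " " if rng.random() < p_space else rng.choice(alphabet)
        for _ in range(n_chars)
    )

# Split the stream on spaces to obtain "words" and rank them by frequency.
words = monkey_text().split()
ranked = Counter(words).most_common()

# With uniform letters, every single-letter word is far more probable than any
# two-letter word, so the 26 most frequent "words" should all be one letter long.
top26_lengths = [len(w) for w, _ in ranked[:26]]
```

Note that nothing in the generator refers to meaning or ease of use; the Zipf-like shape and the short-word bias fall out of the segmentation process alone, which is the point of Miller's rejoinder.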
Reference: lingbuzz/004660
(please use that when you cite this article)
Published in: submitted to Cognition
keywords: language, computational modeling, information theory, zipf's law, semantics, phonology

 
