Estimating child linguistic experience from historical corpora
Jordan Kodner
August 2019

Child language acquisition is often identified as one of the primary drivers of language change, but the lack of historical child data presents a challenge for empirically investigating its effect. In this work, I examine the relationship between lexicons extracted from modern child-directed speech and those drawn from modern and historical literary corpora in order to better understand when language acquisition can be modeled over historical and non-child corpora as it is over child corpora. Among high token frequency items, morphophonological and syntactic-semantic patterns occur at similar type frequencies in these corpora, and furthermore, when a learning algorithm is applied to lexicons sampled from these sources, it consistently achieves the same learning outcomes in each. With appropriate care and pre-processing, modern and historical text corpora are effectively interchangeable with child-directed speech corpora for the purpose of estimating child lexical experience, opening a path for modeling language acquisition where child-directed corpora are not available.
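As a rough illustration of the comparison described in the abstract, the sketch below takes the most frequent word types from two corpora and measures what share of those types follow a given pattern. It is a minimal sketch, not the paper's method: the function names, the toy token lists, and the "ends in -ed" test standing in for a morphophonological pattern are all assumptions made for illustration.

```python
from collections import Counter

def top_types(tokens, n):
    """Return the n word types with the highest token frequency."""
    return [word for word, _ in Counter(tokens).most_common(n)]

def pattern_type_frequency(types, follows_pattern):
    """Proportion of word types for which follows_pattern(word) is true."""
    if not types:
        return 0.0
    return sum(1 for word in types if follows_pattern(word)) / len(types)

# Hypothetical toy data, invented for this example (not from the paper).
child_directed = ("walked walked walked played played "
                  "went went the the the the dog").split()
literary = ("walked walked jumped jumped jumped "
            "said said said said ran ran cat").split()

# Hypothetical stand-in for a morphophonological pattern.
def ends_in_ed(word):
    return word.endswith("ed")

# Compare the pattern's type frequency among the 4 most frequent types.
cds_rate = pattern_type_frequency(top_types(child_directed, 4), ends_in_ed)
lit_rate = pattern_type_frequency(top_types(literary, 4), ends_in_ed)
print(cds_rate, lit_rate)
```

Restricting the comparison to the top-n types mirrors the abstract's focus on high token frequency items; the cutoff n and the pattern test would need to be chosen for each language and corpus.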
Reference: lingbuzz/004740
(please use that when you cite this article)
Published in: accepted to Glossa
keywords: child language acquisition, corpus linguistics, historical linguistics, English, Latin, Proto-Germanic, Spanish, Icelandic, morphology