Generative Adversarial Phonology: Modeling unsupervised allophonic learning with neural networks
Gasper Begus
October 2019

This paper argues that phonetic and phonological learning can be modeled as a dependency between the random (latent) space and the generated speech data in the Generative Adversarial Network architecture, and it proposes a methodology for uncovering the network's internal representations that correspond to phonetic and phonological features. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on raw, unannotated acoustic data, and learning is unsupervised, without any language-specific assumptions or pre-assumed levels of abstraction. A Generative Adversarial Network (Goodfellow et al. 2014), implemented for acoustic data as WaveGAN (Donahue et al. 2019), was trained on an English allophonic distribution in which voiceless stops surface as aspirated word-initially before stressed vowels, except when preceded by the sibilant [s]. The network successfully learns this allophonic alternation: its generated speech signal reproduces the conditional distribution of aspiration duration. Additionally, the network generates innovative outputs for which the training data provide no evidence, suggesting that it segments the continuous speech signal into units that can be productively recombined. The paper proposes a technique for establishing the network's internal representations that identifies latent variables corresponding to, for example, the presence of [s] and its spectral properties. By manipulating these variables, we can actively control the presence of [s] and its frication amplitude in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological features. Crucially, the dependencies learned in training extend beyond the interval of latent-variable values sampled during training, which allows for further exploration of the learned representations.
The paper also discusses how the network’s architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors.
Format: pdf
Reference: lingbuzz/004617
(please use that when you cite this article)
Published in: Submitted.
keywords: artificial intelligence, neural networks, generative adversarial networks, phonetic learning, phonological learning, voice onset time, allophonic distribution, phonology
previous versions: v5 [October 2019]
v4 [October 2019]
v3 [August 2019]
v2 [July 2019]
v1 [May 2019]
Downloaded: 306 times

