Generative Adversarial Phonology: Modeling unsupervised allophonic learning with neural networks
Gasper Begus
August 2019
 

This paper proposes a model of unsupervised phonetic and phonological learning of acoustic speech data based on Generative Adversarial Neural Networks. The Generative Adversarial architecture is uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data and learning is unsupervised without any language-specific assumptions or pre-assumed levels of abstraction. The result is a Generator network that, as the paper argues, learns conditional allophonic distributions, produces innovative outputs consistent with linguistic behavior, and learns to use latent space as an approximation to phonetic and phonological features. A Generative Adversarial Network for acoustic data proposed by Donahue et al. (2019) was trained on an allophonic distribution in English, where voiceless stops surface as aspirated word-initially before stressed vowels except if preceded by a sibilant [s]. The model successfully learns the allophonic alternation: the network’s generated speech signal contains the conditional distribution of aspiration duration. Additionally, the network generates innovative outputs for which no evidence is available in the training data, suggesting that the network segments the continuous speech signal into units that can be productively recombined. The paper also proposes a technique for establishing the network’s internal representations. We identify latent variables that directly correspond to the presence of [s] in the output. By manipulating these variables, we actively control the presence of [s], its frication amplitude, and the spectral shape of the frication noise in the generated outputs. This suggests that the network learns to use latent variables as an approximation of phonetic and phonological features, which can thus be modeled as emergent from learning in the Generative Adversarial architecture.
Crucially, we observe that the dependencies learned in training extend beyond the training range, which allows for additional exploration of learning representations. The results demonstrate that Generative Adversarial Networks bear potential for modeling phonetic and phonological learning with many further applications. The paper also discusses how the model’s architecture and innovative outputs resemble and differ from linguistic behavior in language acquisition, speech disorders, and speech errors.
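The latent-variable manipulation described in the abstract can be illustrated with a minimal sketch. It only builds the batch of latent vectors in which a single variable is swept across a range of values, including values outside the training range, while all other variables are held fixed. The latent dimensionality of 100, the uniform sampling interval [-1, 1], the index `S_INDEX`, and the sweep range are all assumptions made for illustration and are not taken from the abstract; the trained Generator itself is not included.

```python
import numpy as np


def latent_sweep(z, index, values):
    """Return a batch of copies of z in which z[index] takes each value in turn."""
    batch = np.tile(z, (len(values), 1))  # one row per sweep value
    batch[:, index] = values              # overwrite only the chosen latent variable
    return batch


rng = np.random.default_rng(0)

LATENT_DIM = 100  # assumed size of the latent vector
S_INDEX = 7       # hypothetical index of a variable tied to the presence of [s]

# Sample one latent vector from the assumed training distribution, uniform on [-1, 1].
z = rng.uniform(-1.0, 1.0, size=LATENT_DIM)

# Sweep the chosen variable from -3 to 3, i.e. also beyond the assumed training range.
values = np.linspace(-3.0, 3.0, 7)
batch = latent_sweep(z, S_INDEX, values)

print(batch.shape)
```

Each row of `batch` would then be passed to the trained Generator, and the generated outputs compared for the presence and amplitude of frication. Holding the remaining variables fixed is what lets any change in the output be attributed to the single manipulated variable.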
Reference: lingbuzz/004617
(please use that when you cite this article)
keywords: artificial intelligence, neural networks, generative adversarial networks, phonetic learning, phonological learning, voice onset time, allophonic distribution, phonology
previous versions: v2 [July 2019]
v1 [May 2019]