
Encoding of phonology in a recurrent neural model of grounded speech

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

We study the representation and encoding of phonemes in a recurrent
neural network model of grounded speech. We use a model which
processes images and their spoken descriptions, and projects the
visual and auditory representations into the same semantic space. We
perform a number of analyses on how information about individual
phonemes is encoded in the MFCC features extracted from the speech
signal, and the activations of the layers of the model. Via
experiments with phoneme decoding and phoneme discrimination we show
that phoneme representations are most salient in the lower layers of
the model, where low-level signals are processed at a fine-grained
level, although a large amount of phonological information is retained at
the top recurrent layer. We further find that the attention mechanism
following the top recurrent layer significantly attenuates the encoding of
phonology and makes the utterance embeddings much more invariant to
synonymy. Moreover, a hierarchical clustering of phoneme representations
learned by the network shows an organizational structure of phonemes
similar to those proposed in linguistics.
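As a rough illustration of the phoneme decoding analysis described above, the sketch below trains a nearest-centroid classifier to predict phoneme labels from per-frame feature vectors. The synthetic data, the dimensions, and the nearest-centroid choice are illustrative assumptions only; the paper's actual experiments use MFCC features and the model's layer activations with its own classifier setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for per-frame representations (MFCC features or a
# layer's activations), with one synthetic cluster per phoneme label.
n_phonemes, dim, per_class = 5, 16, 40
centres = rng.normal(size=(n_phonemes, dim))
X = np.concatenate([c + 0.3 * rng.normal(size=(per_class, dim)) for c in centres])
y = np.repeat(np.arange(n_phonemes), per_class)

# Random train/test split of the frames.
idx = rng.permutation(len(y))
split = len(y) // 2
tr, te = idx[:split], idx[split:]

# Nearest-centroid decoder: classify each held-out frame by the closest
# class mean computed on the training frames.
centroids = np.stack([X[tr][y[tr] == k].mean(axis=0) for k in range(n_phonemes)])
dists = ((X[te][:, None, :] - centroids[None]) ** 2).sum(axis=-1)
pred = np.argmin(dists, axis=1)
accuracy = (pred == y[te]).mean()
print(f"phoneme decoding accuracy: {accuracy:.2f}")
```

Decoding accuracy well above chance (here, 1/5) on a given representation is taken as evidence that phoneme identity is linearly recoverable from it; comparing this score across layers gives the layer-wise picture the abstract reports.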
Original language: English
Title of host publication: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017)
Editors: Roger Levy, Lucia Specia
Place of Publication: Vancouver, Canada
Publisher: Association for Computational Linguistics
Number of pages: 11
ISBN (Electronic): 9781945626548
State: Published - 2017
Event: Conference on Computational Natural Language Learning - Vancouver, Canada
Duration: 3 Aug 2017 - 4 Aug 2017
Conference number: 21
