Robust cortical entrainment to the speech envelope relies on the spectro-temporal fine structure

Nai Ding, Monita Chatterjee, Jonathan Z. Simon

Research output: Contribution to journal › Article › peer-review

157 Scopus citations


Speech recognition is robust to background noise. One underlying neural mechanism is that the auditory system segregates speech from the listening background and encodes it reliably. Such robust internal representation has been demonstrated in auditory cortex by neural activity entrained to the temporal envelope of speech. A paradox, however, then arises, as the spectro-temporal fine structure rather than the temporal envelope is known to be the major cue to segregate target speech from background noise. Does the reliable cortical entrainment in fact reflect a robust internal "synthesis" of the attended speech stream rather than direct tracking of the acoustic envelope? Here, we test this hypothesis by degrading the spectro-temporal fine structure while preserving the temporal envelope using vocoders. Magnetoencephalography (MEG) recordings reveal that cortical entrainment to vocoded speech is severely degraded by background noise, in contrast to the robust entrainment to natural speech. Furthermore, cortical entrainment in the delta-band (1-4 Hz) predicts the speech recognition score at the level of individual listeners. These results demonstrate that reliable cortical entrainment to speech relies on the spectro-temporal fine structure, and suggest that cortical entrainment to the speech envelope is not merely a representation of the speech envelope but a coherent representation of multiscale spectro-temporal features that are synchronized to the syllabic and phrasal rhythms of speech.

Original language: English (US)
Pages (from-to): 41-46
Number of pages: 6
State: Published - Mar 2014


Keywords

  • Auditory cortex
  • Auditory scene analysis
  • Envelope entrainment
  • MEG

ASJC Scopus subject areas

  • Neurology
  • Cognitive Neuroscience

