Learning nonnative speech sounds changes local encoding in the adult human cortex

Authors: Han G. Yi, Bharath Chandrasekaran, Kirill V. Nourski, Ariane E. Rhone, William L. Schuerman, Matthew A. Howard III, Edward F. Chang, Matthew K. Leonard

Affiliations: (a) Department of Neurological Surgery, University of California, San Francisco, CA 94143; (b) Weill Institute for Neurosciences, University of California, San Francisco, CA 94143; (c) Department of Communication Sciences and Disorders, University of Pittsburgh, Pittsburgh, PA 15260; (d) Department of Neurosurgery, Roy J. and Lucille A. Carver College of Medicine, The University of Iowa, Iowa City, IA 52242-1089

Abstract: Adults can learn to identify nonnative speech sounds with training, albeit with substantial variability in learning behavior. Increases in behavioral accuracy are associated with increased separability for sound representations in cortical speech areas. However, it remains unclear whether individual auditory neural populations all show the same types of changes with learning, or whether there are heterogeneous encoding patterns. Here, we used high-resolution direct neural recordings to examine local population response patterns while native English listeners learned to recognize unfamiliar vocal pitch patterns in Mandarin Chinese tones. We found a distributed set of neural populations in bilateral superior temporal gyrus and ventrolateral frontal cortex where the encoding of Mandarin tones changed throughout training as a function of trial-by-trial accuracy ("learning effect"), including both increases and decreases in the separability of tones. These populations were distinct from populations that showed changes as a function of exposure to the stimuli regardless of trial-by-trial accuracy. These learning effects were driven in part by more variable neural responses to repeated presentations of acoustically identical stimuli. Finally, learning effects could be predicted from speech-evoked activity even before training, suggesting that intrinsic properties of these populations make them amenable to behavior-related changes. Together, these results demonstrate that nonnative speech sound learning involves a wide array of changes in neural representations across a distributed set of brain regions.

Humans are finely attuned to the sounds in their native language (1, 2), driven by extensive experience hearing these sounds in many different contexts from different speakers (3–5). However, for nonnative sounds in unfamiliar languages, adult listeners often struggle to learn to recognize relatively simple contrasts (6–9). For example, although native English listeners understand how changes in vocal pitch indicate intonational prosody (e.g., statements versus questions; refs. 10 and 11), this does not translate to the ability to easily identify the syllable-level pitch patterns that define lexical tones in Mandarin Chinese (12, 13). Fundamentally, this difficulty may reflect a trade-off between maintaining stable representations of deeply ingrained speech sounds and retaining enough plasticity to continue learning behaviorally relevant information throughout the lifespan (14–19). Learning to identify nonnative speech sounds often requires long and intense periods of active training (12, 20–22), consistent with the observation that speech circuits in the human brain are resistant to change following developmental critical periods (15, 17).

However, even brief training periods can lead to an increased ability to identify novel speech sounds, albeit with highly variable performance across individuals (23–25). Behavioral evidence has further shown that the way listeners perceive relevant auditory cues changes after speech training (14, 25–27), which has led to the hypothesis that learning is rooted in more distinct neural representations of those sounds (17, 27). Consistent with this hypothesis, previous neuroimaging studies have shown that activation in frontotemporal areas increases following identification or discrimination tasks (13, 19, 28–30). These increases in the magnitude of activation are further associated with greater neural separability among sound categories for both speech (31–33) and nonspeech sounds (34–36). However, recent evidence suggests a highly diverse set of speech representations even within areas like the superior temporal gyrus (STG; ref. 37). Currently, the extent to which learning-related changes vary at the level of local populations remains unknown due to the broad spatial scale of noninvasive methods, which may obscure more complex dynamics. In addition, because most previous work examines neural activity only at early and late stages of the task, it is unclear how learning-related changes evolve on a trial-by-trial basis as listeners first learn to use the stimulus dimensions that allow them to achieve increased accuracy.
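To make the notion of neural separability concrete, the following is a minimal sketch, not the analysis pipeline used in this study: it quantifies how well four tone categories are separated in a single population's trial-wise response amplitudes using a one-way F-statistic. All data are simulated, and the variable names are hypothetical.

```python
# A minimal sketch (not the authors' pipeline): quantifying how separable
# four tone categories are in one neural population's response, using a
# one-way F-statistic over simulated trial-wise high-gamma amplitudes.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(0)
n_trials_per_tone = 60

# Simulated peak response amplitude (z-scored) per trial, one electrode.
# Small mean offsets between tones stand in for category-selective tuning.
tone_means = [0.0, 0.3, 0.6, 0.9]
responses = [rng.normal(mu, 1.0, n_trials_per_tone) for mu in tone_means]

f_stat, p_val = f_oneway(*responses)
print(f"tone separability (F) = {f_stat:.2f}, p = {p_val:.3g}")
```

Higher F values indicate trial distributions that overlap less across tone categories; a decrease in separability under this kind of measure would correspond to lower F values for the same population later in training.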
Here, we examined the relationship between behavioral performance during the initial stage of nonnative speech sound learning and the trial-by-trial encoding of speech content in local neural populations in the human brain. English-speaking participants listened to unfamiliar Mandarin syllables and learned to identify tone categories (22, 38) while neural activity was recorded from electrocorticography (ECoG) arrays placed over lateral cortical areas. We hypothesized that, as listeners heard the same stimuli across multiple exposures, some neural populations would show responses to Mandarin speech sounds that track the trial-by-trial fluctuations in participants' behavioral performance during learning; we also asked whether these changes would be uniformly reflected in increased separability among tones. We further hypothesized that these learning-related neural populations would be distinct both from populations showing changes across trials that do not directly correlate with learning (e.g., changes tracking the number of exposures to a given token, independent of accuracy) and from populations that show stable activity patterns across trials. To address the relationship to stimulus feature encoding (e.g., pitch representations for English intonational prosody; ref. 39), we also measured the extent to which neural responses to unfamiliar Mandarin speech sounds prior to training can be used to predict the emergence of learning-related changes during training.
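As an illustration of what a trial-by-trial "learning effect" analysis could look like, here is a hedged sketch: it regresses a population's simulated response amplitude on a running measure of behavioral accuracy. The sliding-window smoothing and simple linear regression are assumptions made for illustration, not the paper's exact method.

```python
# A hedged sketch of a trial-by-trial "learning effect" test: does a
# population's response amplitude covary with behavioral accuracy?
# All data are simulated; window size and model form are assumptions.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n_trials = 200

# Simulated trial outcomes: identification accuracy improves over training.
p_correct = np.clip(0.25 + 0.5 * np.arange(n_trials) / n_trials, 0.0, 1.0)
correct = (rng.random(n_trials) < p_correct).astype(float)

# Running accuracy over a sliding window as the trial-by-trial predictor.
win = 20
running_acc = np.convolve(correct, np.ones(win) / win, mode="same")

# Simulated neural amplitude that covaries with accuracy. A learning
# effect could be an increase or a decrease; here it increases.
amplitude = 0.8 * running_acc + rng.normal(0.0, 0.3, n_trials)

res = linregress(running_acc, amplitude)
print(f"slope = {res.slope:.2f}, r = {res.rvalue:.2f}, p = {res.pvalue:.3g}")
```

A nonzero slope in either direction would flag this population as accuracy-tracking, whereas a population tracking exposure count alone would instead correlate with trial number regardless of behavioral outcome.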
We found a subset of local populations across the cortical surface that track trial-by-trial accuracy, even when learning performance is relatively low and variable. These learning-related effects manifest as both increases and decreases in the amplitude of neural responses to specific speech sounds, and they are spatially interspersed with, and dissociable from, effects that arise simply as a function of repeated exposure. Furthermore, learning-related changes are associated with higher variability of the response amplitude across repeated exposures to the same acoustic stimulus, suggesting that the emergent neural representations are less robust. Finally, we show that intrinsic properties of these neural populations are associated with whether they show learning-related effects during training, allowing us to predict whether these effects will occur based on responses to the novel speech sounds prior to training. Together, these results demonstrate that learning to identify novel speech sounds scaffolds on existing sensitivities to relevant features and that the initial stages of learning a new language involve a specific set of processes that fine-tune local speech representations in the brain. We propose that the learning-induced increase in neural separability in frontotemporal regions arises from heterogeneous changes among the local populations that make up those regions.
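The pre-training prediction result can likewise be illustrated with a toy sketch: a cross-validated classifier that uses hypothetical features of each recording site's speech-evoked response before training (e.g., response magnitude and pitch sensitivity) to predict whether that site will later show a learning effect. Everything below is simulated; the actual features and model in the study may differ.

```python
# A toy sketch of pre-training prediction of learning effects. Feature
# names are hypothetical and all data are simulated, not from the study.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
n_sites = 300

# Hypothetical pre-training features per recording site.
response_magnitude = rng.normal(0.0, 1.0, n_sites)
pitch_sensitivity = rng.normal(0.0, 1.0, n_sites)
X = np.column_stack([response_magnitude, pitch_sensitivity])

# Simulated labels: sites with stronger pitch sensitivity are more
# likely to develop learning effects during training.
p_learning = 1.0 / (1.0 + np.exp(-(1.5 * pitch_sensitivity - 0.5)))
y = (rng.random(n_sites) < p_learning).astype(int)

acc = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
print(f"cross-validated prediction accuracy: {acc:.2f}")
```

Above-chance cross-validated accuracy under this kind of scheme would indicate that intrinsic, pre-training response properties carry information about which populations are amenable to learning-related change.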
Keywords: learning, speech, neurophysiology, perception