The perception of sound textures, a class of natural sounds such as fire, wind, and rain that are defined by their statistical structure, has been proposed to arise through the integration of time-averaged summary statistics. Where and how the auditory system might encode these summary statistics to create internal representations of these stationary sounds, however, is unknown. Here, using natural textures and synthetic variants with reduced statistics, we show that summary statistics modulate the correlations between frequency-organized neuron ensembles in the awake rabbit inferior colliculus (IC). These neural ensemble correlation statistics capture high-order sound structure and allow for accurate neural decoding in a single-trial recognition task with evidence accumulation times approaching 1 s. In contrast, the average activity across the neural ensemble (neural spectrum) provides a fast (tens of milliseconds) and salient signal that contributes primarily to texture discrimination. Intriguingly, perceptual studies in human listeners reveal analogous trends: the sound spectrum is integrated quickly and serves as a salient discrimination cue, while high-order sound statistics are integrated slowly and contribute substantially more toward recognition. The findings suggest that statistical sound cues such as the sound spectrum and correlation structure are represented by distinct response statistics in auditory midbrain ensembles, and that these neural response statistics may have dissociable roles and time scales for the recognition and discrimination of natural sounds.

What makes a sound natural, and what are the neural codes that support recognition and discrimination of real-world natural sounds? Although it is known that the early auditory system decomposes sounds along fundamental acoustic dimensions such as intensity and frequency, the higher-level neural computations that mediate natural sound recognition are poorly understood.
This general lack of understanding is in part attributed to the structural complexity of natural sounds, which is difficult to study with traditional auditory test stimuli, such as tones, noise, or modulated sequences. Such stimuli can reveal details of the neural representation for relatively low-level acoustic cues, yet they don’t capture the rich and diverse statistical structure of natural sounds. Thus, they cannot reveal many of the computations associated with higher-level sound properties that facilitate auditory tasks such as natural sound recognition or discrimination. A class of stationary natural sounds termed textures, such as the random sounds emanating from a running stream, a crowded restaurant, or a chorus of birds, have been proposed as alternative natural stimuli which allow for manipulating high-level acoustic structure (1). Texture sounds are composed of spatially and temporally distributed acoustic elements that are collectively perceived as a single source and are defined by their statistical features. Identification of these natural sounds has been proposed to be mediated through the integration of time-averaged summary statistics, which account for high-level structures such as the sparsity and time-frequency correlation structure found in many natural sounds (1–3). Using a generative model of the auditory system to measure summary statistics from natural texture sounds, it is possible to synthesize highly realistic synthetic auditory textures (1). This suggests that high-order statistical cues are perceptually salient and that the brain might extract these statistical features to build internal representations of sounds.

Although neural activity throughout the auditory pathway is sensitive to a variety of statistical cues such as the sound contrast, modulation power spectrum, and correlation structure (4–12), how sound summary statistics contribute toward basic auditory tasks such as recognition and discrimination of sounds is poorly understood. Furthermore, it is unclear where along the auditory pathway summary statistics are represented and how they are reflected in neural activity. The inferior colliculus (IC) is one candidate midlevel structure for representing such summary statistics. As the principal midbrain auditory nucleus, the IC receives highly convergent brainstem inputs with varied sound selectivities. Neurons in the IC are selective over most of the perceptually relevant range of sound modulations and neural activity is strongly driven by multiple high-order sound statistics (4–7, 10). In previous work, we showed the correlation statistics of natural sounds are highly informative about stimulus identity and they appear to be represented in the correlation statistics of auditory midbrain neuron ensembles (4). Correlations between neurons have also been proposed as mechanisms for pitch identification (13) and sound localization (14). This broadly supports the hypotheses that high-order sound statistics are reflected in the response statistics of neural ensembles and that these neural response statistics could potentially subserve basic auditory tasks.

Here using natural and synthetic texture sounds, we test the hypothesis that statistical structure in natural texture sounds modulates the response statistics of neural ensembles in the IC of unanesthetized rabbits, and that distinct neural response statistics have the potential to contribute toward sound recognition and discrimination behaviors. By comparing the performance of neural decoders with human texture perception, we find that place rate representation of sounds (neural spectrum) accumulates evidence about the sounds on relatively fast time scales (tens of milliseconds) exhibiting decoding trends that mirror those seen for human texture discrimination. High-order statistical sound cues, by comparison, are reflected in the correlation statistics of neural ensembles, which require substantially longer evidence accumulation times (>500 ms) and follow trends that mirror those measured for human texture recognition. Collectively, the findings suggest that spectrum cues and accompanying place rate representation (neural spectrum) may contribute surprisingly little toward the recognition of auditory textures. Instead, high-order statistical sound structure is reflected in the distributed patterns of correlated activity across IC neural ensembles and such neural response structure has the potential to contribute toward the recognition of natural auditory textures.
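
To make the two response statistics contrasted above concrete, the following is a minimal illustrative sketch, not the analysis used in this study. It assumes ensemble activity is stored as a channels-by-time array, and it uses a hypothetical toy model (`simulate_ensemble`, with an assumed `shared_weight` parameter) in which each channel combines private noise with a shared fluctuation. The "neural spectrum" is taken here as the time-averaged activity per channel, and the ensemble correlation statistics as the pairwise Pearson correlations between channels.

```python
import numpy as np


def neural_spectrum(responses):
    """Time-averaged activity per channel (rows = channels, columns = time bins)."""
    return responses.mean(axis=1)


def ensemble_correlations(responses):
    """Pearson correlation between every pair of channels."""
    return np.corrcoef(responses)


def simulate_ensemble(n_channels, n_bins, shared_weight, rng):
    """Hypothetical toy ensemble: private noise plus a shared fluctuation.

    This is an assumption made for illustration only; it is not a model
    of inferior colliculus responses.
    """
    shared = rng.standard_normal(n_bins)
    private = rng.standard_normal((n_channels, n_bins))
    # Channel-specific mean activity, standing in for frequency tuning.
    rates = np.linspace(5.0, 20.0, n_channels)[:, None]
    return rates + private + shared_weight * shared


rng = np.random.default_rng(0)
resp = simulate_ensemble(n_channels=8, n_bins=5000, shared_weight=1.0, rng=rng)

spectrum = neural_spectrum(resp)    # one value per channel
corr = ensemble_correlations(resp)  # channels x channels matrix
```

In this toy setting the two statistics are set by different parts of the model: the spectrum depends only on the per-channel mean activity, while the off-diagonal correlations depend only on how strongly the shared fluctuation drives the channels.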