Similar Articles
1.
Prosodic information aids segmentation of the continuous speech signal and thereby facilitates auditory speech processing. Durational and pitch variations are prosodic cues especially necessary to convey prosodic boundaries, but alaryngeal speakers have inconsistent control over acoustic parameters such as F0 and duration, with the result that their speech is noisy and less intelligible than normal speech. This case study investigated whether one Spanish alaryngeal speaker proficient in both oesophageal and tracheoesophageal speech modes used the same acoustic cues for prosodic boundaries in both types of voicing. Pre-boundary lengthening, F0 excursions and pausing (number and position of pauses) were measured in spontaneous speech samples using Praat. The acoustic analysis revealed that the subject relied on a different combination of cues in each type of voicing to convey the presence of prosodic boundaries.
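Measurements like these are easy to prototype outside Praat's GUI; below is a minimal sketch using the parselmouth library (a Python interface to Praat). The file name, the 40 dB silence threshold and the 200 ms pause criterion are illustrative assumptions, not the study's settings.

```python
# Sketch: measure F0 excursion and detect pauses with Praat via parselmouth.
# File name and thresholds are illustrative assumptions, not the study's.
import numpy as np
import parselmouth

snd = parselmouth.Sound("speech.wav")            # hypothetical input file

# F0 excursion: range of voiced F0, expressed in semitones.
pitch = snd.to_pitch()                           # default Praat pitch settings
f0 = pitch.selected_array["frequency"]
f0 = f0[f0 > 0]                                  # drop unvoiced frames (F0 == 0)
excursion_st = 12 * np.log2(f0.max() / f0.min())

# Pauses: stretches of low intensity lasting at least 200 ms (assumed cutoff).
intensity = snd.to_intensity()
t = intensity.xs()
quiet = intensity.values[0] < 40                 # 40 dB silence threshold, assumed
padded = np.concatenate(([False], quiet, [False]))
starts = np.where(np.diff(padded.astype(int)) == 1)[0]
ends = np.where(np.diff(padded.astype(int)) == -1)[0]
pauses = [(t[s], t[e - 1]) for s, e in zip(starts, ends) if t[e - 1] - t[s] >= 0.2]

print(f"F0 excursion: {excursion_st:.1f} semitones; {len(pauses)} pauses")
for s, e in pauses:
    print(f"  pause from {s:.2f} to {e:.2f} s")
```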

2.
This paper deals with speech enhancement in noisy, reverberant environments where multiple speakers are active. The authors propose an advanced real-time speech-processing front-end that automatically reduces the distortions introduced by room reverberation in distant speech signals, while also accounting for background noise, thereby achieving a significant improvement in speech quality for each speaker. The overall framework is composed of three cooperating blocks, each fulfilling a specific task: speaker diarization, room impulse response identification and speech dereverberation. In particular, the speaker diarization algorithm pilots the operations performed in the other two algorithmic stages, which have been designed and parametrized to operate on noisy speech observations. Extensive computer simulations were performed using a subset of the AMI database under different realistic noisy and reverberant conditions. The results show the effectiveness of the approach.

3.
To evaluate the right hemisphere's role in encoding speech prosody, an acoustic investigation of timing characteristics was undertaken in speakers with and without focal right-hemisphere damage (RHD) following cerebrovascular accident. Utterances varying along different prosodic dimensions (emphasis, emotion) were elicited from each speaker using a story-completion paradigm, and measures of utterance rate and vowel duration were computed. Results demonstrated parallels in how RHD and healthy individuals encoded the temporal correlates of emphasis in most experimental conditions. Differences in how RHD speakers employed temporal cues to signal some aspects of prosodic meaning (especially emotional content) were observed, and these corresponded to a reduction in the perceptibility of prosodic meanings conveyed by the RHD speakers. The findings indicate that RHD individuals are most disturbed when expressing prosodic representations that vary in a graded (rather than categorical) manner in the speech signal (Blonder, Pickering, Heath et al., 1995; Pell, 1999a).

4.
Recognizing speech in background noise is a strenuous daily activity, yet most humans can master it. An explanation of how the human brain deals with such sensory uncertainty during speech recognition is to date missing. Previous work has shown that recognition of speech without background noise involves modulation of the auditory thalamus (medial geniculate body; MGB): despite identical stimulus input, responses in the left MGB are higher for speech recognition tasks that require tracking of fast-varying stimulus properties than for tasks involving relatively constant stimulus properties (e.g., speaker-identity tasks). Here, we tested the hypotheses that (1) this task-dependent modulation for speech recognition increases in parallel with the sensory uncertainty in the speech signal, i.e., the amount of background noise; and that (2) this increase is present in the ventral MGB, which corresponds to the primary sensory part of the auditory thalamus. In accordance with our hypothesis, using ultra-high-resolution functional magnetic resonance imaging (fMRI) in male and female human participants, we show that the task-dependent modulation of the left ventral MGB (vMGB) for speech is particularly strong when recognizing speech in noisy listening conditions, in contrast to situations where the speech signal is clear. The results imply that speech-in-noise recognition is supported by modifications at the level of the subcortical sensory pathway providing driving input to the auditory cortex.

SIGNIFICANCE STATEMENT: Speech recognition in noisy environments is a challenging everyday task. One reason humans can master it is the recruitment of additional cognitive resources, reflected in the recruitment of non-language cerebral cortex areas. Here, we show that modulation of the primary sensory pathway is also specifically involved in speech-in-noise recognition. We found that the left primary sensory thalamus (ventral medial geniculate body; vMGB) is more involved when recognizing speech signals, as opposed to a control task (speaker-identity recognition), when heard in background noise versus when the noise was absent. This finding implies that the brain optimizes sensory processing in subcortical sensory pathway structures in a task-specific manner to deal with speech recognition in noisy environments.

5.
This paper investigates the recurrent use of the phrase very good by a speaker with non-fluent agrammatic aphasia. Informal observation of the speaker's interaction reveals that she appears to be an effective conversational partner despite very severe word-retrieval difficulties that result in extensive reliance on variants of the phrase very good. The question that this paper addresses, using an essentially conversation-analytic framework, is: what is the speaker achieving through these variants of very good, and what are the linguistic and interactional resources that she draws on to achieve these communicative effects? Tokens of very good in the corpus were first analyzed in a bottom-up fashion, attending to sequential position, structure and participant orientation. This revealed distinct uses, which were subsequently subjected to detailed acoustic analysis in order to investigate specific prosodic characteristics within and across the interactional variants. We identified specific clusters of prosodic cues that the speaker exploited to differentiate interactional uses of very good. The analysis thus shows how, in adapting to aphasia, the speaker exploits the rich interface between prosody, grammar and interaction both to manage the interactional demands of conversation and to communicate propositional content.

6.
The extraction of oriented contrast information by cortical simple cells is a fundamental step in early visual processing. The orientation selectivity originates at least partly from the input of lateral geniculate nucleus neurons with properly aligned receptive fields. In the present article, we investigate the feedforward interactions between on- and off-pathways. Based on physiological evidence, we propose a push-pull model with dominating opponent inhibition (DOI). We show that the model can account for empirical data on simple cells, such as contrast-invariant orientation tuning, sharpening of orientation tuning with increasing inhibition, and strong response decrements to stimuli with luminance-gradient reversal. With identical parameter settings, we apply the model to the processing of synthetic and real-world images. We show that the model with DOI can robustly extract oriented contrast information from noisy input. More importantly, noise is adaptively suppressed: the model simple cells do not respond to homogeneous regions of different noise levels, while remaining sensitive to small contrast changes. The image-processing results reveal a possible functional role of the strong inhibition observed empirically, namely to adaptively suppress responses to noisy input.
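To make the push-pull idea concrete, here is a rough numpy sketch (our illustration, not the authors' model code): the input is split into half-wave-rectified on- and off-channels, both are filtered with the same oriented Gabor kernel, and the off-channel response inhibits the on-channel response with a dominant gain alpha > 1. Kernel parameters, the gain, and the test image are assumptions.

```python
# Sketch: push-pull simple-cell response with dominating opponent inhibition
# (DOI). Our illustration of the idea; parameters are assumed, not fitted.
import numpy as np
from scipy.signal import fftconvolve

def gabor(size=21, wavelength=8.0, sigma=3.0, theta=0.0):
    """Odd-symmetric Gabor kernel at orientation theta (radians)."""
    r = np.arange(size) - size // 2
    x, y = np.meshgrid(r, r)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr ** 2 + yr ** 2) / (2 * sigma ** 2)) * np.sin(2 * np.pi * xr / wavelength)

def push_pull_doi(image, theta=0.0, alpha=2.0):
    """Excitation from the on-channel minus dominating (alpha > 1)
    inhibition driven by the opponent off-channel."""
    g = gabor(theta=theta)
    on = np.maximum(image, 0.0)                  # on-channel (half-wave rectified)
    off = np.maximum(-image, 0.0)                # off-channel
    push = fftconvolve(on, g, mode="same")
    pull = np.maximum(fftconvolve(off, g, mode="same"), 0.0)
    return np.maximum(push - alpha * pull, 0.0)

# Noisy vertical edge: the response should survive at the edge while being
# suppressed in the homogeneous (but noisy) flanking regions.
rng = np.random.default_rng(0)
img = np.zeros((64, 64))
img[:, 32:] = 1.0
img += rng.normal(0.0, 0.2, img.shape)
img -= img.mean()                                # zero-mean contrast signal
resp = push_pull_doi(img)
print("edge:", resp[:, 28:36].mean(), "background:", resp[:, :20].mean())
```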

7.
We have combined an echo state network (ESN) with a competitive state-machine framework to create a classification engine called the predictive ESN classifier. We derive the expressions for training the predictive ESN classifier and show that, in noisy speech classification experiments, the model was significantly more noise robust than a hidden Markov model, by 8 ± 1 dB of signal-to-noise ratio. The simple training algorithm and noise robustness of the predictive ESN classifier make it an attractive classification engine for automatic speech recognition.
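For orientation, the sketch below shows the generic ESN recipe on which such a classifier rests: a fixed random reservoir driven by the input sequence, with only a linear readout trained by ridge regression. It is not the predictive ESN of the paper; all sizes, constants and the toy data are assumptions.

```python
# Generic echo state network (ESN) classifier sketch: fixed random reservoir,
# ridge-regression readout. Sizes and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_res, n_classes = 13, 200, 10   # e.g. 13 MFCCs per frame (assumed)

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

def reservoir_state(X, leak=0.3):
    """Run one utterance (T x n_in) through the reservoir; return final state."""
    h = np.zeros(n_res)
    for x in X:
        h = (1 - leak) * h + leak * np.tanh(W_in @ x + W @ h)
    return h

def train_readout(utterances, labels, ridge=1e-2):
    H = np.stack([reservoir_state(X) for X in utterances])
    Y = np.eye(n_classes)[labels]                  # one-hot targets
    # Ridge regression: W_out = (H^T H + ridge * I)^-1 H^T Y
    return np.linalg.solve(H.T @ H + ridge * np.eye(n_res), H.T @ Y)

def classify(X, W_out):
    return int(np.argmax(reservoir_state(X) @ W_out))

# Toy usage with random "utterances" (stand-ins for speech feature sequences).
data = [rng.normal(size=(50, n_in)) for _ in range(20)]
labs = rng.integers(0, n_classes, 20)
W_out = train_readout(data, labs)
print(classify(data[0], W_out), labs[0])
```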

8.
Humans’ extraordinary ability to understand speech in noise relies on multiple processes that develop with age. Using magnetoencephalography (MEG), we characterize the underlying neuromaturational basis by quantifying how cortical oscillations in 144 participants (aged 5–27 years) track phrasal and syllabic structures in connected speech mixed with different types of noise. While the extraction of prosodic cues from clear speech was stable during development, its maintenance in a multi-talker background matured rapidly up to age 9 and was associated with speech comprehension. Furthermore, while the extraction of subtler information provided by syllables matured at age 9, its maintenance in noisy backgrounds progressively matured until adulthood. Altogether, these results highlight distinct behaviorally relevant maturational trajectories for the neuronal signatures of speech perception. In accordance with grain-size proposals, neuromaturational milestones are reached increasingly late for linguistic units of decreasing size, with further delays incurred by noise.
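Cortical tracking of this kind is commonly quantified as spectral coherence between the neural signal and the speech amplitude envelope within phrase- and syllable-rate bands. The sketch below shows that computation on synthetic stand-in signals; the band edges, sampling rate and signals are illustrative assumptions, not the study's parameters.

```python
# Sketch: quantify cortical "tracking" as coherence between a neural channel
# and the speech amplitude envelope in phrasal and syllabic frequency bands.
# Signals, rates and band edges are illustrative assumptions.
import numpy as np
from scipy.signal import coherence

fs = 200.0                                        # sampling rate (Hz), assumed
rng = np.random.default_rng(2)
t = np.arange(0, 60, 1 / fs)

# Stand-ins: an envelope with 2 Hz (phrasal) and 5 Hz (syllabic) modulation,
# and a noisy "MEG channel" that partially follows it.
envelope = 1 + 0.5 * np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 5 * t)
meg = 0.3 * envelope + rng.normal(0.0, 1.0, t.size)

f, coh = coherence(meg, envelope, fs=fs, nperseg=int(4 * fs))
phrasal = coh[(f >= 0.5) & (f <= 2.0)].mean()     # phrase-rate band, assumed
syllabic = coh[(f >= 4.0) & (f <= 8.0)].mean()    # syllable-rate band, assumed
print(f"phrasal tracking: {phrasal:.2f}, syllabic tracking: {syllabic:.2f}")
```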

9.
Hao Kangli, Feng Guorui, Ren Yanli & Zhang Xinpeng. Cognitive Computation, 2020, 12(6): 1205–1216.

In recent years, iris recognition has been widely used in various fields. As the first step of iris recognition, segmentation accuracy is of great significance to the final recognition. However, real-world iris images exhibit a variety of noise, which leads to lower segmentation accuracy than in the ideal case. To address this problem, this paper proposes an iris segmentation method using feature-channel optimization for noisy images. Because it is designed for non-ideal, noisy environments, the method is well suited to practical applications. We add dense blocks and dilated convolutional layers to the encoder so that the information gradient flow obtained by different layers can be reused and the receptive field can be expanded. In the decoder, based on Jensen-Shannon (JS) divergence, we first recalculate the weights of the feature channels obtained from each layer, which enhances useful information and suppresses interference in noisy environments to boost segmentation accuracy. The proposed architecture is validated on the CASIA v4.0 Interval (CASIA) and IIT Delhi v1.0 (IITD) datasets. For CASIA, the mean error rate is 0.78% and the F-measure is 98.21%; for IITD, the mean error rate is 0.97% and the F-measure is 97.87%. Experimental results show that the proposed method outperforms other state-of-the-art methods under noisy conditions such as Gaussian blur, Gaussian noise, and salt-and-pepper noise.

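One way to picture the channel-reweighting step: treat each channel's spatial activation map as a distribution, measure its JS divergence from the layer-average distribution, and rescale channels accordingly. The numpy sketch below is our reading of that idea, not the paper's code; in particular, the divergence-to-weight mapping is an assumption.

```python
# Sketch: reweight feature channels by Jensen-Shannon (JS) divergence from the
# layer-average activation distribution. A reading of the idea, not the
# paper's implementation; the divergence-to-weight mapping is assumed.
import numpy as np

def js_divergence(p, q, eps=1e-12):
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def reweight_channels(feat):
    """feat: (C, H, W) feature map -> channel-reweighted feature map."""
    C = feat.shape[0]
    flat = np.maximum(feat.reshape(C, -1), 0.0)             # non-negative maps
    dist = flat / (flat.sum(axis=1, keepdims=True) + 1e-12) # per-channel distribution
    mean_dist = dist.mean(axis=0)
    d = np.array([js_divergence(dist[c], mean_dist) for c in range(C)])
    w = 1.0 + (d - d.min()) / (d.max() - d.min() + 1e-12)   # weights in [1, 2], assumed
    return feat * w[:, None, None]

rng = np.random.default_rng(3)
x = rng.random((16, 32, 32))          # toy 16-channel feature map
print(reweight_channels(x).shape)     # (16, 32, 32)
```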

10.
Sentence prosody has long been known to serve both linguistic functions (e.g. to differentiate between questions and statements) and emotional functions (e.g. to detect the emotional state of a speaker). These different functions of prosodic information need to be decoded rapidly during sentence comprehension to ensure successful speech communication. However, systematic investigations of the comparative nature of these two functions, i.e. whether they are independent or interdependent, are sparse. The question at hand is whether the two prosodic functions engage a similar neural network and follow a similar time course or not. To this end, we investigated in an event-related brain potential (ERP) experiment whether emotional and linguistic prosody are processed independently or interdependently. We spliced a prosodically neutral sentence head onto a second sentence half that differed in emotional and/or linguistic prosody. In a within-subjects design, two tasks were administered: in the "emotion task", participants judged whether the sentence they had just heard was spoken in a neutral tone of voice or not; in the "linguistic task", participants decided whether the sentence was a declarative sentence or not. As predicted, the previously reported prosodic expectancy positivity (PEP) was elicited by both linguistic and emotional prosodic expectancy violations. However, the latency and distribution of the ERP component differed: whilst responses to emotional prosodic expectancy violations were elicited shortly after the violation (~470 ms post splicing point) and most prominently at posterior electrode sites, the positivity in response to linguistic prosody had a later onset (~620 ms post splicing point) with a more frontal distribution. Interestingly, combined (linguistic and emotional) expectancy violations resulted in a broadly distributed positivity with an onset of ~170 ms post violation. These effects were found irrespective of the task setting. Given the differences in latency and distribution, we conclude that the processing of emotional and linguistic prosody relies at least partly on differing neural mechanisms, and that emotional prosodic aspects of language are processed in a prioritized processing stream.

11.
Deficits in the processing of emotional prosody, the expression of emotion in the voice, have been widely reported in patients with schizophrenia, affecting not only the comprehension of emotional prosody but also its expression. Given that prosodic cues are important in memory for voice and speaker identity, Cutting has proposed that prosodic deficits may contribute to the misattribution that appears to occur in auditory hallucinations in psychosis. The present study compared hallucinating patients with schizophrenia, non-hallucinating patients and normal controls on an emotional prosodic processing task. It was hypothesised that hallucinators would demonstrate greater deficits in emotional prosodic processing than non-hallucinators and normal controls. Participants were 67 patients with a diagnosis of schizophrenia or schizoaffective disorder (38 hallucinating, 29 non-hallucinating) and 31 normal controls. The prosodic processing task comprised a series of semantically neutral sentences expressed in happy, sad and neutral voices, rated on a 7-point Likert scale from sad (−3) through neutral (0) to happy (+3). Hallucinating patients showed significant deficits on the prosodic processing task compared with non-hallucinating patients and normal controls; no significant differences were observed between non-hallucinating patients and normal controls. In the present study, patients experiencing auditory hallucinations were thus not as successful in recognising and using prosodic cues as the non-hallucinating patients. These results are consistent with Cutting's hypothesis that prosodic dysfunction may mediate the misattribution of auditory hallucinations.

12.
From birth, humans constantly make decisions about what to look at and for how long, yet the mechanism behind such decision-making remains poorly understood. Here, we present the rational action, noisy choice for habituation (RANCH) model. RANCH is a rational learning model that takes noisy perceptual samples from stimuli and makes sampling decisions based on expected information gain (EIG). The model captures key patterns of looking time documented in developmental research: habituation and dishabituation. We evaluated the model with adult looking-time data collected from a paradigm analogous to the infant habituation paradigm. We compared RANCH with baseline models (a no-learning model and a no-perceptual-noise model) and with models using alternative linking hypotheses (surprisal, KL divergence). We showed that (1) learning and perceptual noise are critical assumptions of the model, and (2) surprisal and KL divergence are good proxies for EIG in the current learning context.
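The driving quantity in RANCH, expected information gain, can be illustrated with a textbook conjugate example: a Beta prior over a Bernoulli stimulus feature observed through perceptual noise, where the EIG of one more sample is the predictive entropy minus the expected conditional entropy. The sketch below is that generic computation, not the authors' code; the prior, noise level and grid are assumptions.

```python
# Sketch: expected information gain (EIG) of one more noisy sample, for a
# Beta prior over a Bernoulli stimulus feature observed through perceptual
# noise. Generic grid computation; prior and noise level are assumptions.
import numpy as np
from scipy import stats

def bernoulli_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def eig(a, b, eps=0.1, grid=2001):
    """EIG = H(prior predictive) - E_theta[H(observation | theta)]."""
    theta = np.linspace(0.0, 1.0, grid)
    prior = stats.beta.pdf(theta, a, b)
    prior /= prior.sum()                           # discretized prior
    p_obs = (1 - eps) * theta + eps * (1 - theta)  # noise flips a sample w.p. eps
    predictive = np.sum(prior * p_obs)
    return bernoulli_entropy(predictive) - np.sum(prior * bernoulli_entropy(p_obs))

# EIG shrinks with repeated exposure to the same stimulus: a rational reason
# for habituation (looking away once little information remains).
for n in [0, 2, 8, 32]:
    print(f"after {n:2d} identical samples: EIG = {eig(1 + n, 1):.4f}")
```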

13.
Emotional prosody provides important cues for understanding the emotions of others in everyday communication. Asperger's syndrome (AS) is a developmental disorder characterised by pronounced deficits in socio-emotional communication, including difficulties in the domain of prosody processing. We measured pupillary responses as an index of emotional prosodic processing while 15 participants with AS and 19 non-clinical control participants listened to positively, negatively and neutrally intoned sentences, under both a spontaneous and an explicit task instruction. In the explicit processing condition, both the AS group and the non-clinical controls showed increased pupil dilation to positively and negatively intoned sentences when judging the valence of the sentence. This suggests higher processing demands for emotionally arousing information, as the effect was not found for neutrally intoned sentences. In the spontaneous processing condition, controls responded with increased pupil dilation to positively intoned sentences, whilst individuals with AS showed increased pupil dilation to negative sentences. The latter result is further supported by diminished ratings of emotionally intense sentences in the AS group compared with healthy controls. Perception and recognition of positively valenced sentences in individuals with AS thus appear impaired and dependent on the general task set-up. Diminished pupil dilation in the spontaneous positive processing condition, together with reduced positive valence ratings, strongly indicates a general negative processing bias for verbal information in adults diagnosed with AS.

14.
In recent years, the established link between the various human communication production domains has become more widely utilised in the field of speech processing. In this work, we build on our previous work and present a novel two-stage audiovisual speech enhancement system, making use of audio-only beamforming, automatic lip tracking, and pre-processing with visually derived Wiener speech filtering. Initial results demonstrate that this two-stage multimodal speech enhancement approach can produce positive results with noisy speech mixtures that conventional audio-only beamforming struggles to cope with, such as very noisy environments with a very low signal-to-noise ratio, or noise types that are difficult for audio-only beamforming to process.
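The Wiener stage rests on the classical per-bin gain G = S / (S + N). The sketch below applies that gain in the STFT domain, estimating the noise spectrum from an assumed noise-only lead-in; it illustrates only the standard filter, with no visually derived statistics, and the segment lengths are assumptions.

```python
# Sketch of the classical Wiener gain G = S / (S + N) applied per STFT bin,
# with the noise power spectrum estimated from an assumed noise-only lead-in.
# The paper's visually derived speech statistics are not reproduced here.
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_seconds=0.5, nperseg=512):
    f, t, Z = stft(noisy, fs=fs, nperseg=nperseg)
    power = np.abs(Z) ** 2
    hop = nperseg // 2                                           # default overlap
    n_frames = max(1, int(noise_seconds * fs / hop))
    noise_psd = power[:, :n_frames].mean(axis=1, keepdims=True)  # noise-only start
    speech_psd = np.maximum(power - noise_psd, 0.0)              # subtraction estimate
    gain = speech_psd / (speech_psd + noise_psd + 1e-12)
    _, clean = istft(Z * gain, fs=fs, nperseg=nperseg)
    return clean

# Toy usage: a tone in white noise, preceded by 0.5 s of noise alone.
fs = 16000
rng = np.random.default_rng(4)
sig = np.concatenate([np.zeros(fs // 2), np.sin(2 * np.pi * 440 * np.arange(fs) / fs)])
noisy = sig + rng.normal(0.0, 0.3, sig.size)
print(wiener_enhance(noisy, fs).shape, noisy.shape)
```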

15.
In order to avoid overfitting, we propose error-correcting memorization learning. The method is derived by minimizing the error between the outputs of a trained neural network and the correct values for noisy training examples, even though those correct values are unknown. We show that noise is adequately suppressed by error-correcting memorization learning, and we theoretically clarify the noise-suppression mechanism. We find that redundancy plays an essential role in noise suppression and depends on the set of training inputs, and we give the condition under which the training inputs provide this redundancy. Moreover, by clarifying the relationship between the proposed method and weighted least-squares estimation with the Mahalanobis norm, we reveal the effectiveness of weighted least-squares estimation for noise suppression.
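The weighted least-squares estimator referred to here is the standard generalized form: minimizing the Mahalanobis norm of the residual, (y - Xw)^T Sigma^{-1} (y - Xw), gives w = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y. The sketch below demonstrates that estimator on toy heteroscedastic data; it is not the paper's learning rule, and the covariance is an assumed example.

```python
# Sketch: weighted least squares with the Mahalanobis norm. Minimizing
# (y - Xw)^T Sigma^{-1} (y - Xw) gives w = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} y.
# Toy data and covariance are assumptions; this is not the paper's learning rule.
import numpy as np

rng = np.random.default_rng(5)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])

# Heteroscedastic noise: later examples are noisier (assumed for illustration).
sigma2 = np.linspace(0.1, 2.0, n)
y = X @ w_true + rng.normal(0.0, np.sqrt(sigma2))

Sigma_inv = np.diag(1.0 / sigma2)                       # known noise covariance
w_wls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)               # unweighted baseline

print("WLS error:", np.linalg.norm(w_wls - w_true))
print("OLS error:", np.linalg.norm(w_ols - w_true))
```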

16.
Familiarity is thought to aid listeners in decoding disordered speech; however, as the speech signal degrades, the "familiarity advantage" becomes less beneficial. Despite highly unintelligible speech-sound production, many children with dysarthria vocalize when interacting with familiar caregivers. Perhaps listeners can understand these vocalizations by cuing in to prosodic consistencies in their child's productions. This paper examined whether familiarity influenced the identification of sustained vowels that varied in pitch, duration, and pitch-duration combinations, produced by 3 children with severe dysarthria due to cerebral palsy. Thirty-six listeners participated in the study: for each speaker, there were 2 familiar listeners (FAM), 5 experienced listeners (EXP), and 5 unfamiliar/inexperienced listeners (INX). Results indicated that familiarity did not affect the identification of prosodic contrasts. In fact, all 3 listener groups were highly accurate in identifying duration, somewhat less successful at identifying pitch, and least accurate in identifying combinations of pitch and duration. Influences of speaker-listener variables on familiarity are discussed.

17.
Sex differentiates the role of emotional prosody during word processing
The meaning of a speech stream is communicated by more than the particular words used by the speaker. For example, speech melody, referred to as prosody, also contributes to meaning. In a cross-modal priming study, we investigated the influence of emotional prosody on the processing of visually presented positive and negative target words. The results indicate that emotional prosody modulates word processing and that the time course of this modulation differs between males and females. Women show behavioural and electrophysiological priming effects even with a short interval between the prosodic prime and the visual target word; in men, similar effects of emotional prosody on word processing occur only with a longer prime-target interval. This indicates that women make earlier use of emotional prosody during word processing than men do.

18.
The everyday communication of children is commonly observed by their parents. This paper examines the responses of parents (n = 18) who had both a cochlear implant (CI) child and a normal hearing (NH) child. Through an online questionnaire, parents rated the ability of their children on a gamut of speech communication competencies encountered in everyday settings. Comparative parental ratings of the CI children were significantly poorer than those of their NH siblings for speaker recognition, happy and sad emotion recognition, and question-versus-statement identification. Parents also reported that they changed the vocal effort and enunciation of their speech when addressing their CI child, and that their CI child consistently responded when their name was called in normal, but not in noisy, backgrounds. Demographic factors were not found to be linked to the parental impressions.

19.
While Parkinson's disease (PD) has traditionally been described as a movement disorder, there is growing evidence of cognitive and social deficits associated with the disease. However, few studies have looked at multi-modal social-cognitive deficits in patients with PD. We studied lateralization of both prosodic and facial emotion recognition (the ability to recognize emotional valence from tone of voice or from facial expressions) in PD. The Comprehensive Affect Testing System (CATS) is a well-validated test of human emotion processing that has been used to study emotion recognition in several major clinical populations, but never before in PD. We administered an abbreviated version of CATS (CATS-A) to 24 medicated PD participants and 12 age-matched controls. PD participants were divided into two groups based on the side of symptom onset and unilateral motor symptom severity: left-affected (N = 12) or right-affected (N = 12). CATS-A is a computer-based button-press task with eight subtests relevant to prosodic and facial emotion recognition. Left-affected PD participants, with inferred predominant right-hemisphere pathology, were expected to have difficulty with prosodic emotion recognition, since there is evidence that the processing of prosodic information is right-hemisphere dominant. We found that facial emotion recognition was preserved in the PD group; however, left-affected PD participants had a specific impairment in prosodic emotion recognition, especially for sadness. These selective deficits suggest that (1) hemispheric effects in emotion recognition may contribute to the impairment of emotional communication in a subset of people with PD, and (2) the coordination of neural networks needed to decipher temporally complex social cues may be specifically disrupted in PD.
