Similar Articles
17 similar articles found (search time: 171 ms)
1.
To help deaf and hard-of-hearing people with early speech rehabilitation, this paper presents a computer-aided pronunciation training system based on automatic speech recognition, taking Mandarin initials (shengmu) as its example. First, teachers with standard pronunciation record the initials to build a corpus, and videos of the key articulatory mouth shapes are captured at the same time. Static and dynamic features are then extracted from the corpus as the basis for training and recognition. Continuous-density hidden Markov models (HMMs) are used for modeling and trained with an embedded training algorithm, and recognition is performed with a token-passing algorithm over a recognition network. The out-of-set recognition rate on teachers' Mandarin initials reaches 96.65%, meaning the system can serve as a baseline system for learning Mandarin initial pronunciation. A comparative experiment on pronunciation improvement with visual feedback confirms that the system effectively helps deaf learners with speech training.
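As a rough illustration of the modeling step, the sketch below trains one continuous-density Gaussian HMM per initial with hmmlearn and classifies by maximum log-likelihood; the MFCC-plus-delta features, three-state topology, and the `train_clips` dictionary are assumptions, not the paper's exact setup.

```python
# Hedged sketch: one continuous-density HMM per Mandarin initial,
# classification by maximum log-likelihood over a set of class models.
import numpy as np
import librosa
from hmmlearn.hmm import GaussianHMM

SR = 16000

def mfcc_feats(y):
    # 13 MFCCs plus first-order deltas as the dynamic features
    m = librosa.feature.mfcc(y=y, sr=SR, n_mfcc=13)
    return np.vstack([m, librosa.feature.delta(m)]).T  # (frames, 26)

def train_models(train_clips):
    # train_clips: hypothetical dict, initial label -> list of waveforms
    models = {}
    for label, clips in train_clips.items():
        feats = [mfcc_feats(y) for y in clips]
        X = np.concatenate(feats)
        lengths = [len(f) for f in feats]
        hmm = GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
        models[label] = hmm.fit(X, lengths)
    return models

def recognize(models, y):
    # pick the initial whose HMM gives the highest log-likelihood
    X = mfcc_feats(y)
    return max(models, key=lambda lbl: models[lbl].score(X))
```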

2.
A new cochlear implant speech-signal processing scheme based on Mandarin tone information   Cited by: 2 (self-citations: 0, others: 2)
Building on the continuous interleaved sampling (CIS) cochlear implant speech-processing scheme, and targeting the characteristics of Mandarin speech signals, this paper proposes a new speech-signal processing scheme based on Mandarin tone information. The paper first discusses the characteristics of Mandarin speech and gives a preliminary analysis of how tone variation affects the processing results. The results show that adding Mandarin tone-variation information to the CIS scheme clearly improves Mandarin recognition, and on this basis a cochlear implant suited to Chinese deaf patients can be designed.
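A minimal sketch of the CIS-style front end this scheme builds on, with a YIN F0 track standing in for the added tone-information channel; the band edges, filter orders, and cutoffs are illustrative assumptions.

```python
# Hedged sketch of a CIS-style front end: band-pass filter bank,
# full-wave rectification, and low-pass envelope extraction, plus an
# F0 track as a stand-in for the proposed tone-information channel.
import numpy as np
import librosa
from scipy.signal import butter, filtfilt

SR = 16000
BAND_EDGES = [200, 400, 800, 1600, 3200, 6400]  # Hz, assumed

def cis_envelopes(y, lp_cutoff=400.0):
    envs = []
    for lo, hi in zip(BAND_EDGES[:-1], BAND_EDGES[1:]):
        b, a = butter(4, [lo, hi], btype="band", fs=SR)
        band = filtfilt(b, a, y)
        b, a = butter(2, lp_cutoff, btype="low", fs=SR)
        envs.append(filtfilt(b, a, np.abs(band)))  # rectify + smooth
    return np.array(envs)

def tone_track(y):
    # Mandarin F0 lies roughly in 80-400 Hz; YIN pitch as the tone cue
    return librosa.yin(y, fmin=80, fmax=400, sr=SR)
```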

3.
To address the difficulty of improving recognition accuracy for dysarthric speech, this paper proposes a multi-scale Mel-domain feature-map extraction algorithm. The algorithm decomposes the speech signal with empirical mode decomposition (EMD) and extracts Fbank features and their first-order differences from the three effective components, forming a new feature map that captures fine frequency-domain detail. Second, because a single-path neural network loses effective features and is computationally expensive during training, a dedicated speech-recognition network model is proposed. Finally, training and decoding are carried out on the public UA-Speech dataset. Experiments show the model reaches 92.77% recognition accuracy, so the proposed algorithm effectively improves dysarthric speech recognition.
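A hedged sketch of the feature-map construction as described: EMD splits the waveform into intrinsic mode functions, and log-Mel (Fbank) features plus first-order deltas are stacked for the first three components; the IMF count and filter-bank size are assumptions, not the paper's exact settings.

```python
# Hedged sketch of the multi-scale Mel-domain feature map.
import numpy as np
import librosa
from PyEMD import EMD  # pip install EMD-signal

SR = 16000

def multiscale_fbank(y, n_mels=40, n_imfs=3):
    imfs = EMD()(y)[:n_imfs]               # (n_imfs, n_samples) IMFs
    maps = []
    for imf in imfs:
        mel = librosa.feature.melspectrogram(y=imf, sr=SR, n_mels=n_mels)
        fbank = librosa.power_to_db(mel)   # log-Mel (Fbank) features
        # stack Fbank with its first-order differences per component
        maps.append(np.vstack([fbank, librosa.feature.delta(fbank)]))
    return np.stack(maps)                  # (n_imfs, 2*n_mels, frames)
```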

4.
This paper describes an environmental control unit (ECU) for disabled people based on speech recognition. A PC captures and recognizes the input speech and sends commands through the parallel port to a control box containing decoding circuitry, a relay bank, and a TV infrared remote-control circuit, which respectively control electrical appliances, a telephone, and the functions of a television. The system uses small-vocabulary isolated-word recognition; short-time energy-frequency values are used to segment syllables in the voice commands, LPC cepstra and differential cepstra serve as feature vectors, and a nonlinear cepstral-domain blocking method compresses and time-aligns the feature vectors. Finally, …
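A small sketch of the syllable-segmentation cue described above, using short-time energy over fixed frames; the frame length, hop, and threshold rule are illustrative assumptions.

```python
# Hedged sketch: short-time energy as a syllable-segmentation cue.
import numpy as np

def short_time_energy(y, frame_len=400, hop=160):  # 25 ms / 10 ms @ 16 kHz
    frames = np.lib.stride_tricks.sliding_window_view(y, frame_len)[::hop]
    return (frames ** 2).sum(axis=1)

def segment_syllables(y, rel_thresh=0.1):
    e = short_time_energy(y)
    # pad with False so rising/falling edges always come in pairs
    active = np.concatenate([[False], e > rel_thresh * e.max(), [False]])
    edges = np.flatnonzero(np.diff(active.astype(int)))
    return edges.reshape(-1, 2)  # (start, end) frame indices per syllable
```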

5.
Optical coherence tomography (OCT) is a new intravascular imaging technique whose high resolution and quantifiability allow it to resolve fine structures of the vessel intima and plaque surface and to detect small lesions. As its applications have expanded, in identifying coronary atherosclerotic plaque, optimizing percutaneous coronary intervention (PCI), supporting diagnostic and treatment decisions, and evaluating stents after implantation, OCT has become an effective tool for diagnosing cardiovascular disease. This paper proposes an algorithm for extracting coronary OCT intima-contour sequences based on prior boundary conditions. Starting from the Chan-Vese model, an improved evolution weight function introduces local information about the contour curve into the model to control the speed of boundary evolution, and a gradient energy term and an intima-shape constraint term based on the prior boundary conditions further constrain the shape of the evolving contour, finally yielding sequence extraction of the coronary intima contour. Experimental comparison against manual segmentation by expert physicians as the gold standard shows that the algorithm accurately extracts the coronary intima contour even when the contour is blurred or distorted or is affected by guide-wire shadow or plaque, suggesting the approach could be applied in computer-aided diagnosis and precision treatment.
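For orientation, the sketch below runs the baseline Chan-Vese model (via scikit-image) on a single OCT frame; the paper's improvements, namely the local-information weight function, gradient energy term, and prior boundary-shape constraint, are not reproduced here, and the file path is hypothetical.

```python
# Hedged sketch: baseline Chan-Vese segmentation of one OCT frame.
# This shows only the starting model the authors build on.
import numpy as np
from skimage import io, img_as_float
from skimage.segmentation import chan_vese

frame = img_as_float(io.imread("oct_frame.png", as_gray=True))  # assumed path
mask = chan_vese(frame, mu=0.25, lambda1=1.0, lambda2=1.0, tol=1e-3)
# `mask` is a boolean region; its boundary approximates the intima contour
```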

6.
This paper proposes a cochlear implant coding strategy based on non-uniform sampling of the low-frequency band: the low-frequency fine structure (LFFS) zero-crossing stimulation scheme, intended to improve the robustness of Mandarin tone and speech recognition with cochlear implants. Following a band-selection rule, zero-crossing-timed fine-structure stimulation pulse trains are used within the range of human fundamental-frequency perception. Acoustic simulations show that in quiet the LFFS and continuous interleaved sampling (CIS) schemes differ little in speech recognition, whereas in noise LFFS is clearly superior to CIS for Mandarin tones, words, and sentences; an improved exponential-distribution model also yields a good distribution map of Mandarin recognition factors. Because the LFFS scheme carries more Mandarin tone information, it can effectively improve the robustness of Mandarin recognition for cochlear implant users.
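A minimal sketch of the LFFS timing idea: extract the low-band fine structure and time stimulation pulses at its zero-crossings so that F0 (tone) periodicity is preserved; the band edges are assumptions.

```python
# Hedged sketch: pulse timing from zero-crossings of the low-frequency
# fine structure, the core of the LFFS scheme described above.
import numpy as np
from scipy.signal import butter, filtfilt

SR = 16000

def lffs_pulse_times(y, lo=80.0, hi=500.0):
    b, a = butter(4, [lo, hi], btype="band", fs=SR)
    fine = filtfilt(b, a, y)                  # low-band fine structure
    # sign changes mark zero-crossings; use them as pulse instants
    zc = np.flatnonzero(np.diff(np.signbit(fine).astype(np.int8)))
    return zc / SR                            # pulse times in seconds
```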

7.
This paper proposes an active contour model based on image features. After a rough manual initialization of the contour from a few points, the model uses features of the image itself to identify the desired contour automatically, efficiently, and accurately, and it gives satisfactory results even on complex images that traditional methods handle poorly. Compared with other models, it retains the advantages of flexibility and repeatability while being simpler to operate.
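The abstract does not give the model's equations, so the sketch below shows only a generic snake (scikit-image's `active_contour`) initialized from a few assumed user points, in the spirit of the described approach.

```python
# Hedged sketch: a classic snake initialized from rough user input,
# not the paper's image-feature-driven variant.
import numpy as np
from skimage import io, img_as_float
from skimage.filters import gaussian
from skimage.segmentation import active_contour

img = img_as_float(io.imread("object.png", as_gray=True))  # assumed path
# Assumed user clicks: rough center and radius -> circular initial snake
cy, cx, r = 120.0, 150.0, 60.0
t = np.linspace(0, 2 * np.pi, 200)
init = np.column_stack([cy + r * np.sin(t), cx + r * np.cos(t)])

snake = active_contour(gaussian(img, sigma=3), init,
                       alpha=0.015, beta=10.0, gamma=0.001)
```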

8.
Objective: To extract tumor regions from medical images for measuring tumor volume. Methods: An interactive model based on GACV (Geodesic-Aided C-V method) is proposed. A region of interest is selected manually, the initial level set and seed points inside the tumor are placed within it, and the GACV model, which unifies image gradient edge information and regional gray-level characteristics in a single segmentation, is applied to obtain a coarse segmentation of the tumor. Finally, to remove holes inside and outside the target, an edge-preserving dilation-search algorithm is proposed as the fine segmentation. Results: Applied to tumor images of different shapes, the model successfully detects tumor contours. Comparison with other active-contour segmentation methods shows that it performs well in both boundary accuracy and running time. Conclusion: The proposed method identifies tumor regions accurately and efficiently.
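A stand-in sketch for the refinement goal (removing holes inside the coarse GACV mask and specks outside it) using standard hole filling and largest-component selection; this is not the paper's edge-preserving dilation-search algorithm.

```python
# Hedged sketch of the fine-segmentation goal: clean holes and specks
# from the coarse tumor mask with standard morphology.
import numpy as np
from scipy import ndimage

def refine_mask(coarse):                   # coarse: boolean 2-D mask
    filled = ndimage.binary_fill_holes(coarse)     # remove internal holes
    labels, n = ndimage.label(filled)
    if n == 0:
        return filled
    sizes = ndimage.sum(filled, labels, index=range(1, n + 1))
    return labels == (1 + int(np.argmax(sizes)))   # keep largest component
```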

9.
Objective: To design a high-precision desktop eye tracker to assist physicians in diagnosing schizophrenia. Methods: The system uses dual-screen output, and the camera integrates an infrared light source and an infrared filter. A new pupil-center recognition algorithm is proposed: the pupil contour is first extracted from the preprocessed eye image, and the pupil center is then identified by least-squares ellipse fitting. On this basis, edge detection is applied to the region around the pupil, and the center of the largest resulting contour is taken as the center of the Purkinje spot. Finally, a static calibration algorithm based on nonlinear second-order polynomial fitting establishes the mapping between the eye-image and scene coordinate systems, and gaze-point calibration is completed. Results: Real-time display of gaze trajectories and counting of fixation points were achieved, with a test validity rate above 80%. Conclusion: The system tracks and displays eye movements well and is practically useful.
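The pupil-center step maps directly onto standard OpenCV calls, sketched below; the threshold value and file path are assumptions, and the Purkinje-spot and calibration stages are omitted.

```python
# Hedged sketch: threshold the dark pupil, take its largest contour,
# and fit an ellipse by least squares to locate the pupil center.
import cv2
import numpy as np

eye = cv2.imread("eye.png", cv2.IMREAD_GRAYSCALE)        # assumed path
_, bw = cv2.threshold(eye, 50, 255, cv2.THRESH_BINARY_INV)
contours, _ = cv2.findContours(bw, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
pupil = max(contours, key=cv2.contourArea)
if len(pupil) >= 5:                                      # fitEllipse needs 5+ points
    (cx, cy), axes, angle = cv2.fitEllipse(pupil)        # least-squares fit
```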

10.
Lesion edges in medical images are generally weak, and noise makes extracting them even harder, so traditional segmentation methods give unsatisfactory results. We propose a lesion-edge extraction method based on the dyadic wavelet transform and an active contour model. The dyadic wavelet transform detects genuine edge points, which serve as the initial contour; an improved fast active-contour algorithm then links the edge points to obtain the lesion edge. Experiments on tumor-edge extraction from brain MRI show that the method effectively reduces the influence of noise and accurately extracts complex lesion edges.
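A partial sketch of the first stage, detecting candidate edge points from the modulus of undecimated (stationary) wavelet detail coefficients with PyWavelets; the wavelet choice and threshold are assumptions, and the contour-linking stage is not shown.

```python
# Hedged sketch: wavelet-based edge-point detection as contour seeds.
import numpy as np
import pywt
from skimage import io, img_as_float

img = img_as_float(io.imread("mri_slice.png", as_gray=True))  # assumed path
# One undecimated (stationary) wavelet level; image sides must be even
(cA, (cH, cV, cD)), = pywt.swt2(img, "haar", level=1)
modulus = np.sqrt(cH ** 2 + cV ** 2)          # gradient-like edge strength
edge_points = np.argwhere(modulus > 0.5 * modulus.max())  # contour seeds
```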

11.
An intelligent auricular-point information recognition system based on a BP neural network   Cited by: 1 (self-citations: 0, others: 1)
This paper describes an intelligent recognition system for auricular-point information built on a BP (back-propagation) network. Its design combines meridian and zang-fu theory with pattern-recognition theory, and the system screens for upper-gastrointestinal cancer by recognizing electrical characteristics of human auricular points. Recognition trials on gastric-disease samples with this model gave good results. This work matters for cancer prevention and treatment, and it also opens a new path for combining modern science with traditional Chinese medicine.
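A hedged sketch of the classifier side using scikit-learn's MLPClassifier as the back-propagation network; the feature matrix and screening labels are hypothetical placeholders.

```python
# Hedged sketch: a back-propagation network over auricular-point
# electrical features, standing in for the paper's BP model.
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def train_screening_model(X, y):
    """X: (n_subjects, n_points) ear-point features; y: 0/1 screen labels.
    Both are hypothetical placeholders for the paper's data."""
    clf = make_pipeline(
        StandardScaler(),  # normalize electrical measurements per feature
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
    )
    return clf.fit(X, y)
```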

12.
There are obvious differences between recognizing faces and recognizing spoken words or phonemes that might suggest development of each capability requires different skills. Recognizing faces and perceiving spoken language, however, are in key senses extremely similar endeavors. Both perceptual processes are based on richly variable, yet highly structured input from which the perceiver needs to extract categorically meaningful information. This similarity could be reflected in the perceptual narrowing that occurs within the first year of life in both domains. We take the position that the perceptual and neurocognitive processes by which face and speech recognition develop are based on a set of common principles. One common principle is the importance of systematic variability in the input as a source of information rather than noise. Experience of this variability leads to perceptual tuning to the critical properties that define individual faces or spoken words versus their membership in larger groupings of people and their language communities. We argue that parallels can be drawn directly between the principles responsible for the development of face and spoken language perception. © 2014 The Authors. Dev Psychobiol Published by Wiley Periodicals, Inc. Dev Psychobiol 56: 1454–1481, 2014.

13.
Upper lip, lower lip, and jaw kinematics during select speech behaviors were studied in an attempt to identify potential invariant characteristics associated with this highly skilled motor behavior. Data indicated that speech motor actions are executed and planned presumably in terms of relatively invariant combined multimovement gestures. In contrast, the individual upper lip, lower lip, and jaw movements and their moment-to-moment coordination were executed in a variable manner, demonstrating substantial motor equivalence. Based on the trial-to-trial variability in the movement amplitudes, absolute positions, and velocities of the upper lip, lower lip, and jaw, it appears that speech motor planning is not formulated in terms of spatial coordinates. Seemingly, object-level planning for speech may be encoded in relation to the acoustic consequences of the movements and ultimately with regard to listeners' auditory perceptions. In addition, certain temporal parameters among the three movements (relative times of movement onsets and velocity peaks) were related stereotypically, reflecting invariances characteristic of more automatic motor behaviors such as chewing and locomotion. These data thus appear to provide some additional insights into the hierarchy of multimovement control. At the top of the motor control hierarchy, the overall plan appears to be generated with explicit specification of certain temporal parameters. Subsequently, based upon the plan and within that stereotypic temporal framework, covariable adjustments among the individual movements are implemented. Given the results of previous perturbation studies, it is hypothesized that these covariable velocity and amplitude adjustments reflect the action of sensorimotor mechanisms.

14.
The science of human identification using physiological characteristics, or biometry, has been of great concern in security systems. However, robust multimodal identification systems based on audio-visual information have not been thoroughly investigated yet. Therefore, the aim of this work is to propose a model-based feature extraction method which employs physiological characteristics of the facial muscles producing lip movements. This approach adopts the intrinsic properties of muscles such as viscosity, elasticity, and mass, which are extracted from the dynamic lip model. These parameters are exclusively dependent on the neuro-muscular properties of the speaker; consequently, imitation of valid speakers could be reduced to a large extent. These parameters are applied to a hidden Markov model (HMM) audio-visual identification system. In this work, a combination of audio and video features has been employed by adopting a multistream pseudo-synchronized HMM training method. Noise-robust audio features such as Mel-frequency cepstral coefficients (MFCC), spectral subtraction (SS), and relative spectral perceptual linear prediction (J-RASTA-PLP) have been used to evaluate the performance of the multimodal system once efficient audio feature extraction methods have been utilized. The superior performance of the proposed system is demonstrated on a large multispeaker database of continuously spoken digits, along with a sentence that is phonetically rich. To evaluate the robustness of the algorithms, some experiments were performed on genetically identical twins. Furthermore, changes in speaker voice were simulated with drug inhalation tests. At 3 dB signal-to-noise ratio (SNR), the dynamic muscle model improved the identification rate of the audio-visual system from 91 to 98%. Results on identical twins revealed an apparent improvement in the performance of the dynamic muscle model-based system, whose audio-visual identification rate was enhanced from 87 to 96%.
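As an illustration of the noise-robust audio stream, the sketch below chains a basic spectral-subtraction denoiser with MFCC extraction via librosa; estimating noise from the first frames is an assumption, and the J-RASTA-PLP stream is not reproduced.

```python
# Hedged sketch: spectral subtraction followed by MFCC extraction,
# a simple version of the noise-robust audio front end described above.
import numpy as np
import librosa

def ss_mfcc(y, sr=16000, noise_frames=10, n_mfcc=13):
    S = librosa.stft(y)                                   # complex spectrum
    mag, phase = np.abs(S), np.angle(S)
    # assume the first few frames are noise-only (illustrative assumption)
    noise = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    clean = np.maximum(mag - noise, 0.0)                  # subtract + floor
    y_hat = librosa.istft(clean * np.exp(1j * phase))
    return librosa.feature.mfcc(y=y_hat, sr=sr, n_mfcc=n_mfcc)
```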

15.
Visual speech (lip-reading) influences the perception of heard speech. The literature suggests at least two possible mechanisms for this influence: “direct” sensory–sensory interaction, whereby sensory signals from auditory and visual modalities are integrated directly, likely in the superior temporal sulcus, and “indirect” sensory–motor interaction, whereby visual speech is first mapped onto motor-speech representations in the frontal lobe, which in turn influences sensory perception via sensory–motor integration networks. We hypothesize that both mechanisms exist, and further that previous demonstrations of lip-reading functional activations in Broca's region and the posterior planum temporale reflect the sensory–motor mechanism. We tested one prediction of this hypothesis using fMRI. We assessed whether viewing visual speech (contrasted with facial gestures) activates the same network as a speech sensory–motor integration task (listen to and then silently rehearse speech). Both tasks activated locations within Broca's area, dorsal premotor cortex, and the posterior planum temporale (Spt), and focal regions of the STS, all of which have previously been implicated in sensory–motor integration for speech. This finding is consistent with the view that visual speech influences heard speech via sensory–motor networks. Lip-reading also activated a much wider network in the superior temporal lobe than the sensory–motor task, possibly reflecting a more direct cross-sensory integration network.

16.
Speaking involves the activity of multiple muscles moving many parts (articulators) of the vocal tract. In previous studies, it has been shown that mechanical perturbation delivered to one moving speech articulator, such as the lower lip or jaw, results in compensatory responses in the perturbed and other non-perturbed articulators, but not in articulators that are uninvolved in the specific speech sound being produced. These observations suggest that the speech motor control system may be organized in a task-specific manner. However, previous studies have not used the appropriate controls to address the mechanism by which this task-specific organization is achieved. A lack of response in a non-perturbed articulator may simply reflect the fact that the muscles examined were not active. Alternatively, there may be a specific gating of somatic sensory signals due to task requirements. The present study was designed to address the nature of the underlying sensorimotor organization. Unanticipated mechanical loads were applied to the upper lip during the "p" in "apa" and "f" in "afa" in six subjects. Both lips are used to produce "p", while only the lower lip is used for "f". For "apa", both upper lip and lower lip responses were observed following upper lip perturbation. For "afa", no upper lip or lower lip responses were observed following the upper lip perturbation. The differential response of the lower lip, which was phasically active during both speech tasks, indicates that the neural organization of these two speech tasks differs not only in terms of the different muscles used to produce the different movements, but also in terms of the sensorimotor interactions within and across the two lips.

17.
This paper introduces a three-dimensional (3D) reconstruction algorithm for the brain stem nuclei based on fast centroid auto-registration. The research draws on methods and theories of computer stereo vision; through image-information processing, three-point-pattern local search, registration, and automatic tracing of the centroids of the brain stem nuclei were accomplished. We adopt a two-peak (bimodal) threshold, edge detection, and grayscale image enhancement to extract the contours of the nuclei's structures. The experiments yield the spatial-structure information and a 3D image of the brain stem nuclei, show the spatial relationships among 14 pairs of nuclei, and quantify the morphological parameters of each nucleus type's 3D structure. This work is significant for neuroanatomy research and clinical applications. Furthermore, a software system named BRAIN.HUK is established.
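A small sketch of the centroid building block: a bimodal threshold (approximated here by Otsu's method as a stand-in) followed by per-nucleus centroid extraction with scipy; the file path is hypothetical.

```python
# Hedged sketch: nucleus centroids from one labelled section image,
# the raw material for the paper's centroid auto-registration.
import numpy as np
from scipy import ndimage
from skimage import io, img_as_float
from skimage.filters import threshold_otsu

sec = img_as_float(io.imread("section_012.png", as_gray=True))  # assumed path
mask = sec > threshold_otsu(sec)          # stand-in for the two-peak threshold
labels, n = ndimage.label(mask)
centroids = ndimage.center_of_mass(mask, labels, index=range(1, n + 1))
# `centroids` are per-nucleus (row, col) points used for 3-D registration
```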
