Similar Articles
20 similar articles found (search time: 34 ms)
1.
The aims of this work were to measure the accuracy of one continuous speech recognition product, to assess its dependence on the speaker's gender and status as a native or nonnative English speaker, and to evaluate the product's potential for routine use in transcribing radiology reports. IBM MedSpeak/Radiology software, version 1.1, was evaluated by 6 speakers. Two were nonnative English speakers, and 3 were men. Each speaker dictated a set of 12 reports. The reports included neurologic and body imaging examinations performed with 6 different modalities. The dictated and original report texts were compared, and error rates for overall, significant, and subtle significant errors were computed. Error rate dependence on modality, native English speaker status, and gender was evaluated with t tests. The overall error rate was 10.3 +/- 3.3%. No difference in accuracy between men and women was found; however, significant differences were seen for overall and significant errors when comparing native and nonnative English speakers (P = .009 and P = .008, respectively). The speech recognition software is approximately 90% accurate, and while practical implementation issues (rather than accuracy) currently limit routine use of this product throughout a radiology practice, application in niche areas such as the emergency room is being pursued. This methodology provides a convenient way to compare the initial accuracy of different speech recognition products, and changes in accuracy over time, in a detailed and sensitive manner.
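The core of the methodology above — aligning the dictated text against the original report and counting word-level errors — can be sketched as follows. The `word_error_rate` helper and the sample sentences are illustrative assumptions, not code or data from the study:

```python
import difflib

def word_error_rate(reference: str, hypothesis: str) -> float:
    """Fraction of reference words affected by substitutions,
    deletions, or insertions in the recognized (hypothesis) text."""
    ref, hyp = reference.split(), hypothesis.split()
    errors = 0
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=ref, b=hyp).get_opcodes():
        if tag == "replace":
            errors += max(i2 - i1, j2 - j1)  # substituted words
        elif tag == "delete":
            errors += i2 - i1  # words dropped by the recognizer
        elif tag == "insert":
            errors += j2 - j1  # spurious words added
    return errors / len(ref)

original = "no acute intracranial hemorrhage is identified"
dictated = "no acute intracranial hemorrhage identified"  # "is" was dropped
print(word_error_rate(original, dictated))  # 1 error / 6 words ~= 0.167
```

Running the same comparison over each speaker's set of reports, grouped by gender or native-speaker status, yields the per-group error rates that the study compared with t tests.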

2.
Current speech recognition software allows exam-specific standard reports to be prepopulated into the dictation field based on the radiology information system procedure code. While it is thought that prepopulating reports can decrease the time required to dictate a study and the overall number of errors in the final report, this hypothesis had not been studied in a clinical setting. A prospective study was performed. During the first week, radiologists dictated all studies using prepopulated standard reports. During the second week, all studies were dictated after prepopulated reports had been disabled. Final radiology reports were evaluated for 11 different types of errors. Each error within a report was classified individually. The median time required to dictate an exam was compared between the 2 weeks. There were 12,387 reports dictated during the study, of which 1,173 randomly distributed reports were analyzed for errors. There was no difference in the number of errors per report between the 2 weeks; however, radiologists overwhelmingly preferred using a standard report in both weeks. Grammatical errors were by far the most common error type, followed by missense errors and errors of omission. There was no significant difference in the median dictation time when comparing studies performed each week. The use of prepopulated reports does not by itself affect the error rate or dictation time of radiology reports. While it is a useful feature for radiologists, it must be coupled with other strategies in order to decrease errors.

3.
When listening to a speaker, we need to adapt to her individual speaking characteristics, such as error proneness, accent, etc. The present study investigated two aspects of adaptation to speaker identity during the processing of spoken sentences in multi-speaker situations: the effect of speaker sequence across sentences and the effect of learning speaker-specific error probability. Spoken sentences were presented, cued, and accompanied by one of three portraits that were labeled as the speakers’ faces. In Block 1, speaker-specific probabilities of syntax errors were 10%, 50%, or 90%; in Block 2 they were uniformly 50%. In both blocks, speech errors elicited P600 effects in the scalp-recorded ERP. We found a speaker sequence effect only in Block 1: the P600 to target words was larger after speaker switches than after speaker repetitions, independent of sentence correctness. In Block 1, listeners showed higher accuracy in judging the correctness of sentences spoken by speakers with lower error proportions. No speaker-specific differences in target word P600 and accuracy were found in Block 2. When speakers differ in error proneness, listeners seem to flexibly adapt their speech processing for the upcoming sentence through attention reorientation and resource reallocation if the speaker is about to change, and through proactive maintenance of neural resources if the speaker remains the same.

4.
The aim of this study was to retrospectively analyze the influence of different acoustic and language models in order to determine the most important effects on the clinical performance of a non-commercial, radiology-oriented automatic speech recognition (ASR) system for the Estonian language. The ASR system was developed for the Estonian language in the radiology domain by utilizing open-source software components (Kaldi toolkit, Thrax). The ASR system was trained with real radiology text reports and dictations collected during the development phases. The final version of the ASR system was tested by 11 radiologists who dictated 219 reports in total, in a spontaneous manner in a real clinical environment. The audio files collected in the final phase were used to measure the performance of different versions of the ASR system retrospectively. ASR system versions were evaluated by word error rate (WER) for each speaker and modality, and by the WER difference between the first and last versions of the ASR system. The total average WER throughout all material improved from 18.4% for the first version (v1) to 5.8% for the last version (v8), corresponding to a relative improvement of 68.5%. WER improvement was strongly related to modality and radiologist. In summary, the performance of the final ASR system version was close to optimal, delivering similar results for all modalities and being independent of the user, the complexity of the radiology reports, user experience, and speech characteristics.
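The reported relative improvement follows directly from the two WER figures in the abstract; as a quick check:

```python
# Reproducing the reported relative WER improvement from the abstract's figures.
wer_v1 = 18.4  # total average WER of the first ASR system version, in %
wer_v8 = 5.8   # total average WER of the final (v8) version, in %
relative_improvement = (wer_v1 - wer_v8) / wer_v1 * 100
print(round(relative_improvement, 1))  # 68.5, matching the reported value
```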

5.
6.
The purpose of this study was to evaluate and compare textual error rates and subtypes in radiology reports before and after implementation of department-wide structured reports. Randomly selected radiology reports that were generated following the implementation of department-wide structured reports were evaluated for textual errors by two radiologists. For each report, the text was compared to the corresponding audio file. Errors in each report were tabulated and classified. Error rates were compared to results from an earlier study performed before the implementation of structured reports. Calculated error rates included the average number of errors per report, the average number of nongrammatical errors per report, the percentage of reports with an error, and the percentage of reports with a nongrammatical error. Identical versions of voice-recognition software were used for both studies. A total of 644 radiology reports were randomly evaluated as part of this study. There was a statistically significant reduction in the percentage of reports with nongrammatical errors (33% to 26%; p = 0.024). The likelihood of at least one missense omission error (an omission error that changed the meaning of a phrase or sentence) occurring in a report was significantly reduced from 3.5% to 1.2% (p = 0.0175). A statistically significant reduction in the likelihood of at least one commission error (a retained statement from a standardized report that contradicts the dictated findings or impression) occurring in a report was also observed (3.9% to 0.8%; p = 0.0007). Carefully constructed structured reports can help to reduce certain error types in radiology reports.
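Pre/post comparisons of error proportions like those above are commonly tested with a two-proportion z-test. The abstract does not state which test the authors used, so the sketch below is a generic illustration with made-up counts, not the study's method:

```python
from math import sqrt, erf

def two_proportion_z(hits1: int, n1: int, hits2: int, n2: int):
    """Two-sided, pooled two-proportion z-test: compares the rate
    hits1/n1 against hits2/n2 and returns (z statistic, p-value)."""
    p1, p2 = hits1 / n1, hits2 / n2
    pooled = (hits1 + hits2) / (n1 + n2)
    z = (p1 - p2) / sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # Two-sided p-value from the standard normal CDF, via math.erf.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Identical rates give z = 0 and p = 1; very different rates give a tiny p.
print(two_proportion_z(10, 100, 10, 100)[0])  # 0.0
```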

7.
Speech recognition systems have become increasingly popular as a means to produce radiology reports, for reasons both of efficiency and of cost. However, the suboptimal recognition accuracy of these systems can affect the productivity of the radiologists creating the text reports. We analyzed a database of over two million de-identified radiology reports to determine the strongest determinants of word frequency. Our results showed that body site and imaging modality had a similar influence on the frequency of words and of three-word phrases as did the identity of the speaker. These findings suggest that the accuracy of speech recognition systems could be significantly enhanced by further tailoring their language models to body site and imaging modality, which are readily available at the time of report creation.
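The corpus analysis described above — counting three-word phrases conditioned on context such as body site and modality — might look like the sketch below. The `phrase_counts` helper and the toy reports are illustrative assumptions, not the authors' code:

```python
from collections import Counter, defaultdict

def phrase_counts(reports):
    """reports: iterable of (modality, body_site, text) tuples.
    Returns, per (modality, body_site) context, counts of three-word
    phrases; such counts could seed context-specific language models."""
    counts = defaultdict(Counter)
    for modality, site, text in reports:
        words = text.lower().split()
        for i in range(len(words) - 2):
            counts[(modality, site)][" ".join(words[i:i + 3])] += 1
    return counts

reports = [
    ("CT", "head", "No acute intracranial hemorrhage"),
    ("CT", "head", "No acute intracranial abnormality"),
    ("XR", "chest", "No focal consolidation"),
]
counts = phrase_counts(reports)
print(counts[("CT", "head")]["no acute intracranial"])  # 2
```

Comparing such per-context counts against per-speaker counts is one way to quantify whether modality and body site predict phrase frequency as strongly as speaker identity does.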

8.
9.
10.
Several studies have been conducted to address the learning of a nonnative speech contrast in adulthood, using native speakers of Japanese and the English /r/-/l/ contrast. Japanese adults were asked to identify contrasting /r/-/l/ stimuli (e.g., "rock-lock"). An adaptive training regime starting with initially easy stimuli was contrasted with a fixed training regime using difficult stimuli, with some subjects receiving feedback on the correctness of their responses and others receiving no feedback in both conditions. After three and six sessions of training, subjects received tests assessing identification and discrimination of /r/-/l/ stimuli as well as generalization. In all cases except fixed training without feedback, subjects showed clear evidence of learning, and several indicators suggested that training affects speech perception, rather than simply auditory processes. Neuroimaging studies currently underway are examining the neural basis of these findings.

11.
Radiology report errors occur for many reasons, including the use of pre-filled report templates, wrong-word substitution, nonsensical phrases, and missing words. Reports may also contain clinical errors that are not specific to speech recognition, including wrong laterality and gender-specific discrepancies. Our goal was to create a custom algorithm to detect potential gender and laterality mismatch errors and to notify the interpreting radiologists for rapid correction. A JavaScript algorithm was devised to flag gender and laterality mismatch errors by searching the text of the report for keywords and comparing them to parameters within the study’s HL7 metadata (i.e., procedure type, patient sex). The error detection algorithm was applied retrospectively to 82,353 reports from the 4 months prior to its development and then prospectively to 309,304 reports over the 15 months after implementation. Flagged reports were reviewed individually by two radiologists for a true gender or laterality error and to determine if the errors were ultimately corrected. There was a significant change in the number of flagged reports (pre, 198/82,353 [0.24 %]; post, 628/309,304 [0.20 %]; P = 0.04) and reports containing confirmed gender or laterality errors (pre, 116/82,353 [0.14 %]; post, 285/309,304 [0.09 %]; P < 0.0001) after implementing our error notification system. The number of flagged reports containing an error that were ultimately corrected improved dramatically after implementing the notification system (pre, 17/116 [15 %]; post, 239/285 [84 %]; P < 0.0001). We developed a successful automated tool for detecting and notifying radiologists of potential gender and laterality errors, allowing for rapid report correction and reducing the overall rate of report errors.
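A minimal version of such a mismatch check could look like the sketch below. The study's algorithm was written in JavaScript and driven by full HL7 metadata; here the function name, keyword lists, and inputs are illustrative assumptions, and a production system would need far more careful keyword handling:

```python
def flag_mismatches(report_text: str, procedure_desc: str, patient_sex: str):
    """Return a list of potential mismatch warnings: laterality keywords
    in the report that contradict the procedure description, and
    sex-specific anatomy inconsistent with the recorded patient sex."""
    text = report_text.lower()
    proc = procedure_desc.lower()
    flags = []
    # Laterality: procedure names one side, report mentions only the other.
    opposite = {"left": "right", "right": "left"}
    for side, other in opposite.items():
        if side in proc and other in text and side not in text:
            flags.append(f"laterality: procedure is '{side}' but report says '{other}'")
    # Sex-specific anatomy (illustrative keyword lists, not exhaustive).
    sex_terms = {"F": ["prostate", "testicular"], "M": ["uterus", "ovarian"]}
    for term in sex_terms.get(patient_sex, []):
        if term in text:
            flags.append(f"gender: '{term}' inconsistent with sex '{patient_sex}'")
    return flags

print(flag_mismatches("The right knee shows a joint effusion.",
                      "XR left knee, 2 views", "M"))
# ["laterality: procedure is 'left' but report says 'right'"]
```

Flags like these can then be routed back to the interpreting radiologist for rapid correction, as in the notification system described above.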

12.
The sight of a speaker’s facial movements during the perception of a spoken message can benefit speech processing through online predictive mechanisms. Recent evidence suggests that these predictive mechanisms can operate across sensory modalities, that is, vision and audition. However, to date, behavioral and electrophysiological demonstrations of cross-modal prediction in speech have considered only the speaker’s native language. Here, we address a question of current debate, namely whether the level of representation involved in cross-modal prediction is phonological or pre-phonological. We do this by testing participants in an unfamiliar language. If cross-modal prediction is predominantly based on phonological representations tuned to the phonemic categories of the native language of the listener, then it should be more effective in the listener’s native language than in an unfamiliar one. We tested Spanish and English native speakers in an audiovisual matching paradigm that allowed us to evaluate visual-to-auditory prediction, using sentences in the participant’s native language and in an unfamiliar language. The benefits of cross-modal prediction were only seen in the native language, regardless of the particular language or participant’s linguistic background. This pattern of results implies that cross-modal visual-to-auditory prediction during speech processing makes strong use of phonological representations, rather than low-level spatiotemporal correlations across facial movements and sounds.

13.
14.
15.
Z Eviatar, Neuropsychology, 1999, 13(4): 498-515
Four experiments explored the effects of specific language characteristics on hemispheric functioning in reading nonwords using a lateralized trigram identification task. Previous research using nonsense consonant-vowel-consonant (CVC) trigrams has shown that total error scores reveal a right visual field (RVF) advantage in Hebrew, Japanese, and English. Qualitative error patterns have shown that the right hemisphere uses a sequential strategy, whereas the left hemisphere uses a more parallel strategy in English but shows the opposite pattern in Hebrew. Experiment 1 tested whether this is due to the test language or to the native language of the participants. Results showed that native language had a stronger effect on hemispheric strategies than test language. Experiment 2 showed that latency to target letters in the CVCs revealed the same asymmetry as qualitative errors for Hebrew speakers but not for English speakers, and that exposure duration of the stimuli affected misses differentially according to letter position. Experiment 3 used number trigrams to equate reading conventions in the 2 languages. Qualitative error scores still revealed opposing asymmetry patterns. Experiments 1-3 used vertical presentations. Experiment 4 used horizontal presentation, which eliminated sequential processing in both hemispheres in Hebrew speakers, whereas English speakers still showed sequential processing in both hemispheres. Comparison of the 2 presentations suggests that stimulus arrangement affected qualitative errors in the left visual field but not the RVF for English speakers, and in both visual fields for Hebrew speakers. It is suggested that these differences result from orthographic and morphological differences between the languages: Reading Hebrew requires attention to be deployed to all the constituents of the stimulus in parallel, whereas reading English allows sequential processing of the letters in both hemispheres. Implications of cross-language studies for models of hemispheric function are discussed.

16.
We investigated the influence of phonological neighbourhood density (PND) on the performance of aphasic speakers whose naming impairments differentially implicate phonological or semantic stages of lexical access. A word comes from a dense phonological neighbourhood if many words sound like it. Limited evidence suggests that higher density facilitates naming in aphasic speakers, as it does in healthy speakers. Using well-controlled stimuli, Experiment 1 confirmed the influence of PND on accuracy and phonological error rates in two aphasic speakers with phonological processing deficits. In Experiments 2 and 3, we extended the investigation to an aphasic speaker who is prone to semantic errors, indicating a semantic deficit and/or a deficit in the mapping from semantics to words. This individual had higher accuracy, and fewer semantic errors, in naming targets from high- than from low-density neighbourhoods. It is argued that the results provide strong support for interactive approaches to lexical access, where reverberatory feedback between word- and phoneme-level lexical representations not only facilitates phonological-level processes but also privileges the selection of a target word over its semantic competitors.

17.
Neuropsychological studies in brain-injured patients with aphasia and children with specific language-learning deficits have shown the dependence of language comprehension on auditory processing abilities, i.e., the detection of temporal order. An impairment of temporal-order perception can be simulated by time-reversing segments of the speech signal. In our study, we investigated how different lengths of time-reversed segments in speech influenced comprehension in ten native German speakers and ten participants who had acquired German as a second language. Results show that native speakers were still able to understand the distorted speech at segment lengths of 50 ms, whereas non-native speakers could only identify sentences with reversed intervals of 32 ms duration. These differences in performance can be interpreted in terms of different levels of semantic and lexical proficiency. Our method of temporally distorted speech offers a new approach to assessing language skills that indirectly taps into the lexical and semantic competence of non-native speakers.

18.
19.
20.
English has long been the dominant language of scientific publication, and it is rapidly approaching near-complete hegemony. The majority of the scientists publishing in English-language journals are not native English speakers, however. This imbalance has important implications for training concerning ethics and enforcement of publication standards, particularly with respect to plagiarism. The authors suggest that lack of understanding of what constitutes plagiarism and the use of a linguistic support strategy known as "patchwriting" can lead to inadvertent misuse of source material by nonnative speakers writing in English as well as to unfounded accusations of intentional scientific misconduct on the part of these authors. They propose that a rational and well-informed dialogue about this issue is needed among editors, educators, administrators, and both native-English-speaking and nonnative-English-speaking writers. They offer recommendations for creating environments in which such dialogue and training can occur.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (北京勤云科技发展有限公司)  京ICP备09084417号