An Expanded Pattern Set-Based Approach to Chinese Name Recognition
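Citation: LUAN Wei-feng, ZHANG Huan-huan. An Expanded Pattern Set-Based Approach to Chinese Name Recognition[J]. 医学教育探索 (Researches in Medical Education), 2018, 44(3): 425-430
Authors: LUAN Wei-feng, ZHANG Huan-huan
Affiliation: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237 (both authors)
Funding: Shanghai Municipal Science and Technology Commission Research Program (17DZ1101003)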
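Abstract: Chinese personal names take complex and varied forms, including non-standard ones such as abbreviated names and aliases, and traditional Chinese name recognition methods still handle these incomplete forms poorly. This paper proposes a Chinese name recognition method based on an expanded pattern set, which extends the set of name recognition patterns to improve recognition of incomplete Chinese names. Experimental results show that the method achieves good precision and recall, is effective to some degree on incomplete Chinese names in particular, and makes name recognition more complete.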
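The core idea can be illustrated with a small sketch: recognize names by matching role sequences against a pattern set, and widen coverage by adding patterns. The sketch assumes one role per character and uses Python's `re` module for matching. The role tags and both pattern sets are hypothetical choices for illustration, not the paper's actual tag set or pattern set. The tags are: B for surname, C/D for the two given-name characters, E for a single-character given name, F for a prefix such as 老 or 小, G for a title after a surname, K/L for context before/after a name, and A for other.

```python
import re

# Hypothetical role tags (illustrative only, not the paper's exact tag set):
#   B = surname, C/D = first/second given-name character, E = single-character given name,
#   F = prefix before a surname (e.g. 老, 小), G = title after a surname (e.g. 总),
#   K = context before a name, L = context after a name, A = other.

# Hypothetical patterns for complete names: surname + two- or one-character given name.
BASE_PATTERNS = ["BCD", "BE"]

# Hypothetical expanded set: adds patterns for incomplete forms the base set misses.
EXPANDED_PATTERNS = BASE_PATTERNS + [
    "FB",  # prefix + surname, e.g. 老王
    "BG",  # surname + title, e.g. 王总
]


def match_names(chars, roles, patterns):
    """Return the substrings of `chars` whose role tags match any pattern.

    `chars` and `roles` are aligned: roles[i] is the tag of chars[i].
    """
    if not patterns:
        return []
    role_str = "".join(roles)
    # Try longer patterns first so a full name wins over a shorter overlapping pattern.
    alternation = "|".join(re.escape(p) for p in sorted(patterns, key=len, reverse=True))
    return ["".join(chars[m.start():m.end()]) for m in re.finditer(alternation, role_str)]


if __name__ == "__main__":
    text, tags = "老王和小李来了", "FBLFBLA"
    print(match_names(text, tags, BASE_PATTERNS))      # []
    print(match_names(text, tags, EXPANDED_PATTERNS))  # ['老王', '小李']
```

With only the complete-name patterns, the prefixed forms 老王 and 小李 go unrecognized. Adding two patterns is enough to capture them, which reflects the kind of gain the abstract attributes to expanding the pattern set.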

Keywords: Chinese name; incomplete Chinese name; role labeling; name recognition pattern set
Received: 2017-05-09

An Expanded Pattern Set-Based Approach to Chinese Name Recognition
LUAN Wei-feng and ZHANG Huan-huan. An Expanded Pattern Set-Based Approach to Chinese Name Recognition[J]. Researches in Medical Education, 2018, 44(3): 425-430
Authors:LUAN Wei-feng and ZHANG Huan-huan
Affiliation: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China (both authors)
Abstract: Named entity recognition is a foundational task in Chinese information processing. It extracts proper nouns and numeric expressions from text and classifies them into categories such as person, organization, and location. Because Chinese personal names occur frequently in Chinese text, Chinese name recognition is an important basic subtask of named entity recognition, and improving it can substantially raise the quality of Chinese information processing. Chinese names take complex and diverse forms, including abbreviated names, aliases, and other non-standard forms that traditional recognition methods do not yet handle well. We therefore propose a recognition method based on an expanded pattern set, which improves recognition accuracy for incomplete Chinese names by expanding the set of recognition patterns. The method performs Chinese name recognition through role labeling. First, a model trained on a corpus labels the text with roles automatically, producing a role sequence. Each word's role reflects the part it plays in forming a personal name, such as surname, given name, preceding context, or following context. Second, a pattern matching algorithm uses the role sequence and the name recognition pattern set to find strings in the text that match a name pattern defined by the pattern set, and identifies them as names. The method explicitly accounts for incomplete name forms and extends the name recognition pattern set to cover more complex names. Experimental results show that the method is particularly effective at recognizing incomplete Chinese names, making name recognition more complete.
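The first step, automatic role labeling learned from a training corpus, is often implemented as a hidden Markov model decoded with the Viterbi algorithm. The abstract does not name the model, so this sketch assumes that approach. It also assumes one role per character and add-one smoothing, and it uses hypothetical role tags: B surname, C/D given-name characters, E single-character given name, F prefix, K/L preceding/following context, A other. `TOY_CORPUS` is invented data. The output is a role sequence of the kind the pattern matching step consumes.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count start, transition, and emission events from (char, role) sequences."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = None
        for ch, role in sentence:
            emit[role][ch] += 1
            if prev is None:
                start[role] += 1
            else:
                trans[prev][role] += 1
            prev = role
    return start, trans, emit


def smoothed_log(counter, key, n_outcomes):
    """Add-one smoothed log probability of `key` under the counts in `counter`."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(chars, start, trans, emit):
    """Return the most probable role sequence for `chars` under the trained HMM."""
    if not chars:
        return []
    roles = sorted(emit)
    n_roles = len(roles)
    n_chars = len({ch for counts in emit.values() for ch in counts}) + 1  # +1 for unseen chars
    # Each column maps role -> (best log score ending in that role, previous role).
    columns = [{r: (smoothed_log(start, r, n_roles) + smoothed_log(emit[r], chars[0], n_chars), None)
                for r in roles}]
    for ch in chars[1:]:
        prev_col, col = columns[-1], {}
        for r in roles:
            score, best_prev = max(
                (prev_col[p][0] + smoothed_log(trans[p], r, n_roles), p) for p in roles
            )
            col[r] = (score + smoothed_log(emit[r], ch, n_chars), best_prev)
        columns.append(col)
    # Backtrack from the highest-scoring final role.
    role = max(roles, key=lambda r: columns[-1][r][0])
    path = [role]
    for col in reversed(columns[1:]):
        role = col[role][1]
        path.append(role)
    return path[::-1]


TOY_CORPUS = [  # invented, hypothetical training data
    list(zip("记者王建国报道", "AKBCDLA")),
    list(zip("老王来了", "FBLA")),
    list(zip("记者李明说", "AKBEL")),
]

if __name__ == "__main__":
    model = train_hmm(TOY_CORPUS)
    print("".join(viterbi("记者王建国报道", *model)))  # AKBCDLA
```

On this toy corpus, decoding a training sentence recovers its gold role sequence. A practical system would train on a large role-annotated corpus and would need a better treatment of unseen characters than add-one smoothing.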
Keywords: Chinese name; non-complete Chinese name; role labeling; set of recognition patterns