BoB, a best-of-breed automated text de-identification system for VHA clinical documents |
| |
Authors: | Oscar Ferrández, Brett R South, Shuying Shen, F Jeffrey Friedlin, Matthew H Samore, Stéphane M Meystre |
| |
Institution: | 1. Department of Biomedical Informatics, University of Utah, Salt Lake City, Utah, USA; 2. IDEAS Center, SLCVA Healthcare System, Salt Lake City, Utah, USA; 3. Medical Informatics, Regenstrief Institute, Inc., Indianapolis, Indiana, USA |
| |
Abstract: | Objective: De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents. Materials and methods: We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state, preserving as much clinical information as possible. Results: We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance and token levels, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. Discussion: BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives. Conclusions: Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact. |
| |
Keywords: | |
|
|