A framework for employing longitudinally collected multicenter electronic health records to stratify heterogeneous patient populations on disease history


Authors:	Marc P Maurits, Ilya Korsunsky, Soumya Raychaudhuri, Shawn N Murphy, Jordan W Smoller, Scott T Weiss, Lynn M Petukhova, Chunhua Weng, Wei-Qi Wei, Thomas W J Huizinga, Marcel J T Reinders, Elizabeth W Karlson, Erik B van den Akker, Rachel Knevel

Abstract:
Objective: To facilitate patient disease subset and risk factor identification by constructing a pipeline which is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health records (EHRs) batch effects.
Materials and Methods: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features.
Results: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 “other headache” clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2–8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles.
Discussion: Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data.
Conclusion: We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
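
The replication result in the abstract (profiles correlating at ≥0.75 with a number of the 12 cohorts) can be illustrated with a small sketch. The abstract does not say how a phenotypic profile is computed or which correlation coefficient was used, so this example assumes a profile is the per-code prevalence within a cluster and assumes Pearson correlation. The function names `code_profile` and `count_replicating_cohorts` and the toy matrices are hypothetical and are not taken from the paper.

```python
import numpy as np


def code_profile(patients_by_codes: np.ndarray) -> np.ndarray:
    """Fraction of patients (rows) that carry each billing code (columns)."""
    return patients_by_codes.mean(axis=0)


def count_replicating_cohorts(reference, cohort_profiles, threshold=0.75):
    """Count cohorts whose profile has Pearson r >= threshold with the reference.

    Assumption: Pearson correlation; the abstract only reports the 0.75 cutoff.
    """
    count = 0
    for profile in cohort_profiles:
        r = np.corrcoef(reference, profile)[0, 1]
        if r >= threshold:
            count += 1
    return count


# Toy binary patient-by-code matrices (4 patients, 4 codes); illustrative only.
cluster_ref = np.array([[1, 1, 0, 0],
                        [1, 1, 0, 0],
                        [1, 0, 0, 1],
                        [1, 1, 0, 0]])
cohort_a = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [1, 0, 1, 0]])   # similar code pattern
cohort_b = np.array([[0, 0, 1, 1],
                     [0, 0, 1, 1],
                     [0, 1, 1, 0],
                     [0, 0, 1, 1]])   # opposite code pattern

reference = code_profile(cluster_ref)
replicated = count_replicating_cohorts(
    reference, [code_profile(cohort_a), code_profile(cohort_b)]
)
print(f"Profile replicated in {replicated} of 2 toy cohorts")
```

In this toy setup only the first cohort passes the cutoff; a real analysis would compare each discovered cluster against the best-matching cluster in every other cohort.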

Keywords:	electronic health records; clustering; electronic medical records; ICD; PhenoGraph; eMERGE
