An Object‐Oriented Regression for Building Disease Predictive Models with Multiallelic HLA Genes
Authors:Lue Ping Zhao  Hamid Bolouri  Michael Zhao  Daniel E. Geraghty  Åke Lernmark  The Better Diabetes Diagnosis Study Group
Affiliation:1. Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America;2. Department of Biostatistics, University of Washington School of Public Health, Seattle, Washington, United States of America;3. Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America;4. Bellevue High School, Seattle, Washington, United States of America;5. Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America;6. Department of Clinical Sciences, Lund University/CRC, Skåne University Hospital, Malmö, Sweden;7. Members of the Better Diabetes Diagnosis Study Are Listed in the Appendix
Abstract:Recent genome‐wide association studies confirm that human leukocyte antigen (HLA) genes have the strongest associations with several autoimmune diseases, including type 1 diabetes (T1D), providing an impetus to reduce this genetic association to practice through an HLA‐based disease predictive model. However, conventional model‐building methods tend to be suboptimal when predictors are highly polymorphic, with many rare alleles combined with complex patterns of sequence homology within and between genes. To circumvent this challenge, we describe an alternative methodology: treating complex genotypes of HLA genes as “objects” or “exemplars,” one focuses on systemic associations of disease phenotype with “objects” via similarity measurements. Conceptually, this approach assigns disease risks based on complex genotype profiles instead of specific disease‐associated genotypes or alleles. Effectively, it transforms large, discrete, and sparse HLA genotypes into a matrix of similarity‐based covariates. Building on the kernel representer theorem and machine learning techniques, it uses a penalized likelihood method to select disease‐associated exemplars in building predictive models. To illustrate this methodology, we apply it to a T1D study with eight HLA genes (HLA‐DRB1, HLA‐DRB3, HLA‐DRB4, HLA‐DRB5, HLA‐DQA1, HLA‐DQB1, HLA‐DPA1, and HLA‐DPB1) to build a predictive model. The resulting predictive model has an area under the curve of 0.92 in the training set and 0.89 in the validation set, indicating that this methodology is useful for building predictive models with complex HLA genotypes.
Keywords:generalized linear model  kernel machine  multiallelic genotypes  penalized regression  prediction  similarity regression  statistical learning
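
The abstract's central step, turning sparse multiallelic genotypes into similarity-based covariates and then selecting exemplars by penalized likelihood, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the allele-sharing similarity measure, the two-gene subset, the toy genotypes and labels, the L1 penalty weight `lam`, and the proximal-gradient solver. The authors' actual kernel and fitting procedure may differ.

```python
import math

GENES = ["DRB1", "DQB1"]  # hypothetical subset of the eight genes in the study


def allele_sharing(g1, g2):
    """Fraction of alleles shared at one gene by two unordered genotypes."""
    remaining = list(g2)
    shared = 0
    for allele in g1:
        if allele in remaining:
            remaining.remove(allele)  # each allele copy can match only once
            shared += 1
    return shared / 2.0


def similarity(p, q, genes=GENES):
    """Average allele sharing across genes (an assumed similarity measure)."""
    return sum(allele_sharing(p[g], q[g]) for g in genes) / len(genes)


def similarity_matrix(subjects, exemplars):
    """One row per subject, one similarity covariate per exemplar."""
    return [[similarity(s, e) for e in exemplars] for s in subjects]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty; shrinks small values to exactly 0."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def predict(X, b0, beta):
    return [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) for row in X]


def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=2000):
    """L1-penalized logistic regression by proximal gradient descent."""
    n, k = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * k
    for _ in range(iters):
        resid = [p - yi for p, yi in zip(predict(X, b0, beta), y)]
        b0 -= step * sum(resid) / n  # the intercept is not penalized
        for j in range(k):
            grad = sum(r * row[j] for r, row in zip(resid, X)) / n
            beta[j] = soft_threshold(beta[j] - step * grad, step * lam)
    return b0, beta


# Toy genotypes and case/control labels, invented purely for illustration.
subjects = [
    {"DRB1": ("03:01", "04:01"), "DQB1": ("02:01", "03:02")},
    {"DRB1": ("04:01", "04:01"), "DQB1": ("03:02", "03:02")},
    {"DRB1": ("15:01", "07:01"), "DQB1": ("06:02", "02:02")},
    {"DRB1": ("15:01", "15:01"), "DQB1": ("06:02", "06:02")},
]
labels = [1, 1, 0, 0]

X = similarity_matrix(subjects, subjects)  # every subject doubles as an exemplar
b0, beta = fit_l1_logistic(X, labels)
risks = predict(X, b0, beta)
```

Exemplars whose coefficients in `beta` are shrunk to zero drop out of the model, which is one way to read "selecting disease-associated exemplars." In a real analysis the penalty weight would be tuned on held-out data rather than fixed.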