A Comparison Study of Fixed and Mixed Effect Models for Gene Level Association Studies of Complex Traits |
| |
Authors: | Ruzong Fan, Chi‐yang Chiu, Jeesun Jung, Daniel E. Weeks, Alexander F. Wilson, Joan E. Bailey‐Wilson, Christopher I. Amos, Zhen Chen, James L. Mills, Momiao Xiong |
| |
Affiliation: | 1. Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America; 2. Laboratory of Epidemiology and Biometry, National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, Maryland, United States of America; 3. Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America; 4. Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America; 5. Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, United States of America; 6. Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, New Hampshire, United States of America; 7. Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, United States of America; 8. Human Genetics Center, University of Texas—Houston, Houston, Texas, United States of America |
| |
Abstract: | In association studies of complex traits, fixed‐effect regression models are usually used to test for association between traits and major gene loci. In recent years, variance‐component tests based on mixed models have been developed for region‐based genetic variant association tests. In the mixed models, association is tested under a null hypothesis of zero variance via the sequence kernel association test (SKAT), its optimal unified test (SKAT‐O), and a combined sum test of rare and common variant effects (SKAT‐C). Although some comparison studies have evaluated the performance of mixed and fixed models, there has been no systematic analysis to determine when the mixed models perform better and when the fixed models perform better. Here we evaluated, based on extensive simulations, the performance of the fixed and mixed model statistics, using genetic variants located in 3, 6, 9, 12, and 15 kb simulated regions. We compared the performance of three models: (i) mixed models that lead to SKAT, SKAT‐O, and SKAT‐C, (ii) traditional fixed‐effect additive models, and (iii) fixed‐effect functional regression models. To evaluate the type I error rates of the tests of fixed models, we generated genotype data by two methods: (i) using all variants, (ii) using only rare variants. We found that the fixed‐effect tests either control the type I error rates accurately or have low false positive rates. We performed simulation analyses to compare power for two scenarios: (i) all causal variants are rare, (ii) some causal variants are rare and some are common. Either one or both of the fixed‐effect models performed better than or similarly to the mixed models except when (1) the region sizes were 12 and 15 kb and (2) the effect sizes were small. Therefore, the assumption of mixed models could be satisfied and SKAT/SKAT‐O/SKAT‐C could perform better if the number of causal variants is large and each causal variant contributes a small amount to the traits (i.e., polygenes).
In major gene association studies, we argue that the fixed‐effect models perform better than or similarly to mixed models in most cases because some variants are expected to have relatively large effects on the traits. In practice, it makes sense to analyze the data with both the fixed‐ and mixed‐effect models and to compare the results, and this can be readily done using our R code and the SKAT package. |
| |
Keywords: | rare variants; common variants; association mapping; quantitative/dichotomous trait loci; complex traits; functional data analysis; multivariate linear models; logistic regressions |
|
|