GWAS with longitudinal phenotypes: performance of approximate procedures |
| |
Authors: | Karolina Sikorska, Nahid Mostafavi Montazeri, André Uitterlinden, Fernando Rivadeneira, Paul HC Eilers, Emmanuel Lesaffre |
| |
Affiliation: | 1. Department of Biostatistics, Erasmus MC, Rotterdam, The Netherlands; 2. Department of Internal Medicine and Genetic Epidemiology, Erasmus MC, Rotterdam, The Netherlands; 3. Department of Environmental Epidemiology, Institute for Risk Assessment Sciences, University of Utrecht, Utrecht, The Netherlands; 4. Department of Public Health and Primary Care, L-Biostat, KU Leuven, Leuven, Belgium |
| |
Abstract: | Analyzing genome-wide association studies (GWAS) with longitudinal data using standard procedures, such as fitting a linear mixed model (LMM), leads to discouragingly long computation times, so the computations need to be sped up significantly. In our previous work (Sikorska et al: Fast linear mixed model computations for genome-wide association studies with longitudinal data. Stat Med 2012; 32.1: 165–180), we proposed the conditional two-step (CTS) approach as a fast method that approximates the P-value for the longitudinal single-nucleotide polymorphism (SNP) effect. In the first step, a reduced conditional LMM is fitted, omitting all SNP terms. In the second step, the estimated random slopes are regressed on the SNPs. The CTS has been applied to the bone mineral density data from the Rotterdam Study and proved to work very well, even in unbalanced situations. In another article (Sikorska et al: GWAS on your notebook: fast semi-parallel linear and logistic regression for genome-wide association studies. BMC Bioinformatics 2013; 14: 166), we suggested semi-parallel computations, which greatly speed up the fitting of many linear regressions. Combining the CTS with fast linear regression reduces the computation time from several weeks to a few minutes on a single computer. Here, we further explore the properties of the CTS, both analytically and by simulations. We compare the performance of our proposal with a related but different approach, the two-step procedure. We show analytically that in the balanced case, under mild assumptions, the P-value provided by the CTS is the same as that from the LMM. For unbalanced data and in realistic situations, simulations show that the CTS method does not inflate the type I error rate and implies only a minimal loss of power. |
| |
Keywords: | |
|
|
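The two steps described in the abstract can be illustrated with a minimal sketch for balanced, simulated data. This is not the authors' implementation: as a simplifying assumption, per-subject least-squares slopes stand in for the random slopes estimated from the reduced conditional LMM, and the second step regresses those slopes on all SNPs at once with matrix operations, in the spirit of the semi-parallel computations mentioned above. The function names `subject_slopes` and `slope_on_snps`, the simulated effect sizes, and the sample sizes are all hypothetical.

```python
import numpy as np


def subject_slopes(y, t):
    """Step 1 (simplified stand-in): per-subject least-squares slopes.

    y: (n_subjects, n_visits) outcomes; t: (n_visits,) common visit times.
    Assumes balanced data, i.e. every subject is measured at the same times.
    """
    tc = t - t.mean()
    return (y - y.mean(axis=1, keepdims=True)) @ tc / (tc @ tc)


def slope_on_snps(slopes, G):
    """Step 2: simple linear regression of the slopes on each SNP column.

    G: (n_subjects, n_snps) genotype matrix. All SNPs are handled in one
    pass; each regression includes an intercept. Returns (beta, t_stat).
    """
    n = slopes.shape[0]
    Gc = G - G.mean(axis=0)            # center each SNP
    sc = slopes - slopes.mean()        # center the response
    sxx = (Gc ** 2).sum(axis=0)
    beta = Gc.T @ sc / sxx
    rss = sc @ sc - beta ** 2 * sxx    # residual sum of squares per SNP
    se = np.sqrt(rss / (n - 2) / sxx)
    return beta, beta / se


# Simulated example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n, m, p = 500, 5, 20
t = np.arange(m, dtype=float)
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
b0 = rng.normal(0.0, 1.0, n)                            # random intercepts
b1 = 0.5 + 0.3 * G[:, 0] + rng.normal(0.0, 0.2, n)      # SNP 0 affects the slope
y = b0[:, None] + b1[:, None] * t + rng.normal(0.0, 0.5, (n, m))

slopes = subject_slopes(y, t)
beta, t_stat = slope_on_snps(slopes, G)
```

In this sketch, `beta[0]` estimates the simulated SNP-by-time effect, and `t_stat` holds one test statistic per SNP; converting these to P-values would use a t distribution with `n - 2` degrees of freedom.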