Similar Articles
Found 20 similar articles (search time: 15 ms)
1.
In genetic association studies, multiple markers are usually employed to cover a genomic region of interest for localizing a trait locus. In this report, we propose a novel multi-marker family-based association test (T_LC) that linearly combines the single-marker test statistics using data-driven weights. We examine its type-I error rate in a numerical study and compare its power to identify a common trait locus, using tag single nucleotide polymorphisms (SNPs) within the haplotype block in which the trait locus resides, against three competing tests: a global haplotype test (T_H), a multi-marker test similar to the Hotelling T² test for population-based data (T_MM), and a single-marker test with Bonferroni correction for multiple testing (T_B). The type-I error rate of T_LC is well maintained in our numerical study. In all the scenarios we examined, T_LC is the most powerful, followed by T_B; T_MM and T_H are the least powerful. T_H and T_MM have essentially the same power when parents are available, but when both parents are missing, T_MM is substantially more powerful than T_H. We also apply the new test to a data set from a previous association study on nicotine dependence.
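
To make the combination step concrete, here is a minimal Python sketch of a standardized weighted combination of single-marker statistics. The toy statistics, the LD-driven correlation matrix, and the equal weights are illustrative assumptions; the paper's data-driven weighting scheme and its corresponding null calibration are not reproduced.

```python
import numpy as np
from scipy.stats import norm

def linear_combination_test(z, sigma, w):
    """Standardized linear combination of single-marker statistics.

    z     : (m,) single-marker test statistics
    sigma : (m, m) correlation matrix of the statistics (driven by LD)
    w     : (m,) weights; with *fixed* weights the statistic is N(0, 1)
            under the null. T_LC uses data-driven weights, whose null
            distribution requires the adjustment derived in the paper.
    """
    t = w @ z / np.sqrt(w @ sigma @ w)
    return t, 2 * norm.sf(abs(t))  # two-sided normal p-value

# toy example: three tag SNPs in moderate LD, equal weights
sigma = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])
z = np.array([2.1, 1.6, 0.9])
print(linear_combination_test(z, sigma, w=np.ones(3)))
```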

2.
Pooling data from multiple studies improves estimation of exposure-disease associations through increased sample size. However, biomarker exposure measurements can vary substantially across laboratories and often require calibration to a reference assay prior to pooling. We develop two statistical methods for aggregating biomarker data from multiple studies: the full calibration method and the internalized method. The full calibration method calibrates all biomarker measurements regardless of the availability of reference laboratory measurements, while the internalized method calibrates only non-reference laboratory measurements. We compare the performance of these two aggregation methods to two-stage methods. Furthermore, we compare the aggregated and two-stage methods when estimating the calibration curve from controls only or from a random sample of individuals from the study cohort. Our findings include the following: (1) Under random sampling for calibration, exposure effect estimates from the internalized method have a smaller mean squared error than those from the full calibration method. (2) Under the controls-only calibration design, the full calibration method yields effect estimates with the least bias. (3) The two-stage approaches produce average effect estimates similar to the full calibration method under a controls-only calibration design and to the internalized method under a random-sample calibration design. We illustrate the methods in an application evaluating the relationship between circulating vitamin D levels and stroke risk in a pooling project of cohort studies.
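
A minimal sketch of the two aggregation strategies, assuming a simple linear calibration curve fit by ordinary least squares; the function and variable names are invented for illustration and do not come from the paper.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def calibrate(x_local, x_ref, internalized=True):
    """Map local-lab biomarker values onto the reference-lab scale.

    x_local : (n,) measurements from the study's own laboratory
    x_ref   : (n,) reference-lab re-measurements, NaN outside the
              calibration subset
    internalized=True  keeps actual reference values where available;
    internalized=False (full calibration) recalibrates every value.
    """
    has_ref = ~np.isnan(x_ref)
    curve = LinearRegression().fit(x_local[has_ref, None], x_ref[has_ref])
    pred = curve.predict(x_local[:, None])
    return np.where(has_ref, x_ref, pred) if internalized else pred

# toy data: a systematic lab shift, reference values on ~30% of subjects
rng = np.random.default_rng(0)
truth = rng.normal(50, 10, 200)
x_local = 0.9 * truth + 5 + rng.normal(0, 2, 200)
x_ref = np.where(rng.random(200) < 0.3, truth, np.nan)
print(calibrate(x_local, x_ref)[:5])
```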

3.
We propose an innovative and practically relevant clustering method to find common task-related brain regions among different subjects who respond to the same set of stimuli. Using functional magnetic resonance imaging (fMRI) time series data, we first cluster the voxels within each subject on a voxel-by-voxel basis. To extract signals from noisy data, we estimate a new periodogram at each voxel using multi-tapering and low-rank spline smoothing, and then use the periodogram as the main feature for clustering. We apply a divisive hierarchical clustering algorithm to the estimated periodograms within a single subject and identify the task-related region as the cluster of voxels whose periodograms have a peak frequency matching that of the stimulus sequence. Finally, we apply a machine learning technique called clustering ensemble to find common task-related regions across different subjects. The efficacy of the proposed approach is illustrated via a simulation study and a real fMRI data set. Copyright © 2016 John Wiley & Sons, Ltd.
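
A reduced sketch of the periodogram feature, assuming DPSS multi-tapering only (the low-rank spline smoothing and the divisive/ensemble clustering stages are omitted); the sampling rate and stimulus frequency below are made-up values.

```python
import numpy as np
from scipy.signal.windows import dpss
from scipy.fft import rfft, rfftfreq

def multitaper_periodogram(x, nw=4.0, k=7, fs=1.0):
    """Average of k DPSS-tapered periodograms of one voxel series."""
    n = len(x)
    tapers = dpss(n, nw, k)                      # (k, n) orthonormal tapers
    specs = np.abs(rfft(tapers * x, axis=1))**2  # one periodogram per taper
    return rfftfreq(n, 1 / fs), specs.mean(axis=0)

# a voxel is task-related if its spectral peak sits at the stimulus frequency
fs, f_stim, n = 0.5, 0.025, 200                  # e.g. TR = 2 s, 40 s cycle
t = np.arange(n) / fs
voxel = np.sin(2 * np.pi * f_stim * t) + np.random.default_rng(1).normal(0, 1, n)
freqs, spec = multitaper_periodogram(voxel, fs=fs)
print("peak frequency:", freqs[spec.argmax()])
```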

4.
Multivariate meta-analysis, which involves jointly analyzing multiple correlated outcomes from separate studies, has received a great deal of attention. One reason to prefer the multivariate approach is its ability to account for the dependence between multiple estimates from the same study. However, nearly all existing methods for analyzing multivariate meta-analytic data require knowledge of the within-study correlations, which are usually unavailable in practice. We propose a simple non-iterative method for the analysis of multivariate meta-analysis datasets that has no convergence problems and does not require the within-study correlations. Our approach uses standard univariate methods for the marginal effects but also provides valid joint inference for multiple parameters. The proposed method can directly handle missing outcomes under the missing-completely-at-random assumption. Simulation studies show that the proposed method provides unbiased estimates, well-estimated standard errors, and confidence intervals with good coverage probability. Furthermore, it maintains high relative efficiency compared with conventional multivariate meta-analyses in which the within-study correlations are known. We illustrate the proposed method through two real meta-analyses where functions of the estimated effects are of interest. © 2015 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

5.
Genome-wide association studies (GWAS) for complex diseases have focused primarily on single-trait analyses of disease status and disease-related quantitative traits. For example, GWAS on risk factors for coronary artery disease analyze genetic associations of plasma lipids such as total cholesterol, LDL-cholesterol, HDL-cholesterol, and triglycerides (TGs) separately. However, traits are often correlated, and a joint analysis may yield greater statistical power for association than multiple univariate analyses. Recently, several multivariate methods have been proposed that require individual-level data. Here, we develop metaUSAT (where USAT is the unified score-based association test), a novel unified association test of a single genetic variant with multiple traits that uses only summary statistics from existing GWAS. Whereas existing methods either perform well when most correlated traits are affected by the genetic variant in the same direction or are powerful when only a few of the correlated traits are associated, metaUSAT is designed to be robust to the association structure of correlated traits. metaUSAT does not require individual-level data and can test genetic associations of categorical and/or continuous traits. One can also use metaUSAT to analyze a single trait over multiple studies, appropriately accounting for overlapping samples, if any. metaUSAT provides an approximate asymptotic P-value for association and is computationally efficient for implementation at a genome-wide level. Simulation experiments show that metaUSAT maintains proper type-I error at low error levels. It has similar, and sometimes greater, power to detect association across a wide array of scenarios compared with existing methods, which are usually powerful only for specific association scenarios. When applied to plasma lipids summary data from the METSIM and T2D-GENES studies, metaUSAT detected genome-wide significant loci beyond those identified by univariate analyses. Evidence from larger studies suggests that the variants additionally detected by our test are indeed associated with lipid levels in humans. In summary, metaUSAT can provide novel insights into the genetic architecture of a common disease or traits.
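
metaUSAT itself is not reproduced here. The sketch below shows only the classical summary-statistic building block that such methods start from: a joint chi-squared test of one SNP on k traits, using the z-scores and a null correlation matrix (estimable from genome-wide null SNPs, which also absorbs sample overlap). The toy numbers are illustrative.

```python
import numpy as np
from scipy.stats import chi2

def multitrait_wald(z, r):
    """Joint test of one SNP against k traits from summary statistics.

    z : (k,) GWAS z-scores of the SNP on the k traits
    r : (k, k) correlation matrix of the z-scores under the null
        (captures trait correlation and any sample overlap)
    Under the null, q is chi-squared with k degrees of freedom.
    """
    q = z @ np.linalg.solve(r, z)
    return q, chi2.sf(q, df=len(z))

# toy example: four lipid traits, two genuinely associated
r = 0.3 + 0.7 * np.eye(4)
z = np.array([3.4, 2.8, 0.5, -0.2])
print(multitrait_wald(z, r))
```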

6.
To date, despite the widespread availability of time series data on multiple syndromes, multivariate analysis of syndromic data remains under-explored. We present a non-parametric multivariate framework for early detection of temporal anomalies based on principal components analysis of historical data on multiple syndromes. We introduce simulated outbreaks of different shapes and magnitudes into the historical data and compare the detection sensitivity and timeliness of the multi-syndrome detection method with those of uni-syndrome analysis. We find that the multi-syndrome framework provides a powerful tool for identifying such anomalies in the data and significantly improves upon the detection sensitivity and timeliness of uni-syndrome analysis. The proposed multivariate framework requires minimal preprocessing of the data and can be easily adopted in settings where temporal information on multiple syndromes is routinely collected and processed, and thus can be an integral component of real-time surveillance systems.
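
One plausible reading of the detection step, sketched under simplifying assumptions: model the historical multi-syndrome counts with a few principal components and flag days whose squared residual (the Q/SPE statistic) exceeds an empirical control limit. The component count, the limit, and the simulated spike are illustrative choices, not the paper's tuning.

```python
import numpy as np

def spe_detector(history, current, n_pc=2, alpha=0.01):
    """Flag days poorly explained by a PCA model of historical counts.

    history, current : (days, n_syndromes) arrays of daily counts
    """
    mu, sd = history.mean(0), history.std(0)
    _, _, vt = np.linalg.svd((history - mu) / sd, full_matrices=False)
    p = vt[:n_pc].T                               # retained loadings

    def spe(x):                                   # squared prediction error
        z = (x - mu) / sd
        resid = z - (z @ p) @ p.T                 # part outside the PC plane
        return (resid ** 2).sum(axis=-1)

    limit = np.quantile(spe(history), 1 - alpha)  # empirical control limit
    return spe(current) > limit

rng = np.random.default_rng(0)
history = rng.poisson(20, (365, 5))               # one year, five syndromes
outbreak = rng.poisson(20, (7, 5))
outbreak[:, 2] += 25                              # spike one syndrome
print(spe_detector(history, outbreak))
```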

7.
Analysis of clustered data focusing on inference of the marginal distribution may be problematic when the risk of the outcome is related to the cluster size, termed informative cluster size. In the absence of censoring, Hoffman et al. proposed a within-cluster resampling method, which is asymptotically equivalent to a weighted generalized estimating equations score equation. We investigate estimation of the marginal distribution for multivariate survival data with informative cluster size using cluster-weighted Weibull and Cox proportional hazards models. The cluster-weighted Cox model can be implemented using standard software. Simulation results demonstrate that the proposed methods produce unbiased parameter estimates in the presence of informative cluster size. To illustrate the proposed approach, we analyze survival data from a lymphatic filariasis study in Recife, Brazil.
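
A sketch of the cluster-weighted Cox fit using the lifelines package as one plausible "standard software" route (the paper does not prescribe a specific package). The simulated data and the inverse-cluster-size weighting below are illustrative.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "cluster": rng.integers(0, 100, n),          # e.g. household id
    "treatment": rng.integers(0, 2, n),
})
df["time"] = rng.exponential(1 / np.exp(0.5 * df["treatment"]))
df["event"] = 1

# inverse-cluster-size weights: every cluster contributes equally,
# mimicking within-cluster resampling
df["w"] = 1.0 / df.groupby("cluster")["cluster"].transform("size")

cph = CoxPHFitter()
cph.fit(df[["time", "event", "treatment", "w"]],
        duration_col="time", event_col="event",
        weights_col="w", robust=True)            # sandwich SEs for clustering
print(cph.summary[["coef", "se(coef)"]])
```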

8.
Batch bias has been found in many microarray gene expression studies that involve multiple batches of samples. A serious batch effect can alter not only the distribution of individual genes but also the inter-gene relationships. Although some efforts have been made to remove such bias, there has been relatively little development of multivariate approaches, mainly because of the analytical difficulty posed by the high-dimensional nature of gene expression data. We propose a multivariate batch adjustment method that effectively eliminates inter-gene batch effects. The proposed method utilizes high-dimensional sparse covariance estimation based on a factor model and hard thresholding. Another important aspect of the proposed method is that if one of the batches is known to be produced under superior conditions, the other batches can be adjusted so that they resemble the target batch. We study high-dimensional asymptotic properties of the proposed estimator and compare the performance of the proposed method with some popular existing methods on simulated data and gene expression data sets. Copyright © 2014 John Wiley & Sons, Ltd.
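
The factor-model covariance machinery does not fit in a short sketch; the snippet below shows only the marginal half of the target-batch idea, aligning each gene's mean and variance in a batch to a designated target batch. The inter-gene (covariance) adjustment that is the paper's actual contribution is omitted.

```python
import numpy as np

def match_to_target(batch, target):
    """Align gene-wise means and variances of `batch` to `target`.

    batch, target : (n_samples, n_genes) expression matrices.
    This removes only *marginal* batch effects; the paper additionally
    removes inter-gene effects via sparse factor-model covariance
    estimation with hard thresholding (not shown here).
    """
    b_mu, b_sd = batch.mean(0), batch.std(0)
    t_mu, t_sd = target.mean(0), target.std(0)
    return (batch - b_mu) / b_sd * t_sd + t_mu
```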

9.
This paper describes how penalized Cox regression, in combination with cross-validated partial likelihood, can be employed to obtain reliable survival prediction models for high-dimensional microarray data. The suggested procedure is demonstrated on a breast cancer survival data set consisting of 295 tumours collected at the Netherlands Cancer Institute in Amsterdam and previously reported in more general papers. The main aim of this paper is to show how generally accepted biostatistical procedures can be employed to analyse high-dimensional data.
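
A hedged sketch of the procedure using the lifelines package, with simulated data standing in for the expression matrix: a ridge-penalized Cox model whose penalty is chosen by cross-validated partial likelihood. The penalty grid and data dimensions are illustrative, not the paper's settings.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.utils import k_fold_cross_validation

# toy stand-in for a high-dimensional expression matrix (n=60, p=20)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
risk = X[:, 0] - 0.5 * X[:, 1]
df = pd.DataFrame(X, columns=[f"g{j}" for j in range(20)])
df["time"] = rng.exponential(1 / np.exp(risk))
df["event"] = rng.integers(0, 2, 60)

# pick the ridge penalty by cross-validated partial likelihood
best = max(
    (10.0 ** k for k in range(-2, 3)),
    key=lambda pen: np.mean(k_fold_cross_validation(
        CoxPHFitter(penalizer=pen), df,
        duration_col="time", event_col="event",
        k=5, scoring_method="log_likelihood")),
)
print("selected penalizer:", best)
```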

10.
Many critical questions in medicine require the analysis of complex multivariate data, often from large data sets describing numerous variables for numerous subjects. In this paper, we describe CoPlot, a tool for visualizing multivariate data in medicine. CoPlot is an adaptation of multidimensional scaling (MDS) that addresses several key limitations of MDS, namely that MDS maps do not allow for visualization of both observations and variables simultaneously and that the axes on an MDS map have no inherent meaning. By addressing these issues, CoPlot facilitates rich interpretation of multivariate data. We present an example using CoPlot on a recently published data set from a systematic review describing clinical features and disease progression of children with anthrax and provide recommendations for the use of CoPlot for evaluating and interpreting other healthcare data sets.
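
A compact sketch of the CoPlot construction: embed the observations with MDS, then overlay one ray per variable pointing in the direction along which that variable increases, which gives the map axes interpretable meaning. The correlation-based rays and Euclidean distances are assumptions of this sketch, not necessarily the published recipe.

```python
import numpy as np
from sklearn.manifold import MDS

def coplot(X):
    """Return 2-D MDS coordinates plus one direction ray per variable.

    X : (n_obs, n_vars) data matrix. Each ray is the correlation of a
    standardized variable with the two map axes, so observations and
    variables can be drawn on the same plot.
    """
    Z = (X - X.mean(0)) / X.std(0)
    coords = MDS(n_components=2, dissimilarity="euclidean",
                 random_state=0).fit_transform(Z)
    rays = np.array([[np.corrcoef(Z[:, j], coords[:, a])[0, 1]
                      for a in (0, 1)]
                     for j in range(Z.shape[1])])
    return coords, rays

rng = np.random.default_rng(0)
coords, rays = coplot(rng.normal(size=(40, 4)))
print(rays.round(2))
```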

11.
In the past decade, many genome-wide association studies (GWASs) have been conducted to explore the association of single nucleotide polymorphisms (SNPs) with complex diseases using a case-control design. These GWASs collect information not only on the disease status (primary phenotype, D) and the SNPs (genotypes, X), but also extensive data on several risk factors and traits. Recent literature and grant proposals point toward a trend of reusing existing large case-control data to explore genetic associations of additional traits (secondary phenotypes, Y) collected during the study. These secondary phenotypes may be correlated, and a proper analysis warrants a multivariate approach. Commonly used multivariate methods are not equipped to properly account for the non-random sampling scheme. Current ad hoc practices include analyses without any adjustment and analyses with D adjusted as a covariate. Our theoretical and empirical studies suggest that the type I error for testing genetic association of secondary traits can be substantial when X as well as Y is associated with D, even when there is no association between X and Y in the underlying (target) population. Whether using D as a covariate helps maintain the type I error depends heavily on the disease mechanism and the underlying causal structure (which is often unknown). To avoid grossly incorrect inference, we propose the proportional odds model adjusted for propensity score (POM-PS). It uses a proportional odds logistic regression of X on Y and adjusts for the estimated conditional probability of being diseased as a covariate. We demonstrate the validity and advantage of POM-PS and compare it to some existing methods in extensive simulation experiments mimicking plausible scenarios of dependency among Y, X, and D. Finally, we use POM-PS to jointly analyze four adiposity traits using a type 2 diabetes (T2D) case-control sample from the population-based Metabolic Syndrome in Men (METSIM) study. Only the POM-PS analysis of the T2D case-control sample seems to provide valid association signals.

12.
This paper is motivated by combining serial neurocognitive assessments and other clinical variables for monitoring the progression of Alzheimer's disease (AD). We propose a novel framework for using multiple longitudinal neurocognitive markers to predict the progression of AD. The conventional approach of jointly modeling longitudinal and survival data is not applicable when there is a large number of longitudinal outcomes, so we introduce approaches based on functional principal components analysis for dimension reduction and feature extraction from multiple longitudinal outcomes. We use these features to extrapolate the health outcome trajectories and use the scores on these features as predictors in a Cox proportional hazards model to conduct predictions over time. We propose a personalized dynamic prediction framework that can be updated as new observations are collected to reflect the patient's latest prognosis, so that intervention can be initiated in a timely manner. Simulation studies and an application to the Alzheimer's Disease Neuroimaging Initiative dataset demonstrate the robustness of the method for predicting future health outcomes and risks of target events under various scenarios.
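
A skeletal version of the two-stage pipeline under strong simplifying assumptions: the marker is observed on a common dense visit grid, so ordinary PCA stands in for functional PCA, and the dynamic-updating machinery is omitted. All names and data below are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from lifelines import CoxPHFitter

# Y: subjects x grid, longitudinal marker values on a common visit grid
rng = np.random.default_rng(0)
n, g = 150, 8
slopes = rng.normal(0, 1, n)                      # subject-level decline rate
Y = slopes[:, None] * np.linspace(0, 1, g) + rng.normal(0, 0.3, (n, g))

# step 1: feature extraction -- PCA on the gridded curves stands in
# for functional PCA (which would also handle sparse, irregular visits)
scores = PCA(n_components=2).fit_transform(Y)

# step 2: use the component scores as predictors in a Cox model
df = pd.DataFrame(scores, columns=["fpc1", "fpc2"])
df["time"] = rng.exponential(1 / np.exp(0.8 * slopes))
df["event"] = 1
cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "se(coef)"]])
```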

13.
Noncoding DNA contains gene regulatory elements that alter gene expression, and the function of these elements can be modified by genetic variation. Massively parallel reporter assays (MPRA) enable high-throughput identification and characterization of functional genetic variants, but statistical methods to identify allelic effects in MPRA data have not been fully developed. In this study, we demonstrate how baseline allelic imbalance in MPRA libraries can produce biased results, and we propose a novel, nonparametric, adaptive testing method that is robust to this bias. We compare the performance of this method with other commonly used methods and demonstrate that our adaptive method controls the Type I error in a wide range of scenarios while maintaining excellent power. We have implemented these tests, along with routines for simulating MPRA data, in the Analysis Toolset for MPRA (atMPRA), an R package for the design and analysis of MPRA experiments. It is publicly available at http://github.com/redaq/atMPRA.

14.
We propose a propensity score-based multiple imputation (MI) method to handle missing data resulting from drop-outs and/or intermittently skipped visits in longitudinal clinical trials with binary responses. The estimation and inferential properties of the proposed method are contrasted via simulation with those of the commonly used complete-case (CC) and generalized estimating equations (GEE) methods. Three key results are noted. First, if data are missing completely at random, MI can be notably more efficient than the CC and GEE methods. Second, with small samples, GEE often fails due to 'convergence problems', whereas MI is free of that problem. Finally, if the data are missing at random, the CC and GEE methods yield results with moderate to large bias, while MI generally yields results with negligible bias. A numerical example with real data is provided for illustration.

15.
A surrogate endpoint in a randomized clinical trial is an endpoint that occurs after randomization and before the true, clinically meaningful, endpoint, and that yields conclusions about the effect of treatment on the true endpoint. A surrogate endpoint can accelerate the evaluation of new treatments, but at the risk of misleading conclusions. Therefore, criteria are needed for deciding whether to use a surrogate endpoint in a new trial. For the meta-analytic setting of multiple previous trials, each with the same pair of surrogate and true endpoints, this article formulates 5 criteria for using a surrogate endpoint in a new trial to predict the effect of treatment on the true endpoint in that trial. The first 2 criteria, which are easily computed from a zero-intercept linear random effects model, involve statistical considerations: an acceptable sample size multiplier and an acceptable prediction separation score. The remaining 3 criteria involve clinical and biological considerations: similarity of the biological mechanisms of treatments between the new trial and previous trials, similarity of the secondary treatments following the surrogate endpoint between the new trial and previous trials, and a negligible risk of harmful side effects arising after observation of the surrogate endpoint in the new trial. These 5 criteria constitute an appropriately high bar for using a surrogate endpoint to make a definitive treatment recommendation.

16.
Multi-state models of chronic disease are becoming increasingly important in medical research for describing the progression of complicated diseases. However, studies seldom observe health outcomes over long time periods, so current clinical research focuses on secondary analysis of the published literature to estimate single transition probabilities within the entire model. Unfortunately, using secondary data presents many difficulties, especially since the states and transitions of published studies may not be consistent with the proposed multi-state model. Early approaches to reconciling published studies with the theoretical framework of a multi-state model have been limited to data available as cumulative counts of progression. This paper presents an approach that allows the use of published regression data in a multi-state model when the published study may have ignored intermediary states in the multi-state model. Colloquially, we call this the Lemonade Method: when study data give you lemons, make lemonade. The approach uses maximum likelihood estimation. An example is provided for the progression of heart disease in people with diabetes. Copyright © 2009 John Wiley & Sons, Ltd.

17.
Data augmentation has commonly been utilized to analyze correlated binary data with multivariate probit models in Bayesian analysis. However, the identification issue in multivariate probit models necessitates a rigorous Metropolis-Hastings algorithm for sampling a correlation matrix, which may cause slow convergence and inefficiency of the Markov chains. It is well known that parameter-expanded data augmentation, by introducing a working/artificial parameter or parameter vector, makes an identifiable model non-identifiable and improves the mixing and convergence of the data augmentation components. We are therefore motivated to develop efficient parameter-expanded data augmentations for analyzing correlated binary data with multivariate probit models. We investigate both identifiable and non-identifiable multivariate probit models and develop the corresponding parameter-expanded data augmentation algorithms. We point out that the approaches based on a non-identifiable model circumvent a Metropolis-Hastings algorithm for sampling a correlation matrix and improve the convergence and mixing of the correlation parameters, while the identifiable model may produce estimated regression parameters with smaller standard errors than the non-identifiable model does. We illustrate our proposed approaches using simulation studies and through an application to a longitudinal dataset from the Six Cities study.

18.
Allelic expression (AE) imbalance between the two alleles of a gene can be used to detect cis-acting regulatory SNPs (rSNPs) in individuals heterozygous for a transcribed SNP (tSNP). In this paper, we propose three tests for AE analysis focusing on phase-unknown data and any degree of linkage disequilibrium (LD) between the rSNP and tSNP: a test based on the minimum P-value of a one-sided F test and a two-sided t test (proposed previously for phase-unknown data), a test that combines the F and t tests, and a mixture-model-based test. We compare these three tests to the F and t tests and to an existing regression-based test for phase-known data. We show that the ranking of the tests by power depends most strongly on the magnitude of the LD between the rSNP and tSNP. For phase-unknown data, we find that under a range of scenarios our proposed tests have higher power than the F and t tests when LD between the rSNP and tSNP is moderate (roughly between 0.2 and 0.8). We further demonstrate that the presence of a second, ungenotyped rSNP almost never invalidates the proposed tests nor substantially changes their power rankings. For detection of cis-acting regulatory SNPs using phase-unknown AE data, we recommend the F test when the rSNP and tSNP are in or near linkage equilibrium (LD below 0.2); the t test when the two SNPs are in strong LD (above 0.7); and the mixture-model-based test for intermediate LD levels (between 0.2 and 0.7). Genet. Epidemiol. 35:515-525, 2011. © 2011 Wiley-Liss, Inc.
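
To fix ideas, here is one plausible rendering of the two ingredient tests on log allelic ratios grouped by rSNP genotype; the grouping, the use of log ratios, and the uncalibrated min-P combination are assumptions of this sketch (the paper derives a proper reference distribution for the minimum).

```python
import numpy as np
from scipy import stats

def ae_min_p(logratio_rsnp_het, logratio_rsnp_hom):
    """Ingredients of a minimum-P test on log allelic ratios measured
    in tSNP heterozygotes, split by rSNP genotype.

    Low LD  -> imbalance flips sign across rSNP heterozygotes,
               inflating their variance (one-sided F test).
    High LD -> imbalance is directionally consistent, shifting
               their mean (two-sided Welch t test).
    min(p_f, p_t) would still need calibration, e.g. by permutation.
    """
    het = np.asarray(logratio_rsnp_het)
    hom = np.asarray(logratio_rsnp_hom)
    f = het.var(ddof=1) / hom.var(ddof=1)
    p_f = stats.f.sf(f, len(het) - 1, len(hom) - 1)
    p_t = stats.ttest_ind(het, hom, equal_var=False).pvalue
    return min(p_f, p_t)
```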

19.
When several treatments are available for evaluation in a clinical trial, different design options exist. We compare multi-arm multi-stage designs with factorial designs; in particular, we consider a 2 × 2 factorial design, in which groups of patients take treatment A, treatment B, both, or neither. We investigate the performance and characteristics of both types of design under different scenarios and compare them using both theory and simulation. For the factorial designs, we construct appropriate test statistics for testing the hypothesis of no treatment effect against the control group with overall control of the type I error. We study the effect of the choice of allocation ratios on the critical value and on the sample size required for a target power. We also study how a possible interaction between the two treatments A and B affects type I and type II errors when testing for significance of each treatment effect. We present both simulation results and a case study of an osteoarthritis clinical trial. We find that in a factorial design that is optimal in the sense of minimising the associated critical value, the allocation ratios differ substantially from those of a balanced design. We also find evidence of potentially large losses in power in factorial designs for moderate deviations from the study design assumptions, and little gain compared with multi-arm multi-stage designs when the assumptions hold. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

20.
Researchers collected multiple measurements on patients with schizophrenia and their relatives, as well as on control subjects and their relatives, to study vulnerability factors for schizophrenics and their near relatives. Observations on individuals from the same family are correlated, as are the multiple outcome measures on the same individual. Traditional data analyses model outcomes separately and thus provide no information about the interrelationships among outcomes. We propose a novel Bayesian family factor model (BFFM), which extends the classical confirmatory factor analysis model to explain the correlations among observed variables using a combination of family-member and outcome factors. Traditional methods for fitting confirmatory factor analysis models, such as full-information maximum likelihood (FIML) estimation using quasi-Newton optimization (QNO), can suffer convergence problems and Heywood cases (improper solutions) caused by empirical under-identification. In contrast, modern Bayesian Markov chain Monte Carlo (MCMC) handles these inference problems easily. Simulations compare the BFFM to FIML-QNO in settings where the true covariance matrix is identified, close to not identified, and not identified. In these settings, FIML-QNO fails to fit the data in 13%, 57%, and 85% of cases, respectively, while MCMC provides stable estimates. When both methods successfully fit the data, estimates from the BFFM have smaller variances and comparable mean-squared errors. We illustrate the BFFM by analyzing data from schizophrenics and their family members.
