Similar articles
 20 similar articles found
1.
When genome‐wide association studies (GWAS) or sequencing studies are performed on family‐based datasets, the genotype data can be used to check the structure of putative pedigrees. Even in datasets of putatively unrelated people, close relationships can often be detected using dense single‐nucleotide polymorphism/variant (SNP/SNV) data. A number of methods for finding relationships using dense genetic data exist, but they all have certain limitations, including that they typically use average genetic sharing, which is only a subset of the available information. Here, we present a set of approaches for classifying relationships in GWAS datasets or large‐scale sequencing datasets. We first propose an empirical method for detecting identity by descent segments in close relative pairs using un‐phased dense SNP data and demonstrate how that information can assist in building a relationship classifier. We then develop a strategy to take advantage of putative pedigree information to enhance classification accuracy. Our methods are tested and illustrated with two datasets from two distinct populations. Finally, we propose classification pipelines for checking and identifying relationships in datasets containing a large number of small pedigrees.

2.
Ordinal responses are very common in longitudinal data collected from substance abuse research or other behavioral research. This study develops a new statistical model with free SAS macros that can be applied to characterize time‐varying effects on ordinal responses. Our simulation study shows that the ordinal‐scale time‐varying effects model has very low estimation bias and sometimes offers considerably better performance when fitting data with ordinal responses than a model that treats the response as continuous. Contrary to a common assumption that an ordinal scale with several levels can be treated as continuous, our results indicate that it is not so much the number of levels on the ordinal scale but rather the skewness of the distribution that makes a difference to the relative performance of linear versus ordinal models. We use longitudinal data from a well‐known study on youth at high risk for substance abuse as a motivating example to demonstrate that the proposed model can characterize the time‐varying effect of negative peer influences on alcohol use in a way that is more consistent with the developmental theory and existing literature, in comparison with the linear time‐varying effect model. Copyright © 2014 John Wiley & Sons, Ltd.

3.
The goal of many gene‐expression microarray profiling clinical studies is to develop a multivariate classifier to predict patient disease outcome from a gene‐expression profile measured on some biological specimen from the patient. Often some preliminary validation of the predictive power of a profile‐based classifier is carried out using the same data set that was used to derive the classifier. Techniques such as cross‐validation or bootstrapping can be used in this setting to assess predictive power, and if applied correctly, can result in a less biased estimate of predictive accuracy of a classifier. However, some investigators have attempted to apply standard statistical inference procedures to assess the statistical significance of associations between true and cross‐validated predicted outcomes. We demonstrate in this paper that naïve application of standard statistical inference procedures to these measures of association under null situations can result in greatly inflated testing type I error rates. Under alternatives of small to moderate associations, confidence interval coverage probabilities may be too low, although for very large associations coverage probabilities approach their intended values. Our results suggest that caution should be exercised in interpreting some of the claims of exceptional prognostic classifier performance that have been reported in prominent biomedical journals in the past few years. Copyright © 2006 John Wiley & Sons, Ltd.

4.
There is an increasing interest in using data derived from ordinal methods, particularly data derived from discrete choice experiments (DCEs), to estimate the cardinal values for health states to calculate quality adjusted life years (QALYs). Ordinal measurement strategies such as DCE may have considerable practical advantages over more conventional cardinal measurement techniques, e.g. time trade‐off (TTO), because they may not require such a high degree of abstract reasoning. However, there are a number of challenges to deriving the cardinal values for health states using ordinal data, including anchoring the values on the full health–dead scale used to calculate QALYs. This paper reports on a study that deals with these problems in the context of using two ordinal techniques, DCE and ranking, to derive the cardinal values for health states derived from a condition‐specific sexual health measure. The results were compared with values generated using a commonly used cardinal valuation technique, the TTO. This study raises some important issues about the use of ordinal data to produce cardinal health state valuations. Copyright © 2009 John Wiley & Sons, Ltd.

5.
Many complex human diseases such as alcoholism and cancer are rated on ordinal scales. Well‐developed statistical methods for the genetic mapping of quantitative traits may not be appropriate for ordinal traits. We propose a class of variance‐component models for the joint linkage and association analysis of ordinal traits. The proposed models accommodate arbitrary pedigrees and allow covariates and gene‐environment interactions. We develop efficient likelihood‐based inference procedures under the proposed models. The maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. Extensive simulation studies demonstrate that the proposed methods perform well in practical situations. An application to data from the Collaborative Study on the Genetics of Alcoholism is provided. Genet. Epidemiol. 34: 232–237, 2010. © 2009 Wiley‐Liss, Inc.

6.
In many applications, especially in cancer treatment and diagnosis, investigators are interested in classifying patients into various diagnosis groups on the basis of molecular data such as gene expression or proteomic data. Often, some of the diagnosis groups are known to be related to higher or lower values of some of the predictors. The standard methods of classifying patients into various groups do not take into account the underlying order. This could potentially result in high misclassification rates, especially when the number of groups is larger than two. In this article, we develop classification procedures that exploit the underlying order among the mean values of the predictor variables and the diagnostic groups by using ideas from order‐restricted inference. We generalize the existing methodology on discrimination under restrictions and provide empirical evidence to demonstrate that the proposed methodology improves over the existing unrestricted methodology. The proposed methodology is applied to a bladder cancer data set where the researchers are interested in classifying patients into various groups. Copyright © 2012 John Wiley & Sons, Ltd.

7.
This paper presents a new goodness‐of‐fit test for an ordered stereotype model used for an ordinal response variable. The proposed test is based on the well‐known Hosmer–Lemeshow test and its version for the proportional odds regression model. The latter test statistic is calculated from a grouping scheme assuming that the levels of the ordinal response are equally spaced, which might not be true. One of the main advantages of the ordered stereotype model is that it allows us to determine a new uneven spacing of the ordinal response categories, dictated by the data. The proposed test makes use of this new adjusted spacing to partition data. A simulation study shows good performance of the proposed test under a variety of scenarios. Finally, the results of the application in two examples are presented. Copyright © 2016 John Wiley & Sons, Ltd.

8.
Generalized partial ordinal models occur frequently in biomedical investigations where, along with ordinal longitudinal outcomes, there are time‐dependent covariates that act nonparametrically. In these studies, an association between such outcomes and time to an event is of considerable interest to medical practitioners. The primary objective in the present article is to study the robustness of estimators of the parameters of interest in a joint model combining a generalized partial ordinal model and a time‐to‐event model, because in many situations, the estimators in such joint models are sensitive to outliers. A Monte Carlo Metropolis–Hastings Newton–Raphson algorithm is proposed for robust estimation. A detailed simulation study was performed to justify the behavior of the proposed estimators. By way of motivation, we consider a data set concerning longitudinal outcomes of children involved in a study on muscular dystrophy. Our analysis revealed some interesting findings that may be useful to medical practitioners. Copyright © 2012 John Wiley & Sons, Ltd.

9.
Effectively combining many classification instruments or diagnostic measurements together to improve the classification accuracy of individuals is a common idea in disease diagnosis or classification. These ensemble‐type diagnostic methods can be constructed with respect to different kinds of performance criteria. Among them, the receiver operating characteristic (ROC) curve is the most popular criterion, which, together with some indexes derived from it, is commonly used to evaluate and summarize the performance of a classification instrument, such as a biomarker or a classifier. However, the usefulness of the ROC curve and its related indexes relies on the existence of a binary label for each individual subject. In many disease diagnosis situations, such a binary variable may not exist, and only a continuous measurement of the true disease status is available. This true disease status is often referred to as the ‘gold standard’. The modified area under ROC curve (AUC)‐type measure defined by Obuchowski is a method proposed to accommodate such a situation. However, there is still no method for finding the optimal combination of diagnostic measurements, with respect to such an index, to have better diagnostic power than that of each individual measurement. In this paper, we propose an algorithm for finding the optimal combination with respect to such an extended AUC‐type measure such that the combined measurement can have more diagnostic power. We illustrate the performance of our algorithm by using some synthesized data and a diabetes data set. Copyright © 2012 John Wiley & Sons, Ltd.

10.
Health-related quality of life (HRQoL) measures are increasingly used in trials as primary outcome measures. Investigators are now asking statisticians for advice on how to plan and analyse studies using such outcomes. HRQoL outcomes, like the SF-36, are usually measured on an ordinal scale, although most investigators assume that there exists an underlying continuous latent variable and that the actual measured outcomes (the ordered categories) reflect contiguous intervals along this continuum. The ordinal scaling of HRQoL measures means they tend to generate data that have discrete, bounded and skewed distributions. Thus, standard methods of analysis that assume Normality and constant variance may not be appropriate. For this reason, conventional statistical advice would suggest non-parametric methods be used to analyse HRQoL data. The bootstrap is one such computer intensive non-parametric method for estimating sample sizes and analysing data. We describe three methods of estimating sample sizes for two-group cross-sectional comparisons of HRQoL outcomes. We then compared the power of the three methods for a two-group cross-sectional study design using bootstrap simulation. The results showed that under the location shift alternative hypothesis, conventional methods of sample size estimation performed well, particularly Whitehead's method. Whitehead's method is recommended if the HRQoL outcome has a limited number of discrete values (<7) and/or the expected proportion of cases at either of the bounds is high. If a pilot data set is readily available then bootstrap simulation will provide a more accurate and reliable estimate than conventional methods. Finally, we used the bootstrap for hypothesis testing and the estimation of standard errors and confidence intervals for parameters, in an example data set. We then compared and contrasted the bootstrap with standard methods of analysing HRQoL outcomes.
In the data set studied, with the SF-36 outcome, the use of the bootstrap for estimating sample sizes and analysing HRQoL data produces results similar to conventional statistical methods. These results suggest that bootstrap methods are not more appropriate for analysing HRQoL outcome data than standard methods.
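As an illustration of the bootstrap confidence intervals mentioned in the abstract above, the following is a minimal sketch of a percentile bootstrap interval for a difference in group means. It is not the authors' procedure: the function name `bootstrap_mean_difference_ci`, the number of resamples, the seed, and the two score lists are all assumptions made up for this example.

```python
import random
from statistics import mean


def bootstrap_mean_difference_ci(group_a, group_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    differences = []
    for _ in range(n_resamples):
        # Resample each group independently, with replacement, at its own size.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        differences.append(mean(resample_a) - mean(resample_b))
    differences.sort()
    # Read the interval limits off the empirical quantiles of the resampled differences.
    lower = differences[int((alpha / 2) * n_resamples)]
    upper = differences[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical bounded, skewed scores on a 0-100 scale (not data from the study).
treatment = [100, 95, 100, 85, 90, 100, 70, 100, 80, 95]
control = [60, 75, 100, 55, 80, 65, 90, 50, 70, 85]
lower, upper = bootstrap_mean_difference_ci(treatment, control)
print(f"95% percentile bootstrap interval: ({lower:.1f}, {upper:.1f})")
```

The percentile interval is only one of several bootstrap interval constructions; it is used here because it needs no distributional assumption about the bounded scores.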

11.
Receiver operating characteristic (ROC) curve and its summary statistics (e.g., the area under curve (AUC)) are commonly used to evaluate the diagnostic accuracy for disease processes with binary classification. The ROC curve has been extended to ROC surface for scenarios with three ordinal classes or to hyper‐surface for scenarios with more than three classes. For classifiers under tree or umbrella ordering, in which the marker measurement for one class is lower or higher than those for the other classes, the commonly adopted diagnostic measures are the naive AUC (NAUC) based on a pooled class of all the unordered classes and the umbrella volume (UV) based on the concept of volume under surface. However, both NAUC and UV have some limitations. For example, NAUC depends on the sampling weights for all the classes in the population, and UV has only been introduced for three‐class settings. In this article, we initiate the idea of a new ROC framework for tree or umbrella ordering (denoted as TROC) and propose the area under TROC curve (denoted as TAUC) as an appropriate diagnostic measure. The proposed TROC and TAUC share many nice features with the traditional ROC and AUC. Both parametric and nonparametric approaches are explored to construct the confidence interval estimation of TAUC. The performances of these methods are compared in simulation studies under a variety of settings. Finally, the proposed methods are applied to a published microarray data set. Copyright © 2015 John Wiley & Sons, Ltd.

12.
Rating scales are common for self‐assessments of qualitative variables and also for expert‐rating of the severity of disability, outcomes, etc. Scale assessments and other ordered classifications generate ordinal data having rank‐invariant properties only. Hence, statistical methods are often based on ranks. The aim is to focus on the differences in ranking approaches between measures of association and of disagreement in paired ordinal data. The Spearman correlation coefficient is a measure of association between two variables, when each data set is transformed to ranks. The augmented ranking approach to evaluate disagreement takes account of the information given by the pairs of data, and provides identification and measures of systematic disagreement, when present, separately from measures of additional individual variability in assessments. The two approaches were applied to empirical data regarding relationship between perceived pain and physical health and reliability in pain assessments made by patients. The nature of disagreement between the patients' perceived levels of outcome after treatment and the doctor's criterion‐based scoring was also evaluated. The comprehensive evaluation of observed disagreement in terms of systematic and individual disagreement provides valuable interpretable information about its sources. The presence of systematic disagreement can be adjusted for and/or understood. Large individual variability could be a sign of poor quality of a scale or heterogeneity among raters. It was also demonstrated that a measure of association must not be used as a measure of agreement, even though such misuse of correlation coefficients is common. Copyright © 2012 John Wiley & Sons, Ltd.
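The closing point of the abstract above, that association is not agreement, can be seen in a small sketch. It computes the Spearman coefficient as the Pearson correlation of midranks, next to the plain proportion of identical ratings. The helper names and the two rating lists are hypothetical; the augmented ranking approach itself is not implemented here.

```python
from statistics import mean


def midranks(values):
    """Assign 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the midranks of x and y."""
    rx, ry = midranks(x), midranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def percent_agreement(x, y):
    """Proportion of pairs in which both ratings are identical."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


# Hypothetical paired ratings: the doctor always scores one category higher.
patient = [1, 1, 2, 2, 3, 3, 4, 4]
doctor = [score + 1 for score in patient]

print(spearman(patient, doctor))           # perfect association
print(percent_agreement(patient, doctor))  # yet no pair agrees
```

A constant shift between raters is exactly the kind of systematic disagreement that a rank correlation cannot detect, because the ordering of the cases is unchanged.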

13.
With the advance of high‐throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high‐dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high‐dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity‐based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small‐scale sequencing data without need for small‐sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.

14.
The power of a chi‐square test, and thus the required sample size, are functions of the noncentrality parameter that can be obtained as the limiting expectation of the test statistic under an alternative hypothesis specification. Herein, we apply this principle to derive simple expressions for two tests that are commonly applied to discrete ordinal data. The Wilcoxon rank sum test for the equality of distributions in two groups is algebraically equivalent to the Mann–Whitney test. The Kruskal–Wallis test applies to multiple groups. These tests are equivalent to a Cochran–Mantel–Haenszel mean score test using rank scores for a set of C‐discrete categories. Although various authors have assessed the power function of the Wilcoxon and Mann–Whitney tests, herein it is shown that the power of these tests with discrete observations, that is, with tied ranks, is readily provided by the power function of the corresponding Cochran–Mantel–Haenszel mean scores test for two and R > 2 groups. These expressions yield results virtually identical to those derived previously for rank scores and also apply to other score functions. The Cochran–Armitage test for trend assesses whether there is a monotonically increasing or decreasing trend in the proportions with a positive outcome or response over the C‐ordered categories of an ordinal independent variable, for example, dose. Herein, it is shown that the power of the test is a function of the slope of the response probabilities over the ordinal scores assigned to the groups that yields simple expressions for the power of the test. Copyright © 2011 John Wiley & Sons, Ltd.
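The link between noncentrality and power described in the abstract above can be sketched for the one‐degree‐of‐freedom case. A noncentral chi‐square variable with 1 df and noncentrality λ is distributed as (Z + √λ)² for standard normal Z, so the power follows from the normal distribution alone. The function name is hypothetical, and the noncentrality value is taken as given rather than derived from the paper's expressions; tests with more than one degree of freedom (for example, Kruskal–Wallis with R > 2 groups) are not covered by this sketch.

```python
from statistics import NormalDist


def chi_square_power_1df(noncentrality, alpha=0.05):
    """Power of a 1-df chi-square test for a given noncentrality parameter.

    Uses the fact that a noncentral chi-square with 1 df is (Z + sqrt(lambda))**2,
    so the rejection region {X > z_crit**2} is {|Z + sqrt(lambda)| > z_crit}.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # square root of the chi-square critical value
    shift = noncentrality ** 0.5
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# With no effect (lambda = 0) the power equals the significance level.
print(round(chi_square_power_1df(0.0), 4))
# A noncentrality near 7.85 gives roughly 80% power at alpha = 0.05.
print(round(chi_square_power_1df(7.849), 3))
```

Since the noncentrality parameter typically grows in proportion to the sample size, solving for the sample size that reaches a target power amounts to scaling λ until this function returns the desired value.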

15.
In many medical studies, researchers widely use composite or long ordinal scores, that is, scores that have a large number of categories and a natural ordering often resulting from the sum of a number of short ordinal scores, to assess function or quality of life. Typically, we analyse these using unjustified assumptions of normality for the outcome measure, which are unlikely to be even approximately true. Scores of this type are better analysed using methods reserved for more conventional (short) ordinal scores, such as the proportional‐odds model. We can avoid the need for a large number of cut‐point parameters that define the divisions between the score categories for long ordinal scores in the proportional‐odds model by the inclusion of orthogonal polynomial contrasts. We introduce the repeated measures proportional‐odds logistic regression model and describe for long ordinal outcomes modifications to the generalized estimating equation methodology used for parameter estimation. We introduce data from a trial assessing two surgical interventions, briefly describe and re‐analyse these using the new model and compare inferences from the new analysis with previously published results for the primary outcome measure (hip function at 12 months postoperatively). We use a simulation study to illustrate how this model also has more general application for conventional short ordinal scores, to select amongst competing models of varying complexity for the cut‐point parameters. Copyright © 2013 John Wiley & Sons, Ltd.

16.
As a geographical cluster detection analysis tool, the spatial scan statistic has been developed for different types of data such as Bernoulli, Poisson, ordinal, exponential and normal. Another interesting data type is multinomial. For example, one may want to find clusters where the disease‐type distribution is statistically significantly different from the rest of the study region when there are different types of disease. In this paper, we propose a spatial scan statistic for such data, which is useful for geographical cluster detection analysis for categorical data without any intrinsic order information. The proposed method is applied to meningitis data consisting of five different disease categories to identify areas with distinct disease‐type patterns in two counties in the U.K. The performance of the method is evaluated through a simulation study. Copyright © 2010 John Wiley & Sons, Ltd.

17.
Health status and outcomes are frequently measured on an ordinal scale. For high-throughput genomic datasets, the common approach to analyzing ordinal response data has been to break the problem into one or more dichotomous response analyses. This dichotomous response approach does not make use of all available data and therefore leads to loss of power and increases the number of type I errors. Herein we describe an innovative frequentist approach that combines two statistical techniques, L(1) penalization and continuation ratio models, for modeling an ordinal response using gene expression microarray data. We conducted a simulation study to assess the performance of two computational approaches and two model selection criteria for fitting frequentist L(1) penalized continuation ratio models. Moreover, we empirically compared the approaches using three application datasets, each of which seeks to classify an ordinal class using microarray gene expression data as the predictor variables. We conclude that the L(1) penalized constrained continuation ratio model is a useful approach for modeling an ordinal response for datasets where the number of covariates (p) exceeds the sample size (n) and the decision of whether to use Akaike Information Criterion (AIC) or Bayesian Information Criterion (BIC) for selecting the final model should depend upon the similarities between the pathologies underlying the disease states to be classified.

18.
A non-parametric multi-dimensional isotonic regression estimator is developed for use in estimating a set of target quantiles from an ordinal toxicity scale. We compare this estimator to the standard parametric maximum likelihood estimator from a proportional odds model for extremely small data sets. A motivating example is from phase I oncology clinical trials, where various non-parametric designs have been proposed that lead to very small data sets, often with ordinal toxicity response data. Our comparison of estimators is performed in conjunction with three of these non-parametric sequential designs for ordinal response data, two from the literature and a new design based on a random walk rule. We also compare with a non-parametric design for binary response trials, by keeping track of ordinal data for estimation purposes, but dichotomizing the data in the design phase. We find that a multidimensional isotonic regression-based estimator far exceeds the others in terms of accuracy and efficiency. A rule by Simon et al. (J. Natl. Cancer Inst. 1997; 89:1138-1147) yields particularly efficient estimators, more so than the random walk rule, but has higher numbers of dose-limiting toxicity. A small data set from a leukemia clinical trial is analysed using our multidimensional isotonic regression-based estimator.

19.
Adolescent alcohol use is a serious public health concern. Despite advances in the theoretical conceptualization of pathways to alcohol use, researchers are limited by the statistical techniques currently available. Researchers often fit linear models and restrictive categorical models (e.g., proportional odds models) to ordinal data with many response categories defined by collapsed count data (0 drinking days, 1–2 days, 3–6 days, etc.). Consequently, existing models ignore the underlying count process, resulting in a disconnect between the construct of interest and the models being fitted. Our proposed ordinal modeling approach overcomes this limitation by explicitly linking ordinal responses to a suitable underlying count distribution. In doing so, researchers can use maximum likelihood estimation to fit count models to ordinal data as if they had directly observed the underlying discrete counts. The usefulness of our ordinal negative binomial and ordinal zero‐inflated negative binomial models is verified by simulation studies. We also demonstrate our approach using real empirical data from the 2010 National Survey of Drug Use and Health. Results show the benefit of the proposed ordinal modeling framework compared with existing methods. Copyright © 2015 John Wiley & Sons, Ltd.
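The idea of linking ordinal categories to an underlying count distribution, as described in the abstract above, can be sketched as follows: each category's probability is the count distribution's mass summed over the counts collapsed into that category, and the log‐likelihood is built from those category probabilities. This is a minimal illustration under stated assumptions, not the authors' model: the mean/dispersion parameterization, the function names, the open‐ended top bin of 7 or more days, and the parameter values are all hypothetical, and no covariates or zero inflation are included.

```python
from math import exp, lgamma, log


def neg_binomial_pmf(count, mean, dispersion):
    """Negative binomial pmf with variance mean + mean**2 / dispersion (mean > 0)."""
    p = dispersion / (dispersion + mean)
    log_pmf = (lgamma(count + dispersion) - lgamma(dispersion) - lgamma(count + 1)
               + dispersion * log(p) + count * log(1 - p))
    return exp(log_pmf)


def category_probabilities(mean, dispersion, bins):
    """Probability of each ordinal category; the last bin is treated as open-ended."""
    probs = []
    for low, high in bins[:-1]:
        probs.append(sum(neg_binomial_pmf(k, mean, dispersion) for k in range(low, high + 1)))
    probs.append(1.0 - sum(probs))  # remaining mass belongs to the open-ended top bin
    return probs


def log_likelihood(categories, mean, dispersion, bins):
    """Log-likelihood of observed category indices under the collapsed count model."""
    probs = category_probabilities(mean, dispersion, bins)
    return sum(log(probs[c]) for c in categories)


# Bins follow the abstract's example (0, 1-2, 3-6 days); the 7+ bin is an assumption.
BINS = [(0, 0), (1, 2), (3, 6), (7, None)]
observed = [0, 1, 1, 3]  # hypothetical category indices for four respondents
print(category_probabilities(2.0, 1.0, BINS))
print(log_likelihood(observed, 2.0, 1.0, BINS))
```

Maximizing `log_likelihood` over `mean` and `dispersion` with any numerical optimizer would then give count‐model estimates from the ordinal responses alone.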

20.
We present a model‐based approach to the analysis of agreement between different raters in a situation where all raters have supplied ordinal ratings of the same cases in a sample. It is assumed that no “gold standard” is available. The model is an ordinal regression model with random effects, a so‐called rating scale model. The model includes case‐specific parameters that allow each case his or her own level (disease severity). It also allows raters to have different propensities to score a given set of individuals more or less positively (the rater level). Based on the model, we suggest quantifying the rater variation using the median odds ratio. This allows expressing the variation on the same scale as the observed ordinal data. An important example that will serve to motivate and illustrate the proposed model is the study of breast cancer diagnosis based on screening mammograms. The purpose of the assessment is to detect early breast cancer in order to obtain improved cancer survival. In the study, mammograms from 148 women were evaluated by 110 expert radiologists. The experts were asked to rate each mammogram on a 5‐point scale ranging from “normal” to “probably malignant.”
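The median odds ratio mentioned in the abstract above has a closed form when the rater effects are normally distributed on the logit scale: for a random‐effect variance σ², MOR = exp(√(2σ²) · Φ⁻¹(0.75)). The sketch below assumes that normality and takes the variance as already estimated; the function name and the example variances are hypothetical and are not values from the mammography study.

```python
from math import exp, sqrt
from statistics import NormalDist


def median_odds_ratio(rater_variance):
    """Median odds ratio for a normally distributed random effect on the logit scale.

    The difference between two independent rater effects has variance
    2 * rater_variance; the MOR is the exponentiated median of its absolute value.
    """
    upper_quartile = NormalDist().inv_cdf(0.75)  # about 0.6745
    return exp(sqrt(2.0 * rater_variance) * upper_quartile)


print(median_odds_ratio(0.0))            # no rater variation gives an MOR of 1
print(round(median_odds_ratio(1.0), 3))  # about 2.6 for unit variance
```

An MOR of 1 means raters are interchangeable; larger values give the median odds ratio between the more and the less positive of two randomly chosen raters scoring the same case.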
