Similar Articles
A total of 20 similar articles were retrieved (search time: 15 ms).
1.
Often a "disease" or "state of disease" is defined by a subdomain of a continuous outcome variable. For example, the subdomain of diastolic blood pressure greater than 90 mmHg has been used to define hypertension. The classical method of estimating the risk (or prevalence) of such defined disease states is to dichotomize the outcome variable according to the cutoff value. The standard statistical analysis of such risk of disease then exploits methods developed specifically for binary data, usually based on the binomial distribution. We present a method, based on the assumption of a Gaussian (normal) distribution for the continuous outcome, which does not resort to dichotomization. Specifically, the estimation of risk and its variance is presented for the one- and two-sample situations, with the latter focusing on risk differences and ratios, and odds ratios. The binomial approach applied to the dichotomized data is found to be less efficient than the proposed method by 67% or less. The latter is found to be very accurate, even for small sample sizes, although rather sensitive to substitutions of the underlying distribution by thicker tailed distributions. Canadian total cholesterol data are used to illustrate the problem. For the one-sample case, the approach is illustrated using data from a study of the arterial oxygenation of 20 patients during one-lung anesthesia for thoracic surgery. For the two-sample case, data from a prognostic study of the renal function of 87 lupus nephritic patients are used.  相似文献   

2.
Misclassification is a long-standing statistical problem in epidemiology. In many real studies, either an exposure or a response variable, or both, may be misclassified. As such, potential threats to the validity of analytic results (e.g., estimates of odds ratios) that stem from misclassification are widely discussed in the literature. Much of the discussion has been restricted to the nondifferential case, in which misclassification rates for a particular variable are assumed not to depend on other variables. However, complex differential misclassification patterns are common in practice, as we illustrate here using bacterial vaginosis and trichomoniasis data from the HIV Epidemiology Research Study (HERS). Therefore, clear illustrations of valid and accessible methods that deal with complex misclassification are still in high demand. We formulate a maximum likelihood (ML) framework that allows flexible modeling of misclassification in both the response and a key binary exposure variable, while adjusting for other covariates via logistic regression. The approach emphasizes the use of internal validation data to evaluate the underlying misclassification mechanisms. Data-driven simulations show that the proposed ML analysis outperforms less flexible approaches that fail to appropriately account for complex misclassification patterns. The value and validity of the method are further demonstrated through a comprehensive analysis of the HERS example data.
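For intuition, here is a minimal sketch of ML estimation when only the binary response is misclassified and sensitivity/specificity are taken as known; the paper's framework is richer (differential misclassification of response and exposure, with misclassification parameters informed by internal validation data), so treat this as a simplified stand-in with illustrative names.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_misclassified_logistic(X, w, se, sp):
    """ML logistic regression when the observed outcome w is a misclassified
    version of the true outcome, with known sensitivity se and specificity sp.

    P(W=1 | x) = se * p(x) + (1 - sp) * (1 - p(x)),  p(x) = expit(x @ beta)
    """
    X = np.column_stack([np.ones(len(w)), X])   # add intercept

    def negloglik(beta):
        p = expit(X @ beta)
        pw = se * p + (1.0 - sp) * (1.0 - p)    # prob. of observing W = 1
        pw = np.clip(pw, 1e-10, 1 - 1e-10)
        return -np.sum(w * np.log(pw) + (1 - w) * np.log(1 - pw))

    res = minimize(negloglik, np.zeros(X.shape[1]), method="BFGS")
    return res.x
```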

3.
Kim I, Pang H, Zhao H. Statistics in Medicine 2012; 31(15): 1633-1651.
Many statistical methods for microarray data analysis consider one gene at a time, and they may miss subtle changes at the single-gene level. This limitation may be overcome by considering a set of genes simultaneously, where the gene sets are derived from prior biological knowledge. Limited work has been carried out in the regression setting to study the effects of clinical covariates and expression levels of genes in a pathway on either a continuous or a binary clinical outcome. Hence, we propose a Bayesian approach for identifying pathways related to both types of outcomes. We compare our Bayesian approaches with a likelihood-based approach that was developed by relating a least squares kernel machine for the nonparametric pathway effect to restricted maximum likelihood for variance components. Unlike the likelihood-based approach, the Bayesian approach allows us to directly estimate all parameters and pathway effects. It can incorporate prior knowledge into the Bayesian hierarchical model formulation and makes inference using the posterior samples without asymptotic theory. We consider several kernels (Gaussian, polynomial, and neural network kernels) to characterize gene expression effects in a pathway on clinical outcomes. Our simulation results suggest that the Bayesian approach has more accurate coverage probability than the likelihood-based approach, especially when the sample size is small compared with the number of genes being studied in a pathway. We demonstrate the usefulness of our approaches through an application to a type II diabetes mellitus data set. Our approaches can also be applied to other settings where a large number of strongly correlated predictors are present.
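The least squares kernel machine ingredient that the Bayesian approach is compared against can be sketched as follows, assuming a Gaussian kernel and omitting clinical covariates for brevity; this kernel-ridge-style smoother is only one piece of the machinery, and the names and defaults are illustrative.

```python
import numpy as np

def gaussian_kernel(Z, rho):
    """Gram matrix K[i, j] = exp(-||z_i - z_j||^2 / rho) over pathway genes Z."""
    sq = np.sum(Z**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-d2 / rho)

def lskm_pathway_effect(Z, y, rho=1.0, lam=1.0):
    """Least-squares kernel machine estimate of the nonparametric pathway
    effect h: h_hat = K (K + lam * I)^{-1} y (no covariates, for brevity)."""
    K = gaussian_kernel(Z, rho)
    return K @ np.linalg.solve(K + lam * np.eye(len(y)), y)
```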

4.
5.
Earlier work showed how to perform fixed-effects meta-analysis of studies or trials when each provides results on more than one outcome per patient and these multiple outcomes are correlated. That fixed-effects generalized-least-squares approach analyzes the multiple outcomes jointly within a single model, and it can include covariates, such as duration of therapy or quality of trial, that may explain observed heterogeneity of results among the trials. Sometimes the covariates explain all the heterogeneity, and the fixed-effects regression model is appropriate. However, unexplained heterogeneity may often remain, even after taking into account known or suspected covariates. Because fixed-effects models do not make allowance for this remaining unexplained heterogeneity, the potential exists for bias in estimated coefficients, standard errors and p-values. We propose two random-effects approaches for the regression meta-analysis of multiple correlated outcomes. We compare their use with fixed-effects models and with separate-outcomes models in a meta-analysis of periodontal clinical trials. A simulation study shows the advantages of the random-effects approach. These methods also facilitate meta-analysis of trials that compare more than two treatments.
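A minimal numpy sketch of the fixed-effects GLS backbone, assuming each trial supplies a vector of estimated outcome effects, a design matrix, and the within-trial covariance of those estimates; a random-effects version would add an estimated between-trial variance component to each covariance block. Names are illustrative.

```python
import numpy as np
from scipy.linalg import block_diag

def gls_meta(y_blocks, X_blocks, V_blocks):
    """Fixed-effects GLS meta-analysis of multiple correlated outcomes.

    y_blocks: per-trial vectors of estimated effects (one entry per outcome)
    X_blocks: per-trial design matrices (outcome indicators, trial covariates)
    V_blocks: per-trial covariance matrices of the estimated effects
    """
    y = np.concatenate(y_blocks)
    X = np.vstack(X_blocks)
    V_inv = np.linalg.inv(block_diag(*V_blocks))
    cov = np.linalg.inv(X.T @ V_inv @ X)   # covariance of the GLS estimate
    beta = cov @ X.T @ V_inv @ y           # beta_hat = (X'V^-1 X)^-1 X'V^-1 y
    return beta, cov
```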

6.
In many biomedical studies, interest often lies in calculating effect measures in the presence of interactions between two continuous exposures. Traditional approaches based on parametric regression are limited by the degree of arbitrariness involved in transforming these exposures into categorical variables or imposing a parametric form on the regression function. In this paper, we present: (a) a flexible non-parametric method for estimating effect measures through generalized additive models including interactions; and (b) bootstrap techniques for (i) testing the significance of interaction terms, and (ii) constructing confidence intervals for effect measures. The validity of our methodology is supported by simulations, and illustrated using data from a study of possible risk factors for post-operative infection. This application revealed a hitherto unreported effect: for patients with high plasma glucose levels, increased risk is associated not only with low but also with high percentages of lymphocytes.
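A generic percentile-bootstrap skeleton of the kind used for (ii) might look like this; `stat_fn` standing in for refitting the generalized additive model and extracting an effect measure on each resample is an assumption of this illustration (significance testing in (i) would instead resample under the null).

```python
import numpy as np

def bootstrap_ci(data, stat_fn, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any scalar effect measure.

    stat_fn takes a resampled data array (rows = subjects) and returns a
    scalar, e.g. an effect measure from a refitted additive model.
    """
    rng = np.random.default_rng(seed)
    n = len(data)
    stats = np.array([
        stat_fn(data[rng.integers(0, n, size=n)]) for _ in range(n_boot)
    ])
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```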

7.
When research interest lies in continuous outcome variables that take on values within a known range (e.g., a visual analog scale for pain between 0 and 100 mm), traditional statistical methods, such as least-squares regression, mixed-effects models, and even classic nonparametric methods such as the Wilcoxon test, may prove inadequate. Frequency distributions of bounded outcomes are often unimodal, U-shaped, or J-shaped. To the best of our knowledge, bounded outcomes in the biomedical and epidemiological literature have seldom been analyzed by appropriate methods that, for one, correctly constrain inference to lie within the feasible range of values. In many respects, continuous bounded outcomes can be likened to probabilities or propensities. Yet the care long taken in modeling the probability of binary outcomes, through the widespread use of logistic and probit regression, appears so far to have been overlooked for continuous bounded outcomes, at times with disastrous consequences. Logistic quantile regression constitutes an effective method to fill this gap.
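A minimal sketch of the transform-then-regress idea behind logistic quantile regression, using statsmodels' QuantReg; the clipping constant and variable names are illustrative, and the published estimator may differ in detail.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

def logistic_quantile_fit(y, X, lower=0.0, upper=100.0, q=0.5):
    """Logistic quantile regression for a bounded outcome (e.g. a 0-100 mm
    VAS): logit-transform the outcome, then fit linear quantile regression.
    Because quantiles are equivariant to monotone transforms, fitted
    quantiles back-transform to the original bounded scale."""
    eps = 1e-6
    y_star = np.clip((y - lower) / (upper - lower), eps, 1 - eps)
    h = np.log(y_star / (1 - y_star))        # logit transform
    return QuantReg(h, sm.add_constant(X)).fit(q=q)

# Back-transform a fitted quantile:
# lower + (upper - lower) * expit(linear_prediction)
```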

8.
Prevalence estimation is crucial for controlling the spread of infections and diseases and for planning of health care services. Prevalence estimation is typically conducted via pooled, or group, testing due to limited testing budgets. We study a sequential estimation procedure that uses continuous pool readings and considers the dilution effect of pooling so as to efficiently estimate an unknown prevalence rate. Embedded into the sequential estimation procedure is an optimization model that determines the optimal pooling design (number of pools and pool sizes) under a limited testing budget, considering the trade-off between testing cost and estimation accuracy. Our numerical study indicates that the proposed sequential estimation procedure outperforms single-stage procedures, or procedures that use binary test outcomes. Further, the sequential procedure provides robust prevalence estimates in cases where the initial estimate of the unknown prevalence rate is poor, or the assumed distribution of the biomarker load in infected subjects is inaccurate. Thus, when limited and unreliable information is available about the current status of, or biomarker dynamics related to, an infection, the sequential procedure becomes an attractive estimation strategy, due to its ability to mitigate the initial bias.
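For intuition only, the far simpler single-stage, binary-outcome analogue is sketched below: the MLE of prevalence from pooled tests with a perfect assay; the paper's sequential procedure instead exploits continuous pool readings and models the dilution effect.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pooled_prevalence_mle(pool_sizes, pool_positive):
    """MLE of prevalence p from binary pooled tests (perfect assay assumed):
    a pool of size k tests positive with probability 1 - (1 - p)^k."""
    k = np.asarray(pool_sizes, dtype=float)
    z = np.asarray(pool_positive, dtype=float)

    def negloglik(p):
        pr = 1.0 - (1.0 - p) ** k
        pr = np.clip(pr, 1e-12, 1 - 1e-12)
        return -np.sum(z * np.log(pr) + (1 - z) * np.log(1 - pr))

    res = minimize_scalar(negloglik, bounds=(1e-6, 1 - 1e-6), method="bounded")
    return res.x

# Example: ten pools of size 5, three of which tested positive
print(pooled_prevalence_mle([5] * 10, [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]))
```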

9.
Assessing goodness-of-fit in logistic regression models can be problematic, in that commonly used deviance or Pearson chi-square statistics do not have approximate chi-square distributions, under the null hypothesis of no lack of fit, when continuous covariates are modelled. We present two easy-to-implement test statistics, similar to the deviance and Pearson chi-square tests, that are appropriate when continuous covariates are present. The methodology follows the Hosmer and Lemeshow goodness-of-fit test in classifying observations into distinct groups according to fitted probabilities, allowing sufficient cell sizes for chi-square testing. The major difference is that the proposed tests perform this grouping within the cross-classification of all categorical covariates in the model and, in some situations, allow for a more powerful assessment of where model-predicted and observed counts may differ. A variety of simulations are performed comparing the proposed tests to the Hosmer-Lemeshow test.
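For reference, the classic Hosmer-Lemeshow construction that the proposed tests refine (by forming the groups within the cross-classification of the categorical covariates) can be sketched as follows; this is the textbook version, not the authors' modification.

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, g=10):
    """Classic Hosmer-Lemeshow statistic: group observations by deciles of
    fitted probability and compare observed with expected event counts."""
    order = np.argsort(p_hat)
    stat = 0.0
    for idx in np.array_split(order, g):
        n_g = len(idx)
        obs = y[idx].sum()            # observed events in the group
        expct = p_hat[idx].sum()      # expected events in the group
        pbar = expct / n_g            # mean fitted probability in the group
        stat += (obs - expct) ** 2 / (n_g * pbar * (1 - pbar))
    return stat, chi2.sf(stat, g - 2)  # df = g - 2 by convention
```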

10.
We study the application of a widely used ordinal regression model, the cumulative probability model (CPM), for continuous outcomes. Such models are attractive for the analysis of continuous response variables because they are invariant to any monotonic transformation of the outcome and because they directly model the cumulative distribution function from which summaries such as expectations and quantiles can easily be derived. Such models can also readily handle mixed type distributions. We describe the motivation, estimation, inference, model assumptions, and diagnostics. We demonstrate that CPMs applied to continuous outcomes are semiparametric transformation models. Extensive simulations are performed to investigate the finite sample performance of these models. We find that properly specified CPMs generally have good finite sample performance with moderate sample sizes, but that bias may occur when the sample size is small. Cumulative probability models are fairly robust to minor or moderate link function misspecification in our simulations. For certain purposes, the CPMs are more efficient than other models. We illustrate their application, with model diagnostics, in a study of the treatment of HIV. CD4 cell count and viral load 6 months after the initiation of antiretroviral therapy are modeled using CPMs; both variables typically require transformations, and viral load has a large proportion of measurements below a detection limit.

11.
Bivariate copula regression allows for the flexible combination of two arbitrary, continuous marginal distributions with regression effects being placed on potentially all parameters of the resulting bivariate joint response distribution. Motivated by the risk factors for adverse birth outcomes, many of which are dichotomous, we consider mixed binary-continuous responses that extend the bivariate continuous framework to the situation where one response variable is discrete (more precisely, binary) whereas the other response remains continuous. Utilizing the latent continuous representation of binary regression models, we implement a penalized likelihood–based approach for the resulting class of copula regression models and employ it in the context of modeling gestational age and the presence/absence of low birth weight. The analysis demonstrates the advantage of the flexible specification of regression impacts including nonlinear effects of continuous covariates and spatial effects. Our results imply that racial and spatial inequalities in the risk factors for infant mortality are even greater than previously suggested.
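The Gaussian-copula density at the heart of such models is easy to write down; a sketch for the continuous-continuous case (the binary margin enters through its latent-normal representation, which this snippet does not implement).

```python
import numpy as np
from scipy.stats import norm

def gauss_copula_logdens(u, v, rho):
    """Log-density of the bivariate Gaussian copula, the building block that
    ties the two marginal regressions together; u, v are marginal CDF values."""
    x, y = norm.ppf(u), norm.ppf(v)
    r2 = 1.0 - rho**2
    return -0.5 * np.log(r2) + (2 * rho * x * y - rho**2 * (x**2 + y**2)) / (2 * r2)
```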

12.
Applications frequently involve logistic regression analysis with clustered data where there are few positive outcomes in some of the independent variable categories. For example, an application is given here that analyzes the association of asthma with various demographic variables and risk factors using data from the third National Health and Nutrition Examination Survey, a weighted multistage cluster sample. Although there are 742 asthma cases in all (out of 18,395 individuals), for one of the categories of one of the independent variables there are only 25 asthma cases (out of 695 individuals). Generalized Wald and score hypothesis tests, which use appropriate cluster-level variance estimators, and a bootstrap hypothesis test have been proposed for testing logistic regression coefficients with cluster samples. When there are few positive outcomes, simulations presented in this paper show that these tests can sometimes have either inflated or very conservative levels. A simulation-based method is proposed for testing logistic regression coefficients with cluster samples when there are few positive outcomes. This testing methodology is shown to compare favorably with the generalized Wald and score tests and the bootstrap hypothesis test in terms of maintaining nominal levels. The proposed method is also useful when testing goodness-of-fit of logistic regression models using deciles-of-risk tables.
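The core of a simulation-based test is a Monte Carlo p-value; in this generic sketch the user supplies a function that simulates the statistic under the fitted null while respecting the cluster design, which is an assumption of the illustration.

```python
import numpy as np

def simulation_pvalue(stat_obs, simulate_null_stat, n_sim=1000, seed=0):
    """Monte Carlo p-value: simulate the test statistic under the null
    (the supplied simulator must respect the cluster design) and compare
    with the observed statistic. The +1 terms give the usual add-one
    Monte Carlo p-value, which is never exactly zero."""
    rng = np.random.default_rng(seed)
    sims = np.array([simulate_null_stat(rng) for _ in range(n_sim)])
    return (1 + np.sum(sims >= stat_obs)) / (1 + n_sim)
```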

13.
Multistage designs allow considerable reductions in the expected sample size of a trial. When stopping for futility or efficacy is allowed at each stage, the expected sample size under different possible true treatment effects (δ) is of interest. The δ-minimax design is the one for which the maximum expected sample size is minimised among all designs that meet the type I and type II error constraints. Previous work has compared a two-stage δ-minimax design with other optimal two-stage designs. Applying the δ-minimax criterion to designs with more than two stages was not previously considered because of computational issues. In this paper, we identify δ-minimax designs with more than two stages through a novel application of simulated annealing. We compare them with other optimal multistage designs and the triangular design. We show that, as for two-stage designs, the δ-minimax design has good expected sample size properties across a broad range of treatment effects but generally has a higher maximum sample size. To overcome this drawback, we use the concept of admissible designs to find designs that balance the maximum expected sample size and the maximum sample size. We show that such designs have good expected sample size properties and a reasonable maximum sample size and, thus, are very appealing for use in clinical trials.
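A generic simulated-annealing skeleton of the sort that could search the design space is sketched below; the neighborhood move and the objective (maximum expected sample size subject to error constraints) are placeholders, not the authors' exact implementation.

```python
import numpy as np

def simulated_annealing(init, neighbor, objective, n_iter=10000,
                        t0=1.0, cooling=0.999, seed=0):
    """Generic simulated annealing: accept worse candidates with probability
    exp(-delta / t) so the search can escape local optima. Here 'objective'
    would score a candidate multistage design, e.g. its maximum expected
    sample size, with infeasible designs penalized."""
    rng = np.random.default_rng(seed)
    x, fx, t = init, objective(init), t0
    best, fbest = x, fx
    for _ in range(n_iter):
        x_new = neighbor(x, rng)            # perturb stage sizes / boundaries
        f_new = objective(x_new)
        if f_new <= fx or rng.random() < np.exp(-(f_new - fx) / t):
            x, fx = x_new, f_new
            if fx < fbest:
                best, fbest = x, fx
        t *= cooling                        # geometric cooling schedule
    return best, fbest
```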

14.
At the time of data acquisition in a longitudinal study a decision needs to be made whether or not the latest measurement of the primary outcome is a potential outlier. If the data point does not fit with the subject's prior data, the patient can be immediately remeasured before he/she leaves the office. From the third visit onwards, a least squares approach can be used to generate prediction intervals for the value of the response at that visit. We propose a Bayesian method for calculating a prediction interval that can incorporate external information about the process that can be used beginning at the first visit. Both the least squares and Bayesian approaches will be used to prospectively clean longitudinal data. An example using longitudinally measured bone density measurements in the elderly will be discussed. In addition, simulation studies will be described which show that both cleaning methods are better than doing nothing and that the Bayesian approach outperforms the least squares method.
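A minimal conjugate-normal sketch of the Bayesian interval, ignoring the within-subject time trend that a full longitudinal model would carry; because the prior supplies the information, it already works at the first visit. The simplifications and names are mine, not the paper's.

```python
import numpy as np
from scipy.stats import norm

def bayes_prediction_interval(y, m0, s0_sq, sigma_sq, level=0.95):
    """Posterior predictive interval for a subject's next measurement under a
    normal model with known measurement variance sigma_sq and a N(m0, s0_sq)
    prior on the subject's true level. With y empty (first visit), the
    interval comes from the prior alone."""
    n = len(y)
    prec = 1.0 / s0_sq + n / sigma_sq               # posterior precision
    m_n = (m0 / s0_sq + np.sum(y) / sigma_sq) / prec
    pred_var = 1.0 / prec + sigma_sq                # predictive variance
    z = norm.ppf(0.5 + level / 2.0)
    return m_n - z * np.sqrt(pred_var), m_n + z * np.sqrt(pred_var)

# Example: flag a bone-density reading outside the interval for remeasurement
print(bayes_prediction_interval([0.82, 0.80], m0=0.85, s0_sq=0.01, sigma_sq=0.002))
```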

15.
Analysis of interactions involving continuous variables in logistic regression models
Rothman proposed that biological interaction should be assessed on the additive scale, that is, by the presence or absence of additive interaction, whereas the product term in a logistic regression model reflects multiplicative interaction. The literature on additive interaction between two factors in logistic regression models, in China and elsewhere, has dealt mainly with two dichotomous variables. This article introduces a bootstrap method for estimating confidence intervals for the additive interaction between two continuous variables, or between a continuous and a categorical variable. The method is illustrated with data from a case-control study of lung cancer in Hong Kong men, together with an implementation in the free software R, as a reference for researchers analyzing interactions.
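A rough Python sketch of a bootstrap percentile confidence interval for the additive-interaction measure RERI, with a continuous x1 contrasted at a 1-unit increase from 0 and a binary x2; the column names and the contrast are illustrative, and the original article's implementation is in R.

```python
import numpy as np
import statsmodels.api as sm

def reri(d):
    """RERI = OR11 - OR10 - OR01 + 1 from a logistic model with a product
    term; x1 contrasted at a 1-unit increase from 0, x2 at 1 versus 0."""
    X = sm.add_constant(np.column_stack([d["x1"], d["x2"], d["x1"] * d["x2"]]))
    b = np.asarray(sm.Logit(np.asarray(d["y"]), X).fit(disp=0).params)
    return np.exp(b[1] + b[2] + b[3]) - np.exp(b[1]) - np.exp(b[2]) + 1.0

def reri_bootstrap_ci(df, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for RERI; df is a pandas DataFrame with
    columns 'y', 'x1', 'x2' (illustrative names)."""
    rng = np.random.default_rng(seed)
    boots = [reri(df.sample(len(df), replace=True,
                            random_state=int(rng.integers(10**9))))
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return reri(df), (lo, hi)
```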

18.
Biomedical studies have a common interest in assessing relationships between multiple related health outcomes and high-dimensional predictors. For example, in reproductive epidemiology, one may collect pregnancy outcomes such as length of gestation and birth weight and predictors such as single nucleotide polymorphisms in multiple candidate genes and environmental exposures. In such settings, there is a need for simple yet flexible methods for selecting true predictors of adverse health responses from a high-dimensional set of candidate predictors. To address this problem, one may either consider linear regression models for the continuous outcomes or convert these outcomes into binary indicators of adverse responses using predefined cutoffs. The former strategy has the disadvantage of often leading to a poorly fitting model that does not predict risk well, whereas the latter approach can be very sensitive to the cutoff choice. As a simple yet flexible alternative, we propose a method for adverse subpopulation regression, which relies on a two-component latent class model, with the dominant component corresponding to (presumed) healthy individuals and the risk of falling in the minority component characterized via a logistic regression. The logistic regression model is designed to accommodate high-dimensional predictors, as occur in studies with a large number of gene by environment interactions, through the use of a flexible nonparametric multiple shrinkage approach. The Gibbs sampler is developed for posterior computation. We evaluate the methods with the use of simulation studies and apply these to a genetic epidemiology study of pregnancy outcomes.

19.
Covariates may affect continuous responses differently at various points of the response distribution. For example, some exposure might have minimal impact on conditional means, whereas it might lower conditional 10th percentiles sharply. Such differential effects can be important to detect. In studies of the determinants of birth weight, for instance, it is critical to identify exposures like the one above, since low birth weight is a risk factor for later health problems. Effects of covariates on the tails of distributions can be obscured by models (such as linear regression) that estimate conditional means; however, effects on tails can be detected by quantile regression. We present two approaches for exploring high-dimensional predictor spaces to identify important predictors for quantile regression. These are based on the lasso and elastic net penalties. We apply the approaches to a prospective cohort study of adverse birth outcomes that includes a wide array of demographic, medical, psychosocial, and environmental variables. Although tobacco exposure is known to be associated with lower birth weights, the analysis suggests an interesting interaction effect not previously reported: tobacco exposure depresses the 20th and 30th percentiles of birth weight more strongly when mothers have high levels of lead in their blood compared with those who have low blood lead levels.
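The lasso-penalty route can be sketched with scikit-learn's QuantileRegressor, which minimizes the pinball loss under an L1 penalty; the elastic-net variant discussed in the paper would add an L2 term that this estimator does not offer, and the simulated data are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import QuantileRegressor

# Lasso-type variable selection for a lower quantile of birth weight:
# alpha controls how aggressively candidate predictors are zeroed out.
rng = np.random.default_rng(0)
n, p = 500, 30                      # many candidate predictors, few real ones
X = rng.normal(size=(n, p))
y = 3200 + 150 * X[:, 0] - 200 * X[:, 1] + rng.normal(0, 400, size=n)

model = QuantileRegressor(quantile=0.2, alpha=0.1, solver="highs").fit(X, y)
selected = np.flatnonzero(model.coef_)   # predictors surviving the penalty
print(selected, model.coef_[selected])
```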

20.
We develop a simulation-based procedure for determining the required sample size in binomial regression risk assessment studies when response data are subject to misclassification. A Bayesian average power criterion is used to determine a sample size that provides high probability, averaged over the distribution of potential future data sets, of correctly establishing the direction of association between predictor variables and the probability of event occurrence. The method is broadly applicable to any parametric binomial regression model including, but not limited to, the popular logistic, probit, and complementary log-log models. We detail a common medical scenario wherein ascertainment of true disease status is impractical or otherwise impeded, and in its place the outcome of a single binary diagnostic test is used as a surrogate. These methods are then extended to the two diagnostic test setting. We illustrate the method with categorical covariates using one example that involves screening for human papillomavirus. This example, coupled with results from simulated data, highlights the utility of our Bayesian sample size procedure with error-prone measurements.
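A stripped-down Monte Carlo sketch of the average-power idea for a single binary predictor and one misclassified response; the decision criterion below is a naive one-sided Wald test standing in for the paper's posterior-probability criterion, so the names, priors, and criterion are all illustrative.

```python
import numpy as np
import statsmodels.api as sm
from scipy.special import expit

def average_power(n, prior_draw, se, sp, n_sim=500, seed=0):
    """Monte Carlo average power: draw (b0, b1) from the prior, simulate a
    misclassified binary outcome, and count how often the direction of
    association (b1 > 0 assumed under the prior) is correctly established,
    here via a naive one-sided Wald test on the surrogate outcome."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        b0, b1 = prior_draw(rng)
        x = rng.integers(0, 2, size=n).astype(float)
        y_true = rng.random(n) < expit(b0 + b1 * x)     # true disease status
        w = np.where(y_true, rng.random(n) < se,        # imperfect surrogate
                     rng.random(n) < 1.0 - sp).astype(int)
        fit = sm.Logit(w, sm.add_constant(x)).fit(disp=0)
        hits += fit.params[1] / fit.bse[1] > 1.645
    return hits / n_sim

# e.g. average_power(400, lambda r: (-1.0, r.normal(0.7, 0.2)), se=0.9, sp=0.95)
```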

