In matched case‐crossover studies, it is generally accepted that the covariates on which a case and its controls are matched cannot confound the independent predictors included in the conditional logistic regression model. This is because any stratum effect is removed by conditioning on the fixed composition of each matched set of case and controls in the stratum. Hence, the conditional logistic regression model cannot detect any effects associated with the matching covariates by stratum. However, some matching covariates, such as time, often act as important effect modifiers, and ignoring them leads to incorrect statistical estimation and prediction. Therefore, we propose three approaches to evaluate effect modification by time: a parametric approach, a semiparametric penalized approach, and a semiparametric Bayesian approach. Our parametric approach is a two‐stage method that fits conditional logistic regression in the first stage and polynomial regression in the second stage. Our semiparametric penalized and Bayesian approaches are one‐stage approaches developed using regression splines. These one‐stage approaches allow us not only to detect the parametric relationship between the predictor and binary outcomes, but also to evaluate nonparametric relationships between the predictor and time. We demonstrate the advantage of our semiparametric one‐stage approaches using both a simulation study and an epidemiological example of a 1‐4 bi‐directional case‐crossover study of childhood aseptic meningitis with drinking water turbidity. We also provide statistical inference for the semiparametric Bayesian approach using Bayes factors. Copyright © 2016 John Wiley & Sons, Ltd.
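The second stage of the two‐stage parametric approach can be pictured with a minimal sketch. It assumes that stage 1 has already produced one conditional‐logistic log odds ratio and its standard error per time window (the values below are hypothetical). Inverse‐variance weighted least squares is an assumed choice here, not necessarily the authors' exact estimator.

```python
import numpy as np

def second_stage_polynomial(times, log_or, se, degree=2):
    """Fit a polynomial in time to stage-1 log odds ratios (hypothetical helper).

    numpy.polyfit applies w to the unsquared residuals, so w = 1/se gives
    inverse-variance (1/se**2) weighting of the squared residuals.
    Coefficients are returned from the highest power down.
    """
    times = np.asarray(times, dtype=float)
    log_or = np.asarray(log_or, dtype=float)
    se = np.asarray(se, dtype=float)
    return np.polyfit(times, log_or, deg=degree, w=1.0 / se)

# Hypothetical stage-1 output: one CLR log odds ratio per time window.
times = np.arange(1, 7)
log_or = 0.10 + 0.05 * times - 0.01 * times**2
se = np.full(times.shape, 0.05)
coef = second_stage_polynomial(times, log_or, se)
```

A non‐zero coefficient on a time term would suggest effect modification by time. The one‐stage spline approaches avoid treating the stage‐1 estimates as fixed data.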

Motivated by a matched case-control study investigating potential risk factors for meningococcal disease among adolescents, we consider the analysis of matched case-control studies where disease incidence, and possibly other risk factors, vary with time of year. For the cases, the time of infection may be recorded. For controls, however, the recorded time is simply the time of data collection, which is shortly after the time of infection for the matched case and so depends on it. We show that the effects of risk factors and interactions may be adjusted for the time-of-year effect in a standard conditional logistic regression analysis without introducing any bias. We also show that, if the time delay between data collection for cases and controls is constant, provided this delay is not very short, estimates of the time-of-year effect are approximately unbiased. When the length of the delay varies over time, the estimate of the time-of-year effect is biased. We obtain an approximate expression for the degree of bias in this case.

In a wide variety of medical research scenarios, one is interested in whether regression curves differ between subgroups in the sample. Examples are gender differences in the effect of drug treatment or the study of genotype-environment interactions. Exploratory techniques are often required to address this question because detailed knowledge of the shape of the regression curves, and of how that shape differs across subgroups, is lacking. In this article, we explored the power of two such exploratory techniques: multivariate adaptive regression splines (MARS) and least squares curve fitting using polynomials. For this purpose, simulations were performed using linear, logistic, and complex non-linear curves. The power obtained from MARS was on average 1.4 times higher than with polynomials. Power was higher even when the regression curve was linear, gains increased with the complexity of the curve, and for highly non-linear curves model-free methods such as MARS might be the only alternative.

The case-crossover design uses cases only and compares exposures just prior to the event times with exposures at comparable control, or 'referent', times in order to assess the effect of short-term exposure on the risk of a rare event. It has commonly been used to study the effect of air pollution on the risk of various adverse health events. Proper selection of referents is crucial, especially with air pollution exposures, which are shared, highly seasonal, and often have a long-term time trend. Careful referent selection is needed both to control for time-varying confounders and to ensure that the distribution of exposure is constant across referent times, a key assumption of this method. Yet the referent strategy matters for a more basic reason: the commonly used conditional logistic regression estimating equations are biased when referents are not chosen a priori but are functions of the observed event times. We call this bias in the estimating equations overlap bias. In this paper, we propose a new taxonomy of referent selection strategies that emphasizes their statistical properties. We derive overlap bias, explore its magnitude, and consider how the bias depends on properties of the exposure series. We conclude that the bias is usually small, though highly unpredictable, and easily avoided.
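The abstract does not name a particular scheme. The sketch below uses the time‐stratified strategy often applied in air pollution studies to illustrate a referent set chosen a priori. Strata are fixed in advance as calendar month × day of week, so every event day in a stratum draws on the same pool of days. The function name and the daily resolution are assumptions for illustration.

```python
import calendar
from datetime import date, timedelta

def time_stratified_referents(event_day: date) -> list[date]:
    """Referent days for a time-stratified case-crossover design (hypothetical helper).

    The stratum (same calendar month, same weekday) is fixed before any event
    is observed; the referents are the other days of that stratum.
    """
    _, n_days = calendar.monthrange(event_day.year, event_day.month)
    first = date(event_day.year, event_day.month, 1)
    month_days = (first + timedelta(days=k) for k in range(n_days))
    return [d for d in month_days
            if d.weekday() == event_day.weekday() and d != event_day]

referents = time_stratified_referents(date(2024, 3, 13))  # a Wednesday
```

Because the partition does not depend on when events happen, this kind of scheme is one way to sidestep the overlap bias described above. By contrast, referents defined relative to the event date, such as "7 days before and after", depend on the observed event time.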

Relative survival provides a measure of the proportion of patients dying from the disease under study without requiring knowledge of the cause of death. We propose an overall strategy based on regression models to estimate relative survival and model the effects of potential prognostic factors. The baseline hazard was modelled up to 10 years of follow-up using parametric continuous functions. Six models, including cubic regression splines, were considered, and the Akaike Information Criterion was used to select the final model. This approach yielded smooth and reliable estimates of the mortality hazard and allowed us to deal with sparse data while taking into account all the available information. Splines were also used to model simultaneously the non-linear effects of continuous covariates and time-dependent hazard ratios. This led to a graphical representation of the hazard ratio that can be useful for clinical interpretation. Estimates of these models were obtained by likelihood maximization. We showed that these estimates could also be obtained using standard algorithms for Poisson regression.

The activity of neurons in the brain often varies systematically with some quantitative feature of a stimulus or action. A well-known example is the tendency of the firing rates of neurons in the primary motor cortex to vary with the direction of a subject's arm or wrist movement. When this movement is constrained to vary in only two dimensions, the direction of movement may be characterized by an angle, and the neuronal firing rate can be written as a function of this angle. The firing rate function has traditionally been fit with a cosine, but recent evidence suggests that departures from cosine tuning occur frequently. We report here a new non-parametric regression method for fitting periodic functions and demonstrate its application to the fitting of neuronal data. The method is an extension of Bayesian adaptive regression splines (BARS) and applies both to normal and non-normal data, including Poisson data, which commonly arise in neuronal applications. We compare the new method to a periodic version of smoothing splines and some parametric alternatives and find the new method to be especially valuable when the smoothness of the periodic function varies unevenly across its domain.

Luo X, Sorock GS. Statistics in Medicine. 2008;27(15):2890-2901.
The case-crossover design is useful for studying the effects of transient exposures on the short-term risk of diseases or injuries when only data on cases are available. The crossover nature of this design allows each subject to serve as his or her own control. While the original design was proposed for univariate event data, recurrent events are encountered in many applications (e.g. falls in the elderly, gout attacks, and sexually transmitted infections). In such situations, the within-subject dependence among recurrent events needs to be taken into account in the analysis. We review three existing conditional logistic regression (CLR)-based approaches for recurrent event data under the case-crossover design. A simple approach is to use only the first event for each subject; however, this is expected to lose efficiency in estimation. The other two reviewed approaches rely on independence assumptions for the recurrent events, conditional on a set of covariates. Furthermore, we propose new methods that adjust the CLR using either a within-subject pairwise resampling technique or a weighted estimating equation. These methods do not require a specific dependence structure among recurrent events and are therefore more flexible than the existing methods when the correlation structure is unknown. We also propose a weighted Mantel-Haenszel estimator, which is easy to implement for data with a binary exposure. In simulation studies, we show that all discussed methods yield virtually unbiased estimates when the conditional independence assumption holds. These methods are illustrated using data from a study of the effect of medication changes on falls among the elderly.
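For a binary exposure, a Mantel‐Haenszel odds ratio over matched sets (one case window plus its referent windows) can be sketched as follows. The per‐set `weight` field is a hypothetical placeholder, for example 1/(number of events for the subject). The abstract does not give the authors' actual weights, so this is not their estimator, only the general form.

```python
from dataclasses import dataclass

@dataclass
class MatchedSet:
    case_exposed: bool   # exposure in the hazard window just before the event
    ref_exposed: int     # number of exposed referent windows
    ref_total: int       # total number of referent windows
    weight: float = 1.0  # hypothetical per-set weight (assumption, not from the paper)

def weighted_mh_odds_ratio(sets):
    """Weighted Mantel-Haenszel OR: sum(w*a*d/n) / sum(w*b*c/n) over matched sets."""
    num = den = 0.0
    for s in sets:
        n = 1 + s.ref_total                # one case window plus referents
        a = 1 if s.case_exposed else 0     # case exposed
        c = 1 - a                          # case unexposed
        b = s.ref_exposed                  # exposed referents
        d = s.ref_total - s.ref_exposed    # unexposed referents
        num += s.weight * a * d / n
        den += s.weight * b * c / n
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den

sets = [MatchedSet(True, 0, 2), MatchedSet(True, 1, 2),
        MatchedSet(False, 1, 2), MatchedSet(False, 1, 2)]
or_unweighted = weighted_mh_odds_ratio(sets)
```

Setting all weights to 1 recovers the ordinary matched‐set Mantel‐Haenszel estimator. Down‐weighting sets from subjects with many events is one way to limit their influence when the events are dependent.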

An important topic when estimating the effect of air pollutants on human health is choosing the best method to control for seasonal patterns and time-varying confounders, such as temperature and humidity. Semi‐parametric Poisson time‐series models include smooth functions of calendar time and weather effects to control for potential confounders. Case‐crossover (CC) approaches are considered efficient alternatives that control seasonal confounding by design and allow inclusion of smooth functions of weather confounders through their equivalent Poisson representations. We evaluate both methodological designs with respect to seasonal control and compare spline‐based approaches, using natural splines and penalized splines, with two time‐stratified CC approaches. For the spline‐based methods, we consider fixed degrees of freedom, minimization of the partial autocorrelation function, and generalized cross‐validation as smoothing criteria. Issues of model misspecification with respect to weather confounding are investigated under simulation scenarios that allow us to quantify omitted‐, misspecified‐, and irrelevant‐variable bias. The simulations are based on fully parametric mechanisms designed to replicate two datasets with different mortality and atmospheric patterns. Overall, approaches based on minimizing the partial autocorrelation function provide more stable results for high mortality counts and strong seasonal trends. In contrast, natural splines with fixed degrees of freedom perform best for low mortality counts and weak seasonal trends, followed by the time‐season‐stratified CC model, which performs equally well in terms of bias but yields higher standard errors. Copyright © 2014 John Wiley & Sons, Ltd.

We examine the properties of several goodness-of-fit tests for multinomial logistic regression. One test sorts the observations according to the complement of the estimated probability for the reference outcome category and then groups the subjects into g equal-sized groups. A g × c contingency table is then constructed, where c is the number of values of the outcome variable. The test statistic, denoted Cg, is the Pearson χ² statistic computed on this table, where the estimated expected frequencies are the sums of the model-based estimated logistic probabilities. Simulations compare the properties of Cg with those of the ungrouped Pearson χ² test (X²) and its normalized version (z). The null distribution of Cg is well approximated by the χ² distribution with (g − 2) × (c − 1) degrees of freedom. The sampling distribution of X² is compared with a χ² distribution with n × (c − 1) degrees of freedom but shows erratic behavior. With a few exceptions, the sampling distribution of z adheres reasonably well to the standard normal distribution. Power simulations show that Cg has low power for a sample of 100 observations but satisfactory power for a sample of 400. The tests are illustrated using data from a study of cytological criteria for the diagnosis of breast tumors.
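The Cg construction described above can be sketched in NumPy. The sketch assumes that column 0 of the fitted probability matrix is the reference category and that the outcomes are coded 0..c−1. `np.array_split` is an assumed way to form (nearly) equal‐sized groups when n is not divisible by g.

```python
import numpy as np

def cg_statistic(probs, y, g=10):
    """Grouped Pearson chi-square Cg for a fitted multinomial logistic model.

    probs: (n, c) fitted probabilities, column 0 = reference category.
    y:     (n,) observed outcome codes in 0..c-1.
    Returns the statistic and its reference df, (g - 2) * (c - 1).
    """
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y, dtype=int)
    n, c = probs.shape
    # Sort by 1 - P(reference category); stable sort keeps ties in input order.
    order = np.argsort(1.0 - probs[:, 0], kind="stable")
    observed = np.eye(c)[y]          # one-hot rows: observed category counts
    stat = 0.0
    for idx in np.array_split(order, g):
        obs = observed[idx].sum(axis=0)   # observed counts in this group
        exp = probs[idx].sum(axis=0)      # expected = sum of fitted probabilities
        stat += np.sum((obs - exp) ** 2 / exp)
    return stat, (g - 2) * (c - 1)

rng = np.random.default_rng(0)
probs = rng.dirichlet([2.0, 2.0, 2.0], size=400)
y = np.array([rng.choice(3, p=row) for row in probs])
stat, df = cg_statistic(probs, y, g=10)
```

A p‐value can then be taken from the upper tail of the χ² distribution, for example with `scipy.stats.chi2.sf(stat, df)`.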

Clinicians and health service researchers are frequently interested in predicting patient-specific probabilities of adverse events (e.g. death, disease recurrence, post-operative complications, hospital readmission). There is increasing interest in the use of classification and regression trees (CART) for predicting outcomes in clinical studies. We compared the predictive accuracy of logistic regression with that of regression trees for predicting mortality after hospitalization with an acute myocardial infarction (AMI). We also examined the predictive ability of two other types of data-driven models: generalized additive models (GAMs) and multivariate adaptive regression splines (MARS). We used data on 9484 patients admitted to hospital with an AMI in Ontario. We used repeated split-sample validation: the data were randomly divided into derivation and validation samples. Predictive models were estimated using the derivation sample, and the predictive accuracy of the resultant model was assessed using the area under the receiver operating characteristic (ROC) curve in the validation sample. This process was repeated 1000 times: the initial data set was randomly divided into derivation and validation samples 1000 times, and the predictive accuracy of each method was assessed each time. The mean ROC curve area for the regression tree models in the 1000 derivation samples was 0.762, while the mean ROC curve area of a simple logistic regression model was 0.845. The mean ROC curve areas for the other methods ranged from a low of 0.831 to a high of 0.851. Our study shows that regression trees do not perform as well as logistic regression for predicting mortality following AMI. However, the logistic regression model had performance comparable to that of more flexible, data-driven models such as GAMs and MARS.
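The repeated split‐sample validation loop can be sketched with a single‐predictor logistic model fitted by Newton‐Raphson and the ROC area computed as the probability that a random death outranks a random survivor (ties count 1/2). The 2/3 derivation fraction, the simulated data, and all function names are assumptions, since the abstract does not give the split proportion.

```python
import numpy as np

def auc(scores, labels):
    """ROC area = P(score of a positive > score of a negative), ties count 1/2."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def fit_logistic(X, y, n_iter=25):
    """Plain Newton-Raphson logistic regression with an intercept column."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        grad = Xd.T @ (y - p)
        hess = Xd.T @ (Xd * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta

def repeated_split_auc(X, y, n_repeats=1000, train_frac=2 / 3, seed=0):
    """Mean validation-sample AUC over repeated random derivation/validation splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(round(train_frac * n))
    aucs = np.empty(n_repeats)
    for r in range(n_repeats):
        perm = rng.permutation(n)
        train, test = perm[:n_train], perm[n_train:]
        beta = fit_logistic(X[train], y[train])
        # The linear predictor ranks subjects the same way as fitted probabilities.
        lin_pred = np.column_stack([np.ones(len(test)), X[test]]) @ beta
        aucs[r] = auc(lin_pred, y[test])
    return aucs.mean()

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 1))
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(-0.5 + 1.5 * X[:, 0])))).astype(float)
mean_auc = repeated_split_auc(X, y, n_repeats=200)
```

Because the AUC depends only on the ranking of scores, the linear predictor can be used directly without converting it to probabilities.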

Family-based case-control studies are popularly used to study the effect of genes and gene-environment interactions in the etiology of rare complex diseases. We consider methods for the analysis of such studies under the assumption that genetic susceptibility (G) and environmental exposures (E) are independently distributed of each other within families in the source population. Conditional logistic regression, the traditional method of analysis of the data, fails to exploit the independence assumption and hence can be inefficient. Alternatively, one can estimate the multiplicative interaction between G and E more efficiently using cases only, but the required population-based G-E independence assumption is very stringent. In this article, we propose a novel conditional likelihood framework for exploiting the within-family G-E independence assumption. This approach leads to a simple and yet highly efficient method of estimating interaction and various other risk parameters of scientific interest. Moreover, we show that the same paradigm also leads to a number of alternative and even more efficient methods for analysis of family-based case-control studies when parental genotype information is available on the case-control study participants. Based on these methods, we evaluate different family-based study designs by examining their relative efficiencies to each other and their efficiencies compared to a population-based case-control design of unrelated subjects. These comparisons reveal important design implications. Extensions of the methodologies for dealing with complex family studies are also discussed.

Goodness-of-fit tests for ordinal response regression models
It is well documented that the commonly used Pearson chi-square and deviance statistics are not adequate for assessing goodness-of-fit in logistic regression models when continuous covariates are modelled. In recent years, several methods have been proposed which address this shortcoming in the binary logistic regression setting or assess model fit differently. However, these techniques have typically not been extended to the ordinal response setting and few techniques exist to assess model fit in that case. We present the modified Pearson chi-square and deviance tests that are appropriate for assessing goodness-of-fit in ordinal response models when both categorical and continuous covariates are present. The methods have good power to detect omitted interaction terms and reasonable power to detect failure of the proportional odds assumption or modelling the wrong functional form of a continuous covariate. These tests also provide immediate information as to where a model may not fit well. In addition, the methods are simple to understand and implement, and are non-specific. That is, they do not require prespecification of a type of lack-of-fit to detect.

Tumor growth curves provide a simple way to understand how tumors change over time. The traditional approach to fitting such curves to empirical data has been to estimate conditional mean regression functions, which describe the average effect of covariates on growth. However, this method ignores the possibility that tumor growth dynamics differ across quantiles of the distribution of growth patterns. Furthermore, typical individual preclinical cancer drug studies have very small sample sizes and can have low power to detect a statistically significant difference in tumor volume between treatment groups. In our work, we begin to address these issues by combining several independent small-sample studies of an experimental cancer treatment with differing study designs to construct quantile tumor growth curves. For modeling, we use penalized fixed-effects quantile regression with added study effects to control for differences between studies. We demonstrate this approach using data from a series of small-sample studies that investigated the effect of a naturally derived biological peptide, P28, on tumor volumes in mice grafted with human melanoma cells. We find a statistically significant quantile treatment effect on tumor volume trajectories and baseline values. In particular, the experimental treatment and a corresponding conventional chemotherapy had different effects on tumor growth by quantile. The conventional treatment, Dacarbazine (DTIC), tended to inhibit growth at lower quantiles, while the experimental treatment P28 produced slower rates of growth at the upper quantiles, especially at the 0.95 quantile. Copyright © 2014 John Wiley & Sons, Ltd.

Population-based case-control studies measuring associations between haplotypes of single nucleotide polymorphisms (SNPs) and disease are increasingly popular, in part because haplotypes of a few "tagging" SNPs may serve as surrogates for variation in relatively large sections of the genome. Due to current technological limitations, haplotypes in cases and controls must be inferred from unphased genotypic data. Using individual-specific inferred haplotypes as covariates in standard epidemiologic analyses (e.g., conditional logistic regression) is an attractive analysis strategy, as it allows adjustment for nongenetic covariates, provides omnibus and haplotype-specific tests of association, and can estimate haplotype and haplotype × environment interaction effects. In principle, some adjustment for the uncertainty in inferred haplotypes should be made. Via simulation, we compare the performance (bias and mean squared error of haplotype and haplotype × environment interaction effect estimates) of several analytic strategies using inferred haplotypes in the context of matched case-control data. These strategies include using only the most likely haplotype assignment, the expectation substitution approach described by Stram et al. ([2003b] Hum. Hered. 55:179-190) and others, and an improper version of multiple imputation. For relatively uncomplicated haplotype structures and moderate haplotype relative risks (/=5). An application to progesterone-receptor haplotypes and endometrial cancer further illustrates that the performance of all these methods depends on how well the observed haplotypes "tag" the unobserved causal variant.

Yee TW. Statistics in Medicine. 2004;23(14):2295-2315.
One of the most popular methods for quantile regression is the LMS method of Cole and Green. The method naturally falls within a penalized likelihood framework and consequently allows considerable flexibility, because all three parameters may be modelled by cubic smoothing splines. The model is also easy to understand: for a given value of the covariate, the LMS method applies a Box-Cox transformation to the response to transform it to standard normality; to obtain the quantiles, an inverse Box-Cox transformation is applied to the quantiles of the standard normal distribution. The purposes of this article are threefold. First, LMS quantile regression is presented within the framework of the class of vector generalized additive models. This confers a number of advantages, such as a unifying theory and estimation process. Second, a new LMS method based on the Yeo-Johnson transformation is proposed, which has the advantage that the response is not restricted to be positive. Lastly, this paper describes a software implementation of three LMS quantile regression methods in the S language. This includes the LMS-Yeo-Johnson method, which is estimated efficiently by a new numerical integration scheme. The LMS-Yeo-Johnson method is illustrated using a large cross-sectional data set from a New Zealand working population.
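The transformations described above can be sketched at a single covariate value with fixed L (Box‐Cox power), M (median), and S (coefficient of variation). In the actual method these are smooth functions of the covariate, and the numeric values used below are hypothetical. The formulas are the standard LMS z‐score and centile expressions and the Yeo‐Johnson transformation. `statistics.NormalDist` supplies the standard normal quantiles.

```python
import math
from statistics import NormalDist

def lms_zscore(y, L, M, S):
    """Box-Cox based z-score: ((y/M)**L - 1)/(L*S), or log(y/M)/S when L == 0."""
    if L == 0:
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

def lms_centile(alpha, L, M, S):
    """Inverse transform of the standard normal quantile z_alpha.

    Only defined when 1 + L*S*z_alpha > 0 (always true for L == 0).
    """
    z = NormalDist().inv_cdf(alpha)
    if L == 0:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

def yeo_johnson(y, lam):
    """Yeo-Johnson transformation; unlike Box-Cox it accepts y <= 0."""
    if y >= 0:
        return math.log1p(y) if lam == 0 else ((y + 1.0) ** lam - 1.0) / lam
    if lam == 2:
        return -math.log1p(-y)
    return -((1.0 - y) ** (2.0 - lam) - 1.0) / (2.0 - lam)

# Hypothetical LMS parameters at one covariate value.
L, M, S = 0.5, 20.0, 0.1
c90 = lms_centile(0.90, L, M, S)
```

Applying `lms_zscore` to a centile recovers the corresponding normal quantile. With λ = 1, the Yeo‐Johnson transformation is the identity on both sides of zero.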

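This article briefly introduces the particular advantages of individual participant data (IPD) meta-analysis for examining effect modification, its overall analytic strategy, and the existing analytic methods. In addition to the common approaches of meta-regression and subgroup analysis, it also introduces methods that combine partial individual-level data with aggregate-level data, and it summarizes how these methods are currently reported. Using "the effect of sodium-glucose cotransporter 2 inhibitors on systolic blood pressure (SBP) in patients with type 2 diabetes" as a case study, it demonstrates the practical application of each method in IPD meta-analysis and the interpretation of its results, and it summarizes the strengths and limitations of each method.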

Bernoulli (or binomial) regression using a generalized linear model with a log link function, where the exponentiated regression parameters have interpretation as relative risks, is often more appropriate than logistic regression for prospective studies with common outcomes. In particular, many researchers regard relative risks to be more intuitively interpretable than odds ratios. However, for the log link, when the outcome is very prevalent, the likelihood may not have a unique maximum. To circumvent this problem, a ‘COPY method’ has been proposed, which is equivalent to creating for each subject an additional observation with the same covariates except the response variable has the outcome values interchanged (1's changed to 0's and 0's changed to 1's). The original response is given weight close to 1, while the new observation is given a positive weight close to 0; this approach always leads to convergence of the maximum likelihood algorithm, except for problems with convergence due to multicollinearity among covariates. Even though this method produces a unique maximum, when the outcome is very prevalent, and/or the sample size is relatively small, the COPY method can yield biased estimates. Here, we propose using the jackknife as a bias‐reduction approach for the COPY method. The proposed method is motivated by a study of patients undergoing colorectal cancer surgery. Copyright © 2014 John Wiley & Sons, Ltd.
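The two ingredients can be sketched separately: the COPY data augmentation and the generic leave‐one‐out jackknife bias correction, n·θ̂ − (n − 1)·mean(θ̂₍₋ᵢ₎). The weight `eps = 1e-3` for the flipped copies is an assumed value. The weighted log‐binomial GLM fit that `estimator` would wrap in practice is not shown. The jackknife itself is checked on the plug‐in variance, whose jackknife correction is known to equal the usual n − 1 divisor variance.

```python
import numpy as np

def copy_augment(X, y, eps=1e-3):
    """COPY-style augmentation: each row plus a response-flipped copy.

    Original rows get weight 1 - eps and copies get weight eps
    (eps is an assumed illustrative value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_aug = np.vstack([X, X])
    y_aug = np.concatenate([y, 1.0 - y])
    w_aug = np.concatenate([np.full(len(y), 1.0 - eps), np.full(len(y), eps)])
    return X_aug, y_aug, w_aug

def jackknife_bias_corrected(estimator, n):
    """Leave-one-out jackknife: n * theta_full - (n - 1) * mean(theta_loo).

    estimator(idx) must return the estimate computed from the subjects in idx;
    for the COPY method it would augment and fit only those subjects.
    """
    all_idx = np.arange(n)
    full = estimator(all_idx)
    loo = np.array([estimator(np.delete(all_idx, i)) for i in range(n)])
    return n * full - (n - 1) * loo.mean(axis=0)

x = np.array([1.0, 2.0, 4.0, 7.0])
corrected = jackknife_bias_corrected(lambda idx: np.var(x[idx]), len(x))
```

In the COPY setting, leaving out a subject means leaving out both its original row and its flipped copy. The closure over `idx` handles this if augmentation is done inside `estimator`.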

Association analysis of genetic polymorphisms has been mostly performed in a case-control setting with unrelated affected subjects compared with unrelated unaffected subjects. In this paper, we present a Bayesian method for analyzing such case-control data when the population is in Hardy-Weinberg equilibrium. Our Bayesian method depends on the informative prior which is the retrospective likelihood based on historical data, raised to a power a. By modeling the retrospective likelihood properly, different prior information about the studied population can be incorporated into the specification of the prior. The scalar a is a precision parameter quantifying the heterogeneity between current and historical data. A guide value for a is discussed in this paper. The informative prior and posterior distributions are proper under very general conditions. Therefore, our method can be applied in most case-control studies. Further, for assessing gene-environment interactions, our approach will naturally lead to a Bayesian model depending only on the case data, when genotype and environmental factors are independent in the population. Thus our approach can be applied to case-only studies. A real example is used to show the applications of our method.
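To make "likelihood raised to a power a" concrete, consider a deliberately simplified sketch. It covers only one allele frequency p in a single group under Hardy‐Weinberg equilibrium, not the paper's full retrospective case‐control likelihood. Under HWE, genotype counts (n_AA, n_Aa, n_aa) contribute p^(2n_AA + n_Aa)(1 − p)^(n_Aa + 2n_aa). Raising the historical likelihood to a power a and combining it with an assumed Beta(α₀, β₀) initial prior keeps the posterior in the Beta family. All counts below are hypothetical.

```python
def allele_counts(n_AA, n_Aa, n_aa):
    """Allele counts implied by genotype counts under HWE: (A alleles, a alleles)."""
    return 2 * n_AA + n_Aa, n_Aa + 2 * n_aa

def power_prior_posterior(hist, current, a, alpha0=1.0, beta0=1.0):
    """Beta posterior parameters for the A-allele frequency with a power prior.

    hist, current: genotype counts (n_AA, n_Aa, n_aa).
    a in [0, 1] discounts the historical likelihood: 0 ignores it, 1 pools it.
    alpha0, beta0: assumed Beta initial prior (not taken from the paper).
    """
    hist_A, hist_other = allele_counts(*hist)
    cur_A, cur_other = allele_counts(*current)
    return alpha0 + a * hist_A + cur_A, beta0 + a * hist_other + cur_other

alpha, beta = power_prior_posterior(hist=(30, 50, 20), current=(10, 20, 10), a=0.5)
posterior_mean = alpha / (alpha + beta)
```

The power a acts like a fractional count of the historical individuals. This is the sense in which it quantifies how much the historical and current populations are assumed to agree.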

The current study examined the impact of a censored independent variable, after adjusting for a second independent variable, when estimating regression coefficients using ‘naïve’ ordinary least squares (OLS), ‘partial’ OLS, and full‐likelihood models. We used Monte Carlo simulations to determine the bias associated with all three regression methods. We demonstrated that substantial bias was introduced in the estimation of the regression coefficient associated with the variable subject to a ceiling effect when naïve OLS regression was used. Furthermore, minor bias was transmitted to the estimation of the regression coefficient associated with the second independent variable. High correlation between the two independent variables improved estimation of the censored variable's coefficient at the expense of estimation of the other coefficient. The use of ‘partial’ OLS and maximum‐likelihood estimation was shown to result in, at most, negligible bias in estimation. Furthermore, we demonstrated that the full‐likelihood method was robust under misspecification of the joint distribution of the independent random variables. Lastly, we provided an empirical example using National Population Health Survey (NPHS) data to demonstrate the practical implications of our main findings and the simple methods available to circumvent the bias identified in the Monte Carlo simulations. Our results suggest that researchers need to be aware of the bias associated with naïve ordinary least‐squares estimation when estimating regression models in which at least one independent variable is subject to a ceiling effect. Copyright © 2004 John Wiley & Sons, Ltd.
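A small Monte Carlo sketch reproduces the qualitative finding for naïve OLS. The setup is an assumption for illustration only: two correlated standard normal covariates (ρ = 0.5), true coefficients of 1, and a ceiling at 0 on the first covariate, which censors about half the sample. It compares OLS on the uncensored covariate with naïve OLS on the censored one. The partial‐OLS and full‐likelihood estimators are not reproduced.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients [intercept, slopes...] via least squares."""
    Xd = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return coef

rng = np.random.default_rng(2004)
n, b1, b2, rho, ceiling = 100_000, 1.0, 1.0, 0.5, 0.0   # hypothetical settings
cov = [[1.0, rho], [rho, 1.0]]
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
y = b1 * x1 + b2 * x2 + rng.normal(size=n)

# Ceiling effect: values of x1 above the ceiling are recorded at the ceiling.
x1_obs = np.minimum(x1, ceiling)

full = ols(np.column_stack([x1, x2]), y)        # uses the true x1
naive = ols(np.column_stack([x1_obs, x2]), y)   # naive OLS on the censored x1
```

Under these settings, the naïve slope for the censored covariate is inflated well above 1. The slope for the second covariate also shifts, though by less, which matches the direction of the bias the abstract describes.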

