Similar Articles
 20 similar articles found (search time: 31 ms)
1.
This paper concludes a study of "performance variability" when four methods of multivariable analysis--multiple linear regression, discriminant function analysis, multiple logistic regression, and two arrangements of Cox's proportional hazards regression--were applied to the same stratified random samples of "generating sets" containing seven different statistical distributions of cogent biologic attributes in a composite staging system for a large cohort of patients with lung cancer. Each model developed from the generating sets was also applied for predictions in a previously sequestered "challenge set". Across the different generating sets, the multivariable methods showed good agreement with one another in the stepwise choice of the first two powerful predictor variables, but not in the sequence of subsequent choices or in the standardized coefficients assigned to the same collection of "forced" variables. In concordance of predictions for individual patients in the generating sets, the overall proportions of disagreement for pairs of methods ranged from 0 to 28%, and kappa values ranged from 0.49 to 1.00. The accuracy of individual predictions showed relatively similar results when the different methods were applied to the same generating set. Across the generating sets, the different methods showed similar total results but substantial variations in predictions for alive and dead patients. When the models from the generating sets were applied for predictions in the challenge set, the results showed an analogous pattern: similar accuracy within models for overall and live/dead predictions, but substantial variations in live/dead predictions across models derived from different generating sources. The results showed that the multivariable methods often had good agreement with one another in predictions for groups but not for individual persons; and that no single method was superior to the others or to the composite staging system.
We conclude that multivariable analytic methods may be most effective and consistent if used to find the few most powerful predictor variables, omitting the many other variables that may be "statistically significant" but less cogent. The powerful predictors may sometimes be best constructed, before the analysis begins, as composite variables containing appropriate unions or ordinal arrangements of elemental candidate variables.
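Entry 1 summarizes between-method concordance with proportions of disagreement and kappa values. As a minimal sketch, Cohen's kappa for two methods' alive/dead predictions can be computed as below (the labels are hypothetical, not data from the study):

```python
def cohens_kappa(a, b):
    """Chance-corrected agreement between two sets of categorical
    predictions of equal length."""
    n = len(a)
    cats = set(a) | set(b)
    p_o = sum(u == v for u, v in zip(a, b)) / n                   # observed agreement
    p_e = sum((a.count(c) / n) * (b.count(c) / n) for c in cats)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# hypothetical alive/dead calls from two multivariable models
m1 = ["alive", "alive", "alive", "alive", "dead", "dead", "dead", "dead"]
m2 = ["alive", "alive", "alive", "dead", "dead", "dead", "dead", "alive"]
print(cohens_kappa(m1, m2))  # → 0.5
```

Here the two models agree on 6 of 8 patients (75%), while chance agreement is 50%, giving kappa = 0.5; identical predictions would give kappa = 1.0, matching the upper end of the range reported in the abstract.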

2.
OBJECTIVE: To illustrate scaled rectangle diagrams as a method for displaying clinical and epidemiological attributes (such as symptoms, signs, results of marker tests, disease, or risk factors). These are quantitative Venn diagrams, but using rectangles instead of circles. STUDY DESIGN AND SETTING: The method is illustrated through examples from various data sets with different types of clinical information. RESULTS: Examples drawing on studies of lung disease, rheumatic fever, blood pressure, lipid levels, sudden infant death syndrome, and low birth weight illustrate the different types of relationships between variables that the scaled rectangle approach can reveal (e.g., high- and low-risk groups; dependent, independent, or co-occurring attributes; effects from choice of cutoff; cumulative distributions; and case-control attributes). CONCLUSION: Scaled rectangle diagrams are a novel way to display clinical data. They show clearly the relative frequency of clinical attributes and the extent to which they are shared characteristics. Features are revealed that might otherwise not have been appreciated.

3.
When risk factors for an infectious disease are unknown, a method commonly employed is to investigate parallels with known infections (covariate infections). Data sets of value here are those for specified populations in which the seroprevalence of antibodies for multiple infections has been ascertained. The use of markers of covariate infections in multivariable analyses is problematic when the covariate infection is not itself an independent risk factor for the outcome of interest. In the performance of these analyses, the authors recommend the following strategy: 1) For estimates of the effects of measured risk factors on the outcome, adjustment for the covariate infection should not be done; this will avoid problems of overadjustment. 2) After control for the measured risk factors, an estimate of the "effect" of the covariate infection may be used as an indicator of the presence of unmeasured shared risk factors. 3) When shared, measured risk factors exist, the authors propose the use of methods developed for analysis of repeated measures of categorical variables to assist in inference about shared mechanisms of action of these risk factors. This analytic strategy takes advantage of the method of analogy for building understanding of transmission of new agents through their parallels with better known ones and is useful in the development of hypotheses.

4.
The terms multivariate and multivariable are often used interchangeably in the public health literature. However, these terms actually represent 2 very distinct types of analyses. We define the 2 types of analysis and assess the prevalence of use of the statistical term multivariate in a 1-year span of articles published in the American Journal of Public Health. Our goal is to make a clear distinction and to identify the nuances that make these types of analyses so distinct from one another.

Most regression models are described in terms of the way the outcome variable is modeled: in linear regression the outcome is continuous, logistic regression has a dichotomous outcome, and survival analysis involves a time-to-event outcome. Statistically speaking, multivariate analysis refers to statistical models that have 2 or more dependent or outcome variables,1 and multivariable analysis refers to statistical models in which there are multiple independent or predictor variables.2

A multivariable model can be thought of as a model in which multiple variables are found on the right side of the model equation. This type of statistical model can be used to attempt to assess the relationship between a number of variables; one can assess independent relationships while adjusting for potential confounders.

A simple linear regression model has a continuous outcome and one predictor, whereas a multiple or multivariable linear regression model has a continuous outcome and multiple predictors (continuous or categorical). A simple linear regression model would have the form

y = β0 + β1x + ε

By contrast, a multivariable or multiple linear regression model would take the form

y = β0 + β1x1 + β2x2 + … + βkxk + ε

where y is a continuous dependent variable, x is a single predictor in the simple regression model, and x1, x2, …, xk are the predictors in the multivariable model. As is the case with linear models, logistic and proportional hazards regression models can be simple or multivariable.
Each of these model structures has a single outcome variable and 1 or more independent or predictor variables.

Multivariate, by contrast, refers to the modeling of data that are often derived from longitudinal studies, wherein an outcome is measured for the same individual at multiple time points (repeated measures), or the modeling of nested/clustered data, wherein there are multiple individuals in each cluster. A multivariate linear regression model would have the form

yj = β0j + β1jx1 + β2jx2 + … + βkjxk + εj,   j = 1, …, m

where the relationships between multiple dependent variables (i.e., ys, measures of multiple outcomes) and a single set of predictor variables (i.e., xs) are assessed.

We took a systematic approach to assessing the prevalence of use of the statistical term multivariate. That is, we used PubMed and the keyword "multivariate" to review articles published in the American Journal of Public Health over a 1-year span (December 2010–November 2011). We identified 30 articles in which the authors indicated the use of a "multivariate" statistical method. Each of the articles was individually reviewed to assess the type of analysis defined as multivariate.

In 5 (17%) of the 30 articles, multivariate models (as we have defined them here) were used; 4 (13%) of these models were derived from longitudinal data and 1 from nested data. The remaining 25 (83%) articles involved multivariable analyses; logistic regression (21 of 30, or 70%) was the most prominent type of analysis used, followed by linear regression (3 of 30, or 10%). Interestingly, in 2 of the 30 articles (7%), the terms multivariate and multivariable were used interchangeably. This further elucidates the need to establish consistency in use of the 2 statistical terms.

Although some may argue that the interchangeable use of multivariate and multivariable is simply semantics, we believe that differentiating between the 2 terms is important for the field of public health.
In general, models used in public health research should be described as simple or multivariable, to indicate the number of predictors, and as linear, logistic, multivariate, or proportional hazards, to indicate the type of outcome (e.g., continuous, dichotomous, repeated measures, time to event).

Our review revealed that there is a need for more accurate application and reporting of multivariable methods. This issue is not unique to public health research and has been identified as affecting other areas of research as well (e.g., medicine, psychology, political science).3 However, we hope to see a clearer distinction in the use of the terms multivariate and multivariable to describe statistical analyses in future public health literature. This is an important distinction not only to avoid confusion among readers but to more accurately inform the next generation of public health researchers who are seeking to ground their work in the published literature.
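The distinction drawn in entry 4 between a simple and a multivariable linear model can be made concrete with a small least-squares fit. A minimal sketch in pure Python, using noise-free synthetic data (the solver and all variable names are illustrative):

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved by Gaussian elimination with partial pivoting. Each row of X
    must already carry a leading 1 for the intercept."""
    k = len(X[0])
    # augmented normal-equation system: A[i][j] = (X'X)ij, A[i][k] = (X'y)i
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    for col in range(k):                       # forward elimination
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * k                              # back substitution
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

# noise-free data generated from y = 1 + 2*x1 + 3*x2
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
simple = ols([[1, a] for a in x1], y)                  # simple: y ~ x1
multi = ols([[1, a, b] for a, b in zip(x1, x2)], y)    # multivariable: y ~ x1 + x2
# multi recovers the coefficients [1.0, 2.0, 3.0] up to rounding
```

The simple model has one predictor and two coefficients; the multivariable model adds x2 and, because the data are exact, recovers the generating coefficients.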

5.
We have evaluated the performance of four stepwise variable selection procedures commonly used in medical and epidemiologic research. The four procedures are discriminant and logistic regression and their rank transformed versions, where the independent variables are replaced by their ranks. We generated, by computer, data for two groups from several distributions with a variety of sample sizes and covariance matrices. The two ranking procedures each increased the chance of correctly selecting those variables related to group membership for data generated from log-normal or contaminated distributions. For normally distributed data the ranking procedure had little effect on variable selection. Rank transformed discriminant analysis and rank transformed logistic regression were equally effective in selecting variables when sample sizes exceeded 100. Rank transformed discriminant analysis was superior for smaller data sets. We discuss the implications of the results of this study for clinical and epidemiologic research.
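The rank transformation described in entry 5 replaces each independent variable by its ranks before the discriminant or logistic model is fitted. A minimal sketch of that ranking step, using midranks for ties (illustrative only):

```python
def ranks(values):
    """Midranks: tied values share the average of the ranks they span,
    as is usual before a rank-transformed analysis."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of ranks i+1 .. j+1
        for idx in order[i:j + 1]:
            r[idx] = avg
        i = j + 1
    return r

print(ranks([10, 40, 20, 20]))  # → [1.0, 4.0, 2.5, 2.5]
```

Each independent variable would be passed through this transform, and the discriminant or logistic procedure then run on the ranks instead of the raw observations.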

6.
Search partition analysis (SPAN) is a method to develop classification rules based on Boolean expressions. The performance of SPAN is compared against the trials reported by Lim et al. of 33 other methods of classification, including tree, neural network and regression methods on 16 data sets, most of which were health related. Each data set was augmented with noise variables in further trials. Lim et al. assessed the performance of the methods by estimates of misclassification rate, either cross-validated or test sample based. In this paper, the same data sets are analysed by SPAN and misclassification rates of the SPAN classifiers are estimated. Comparison is made of the performance of SPAN against the other methods that were considered by Lim et al. In terms of average misclassification error rate, taken over all data sets, SPAN was among the best five methods. In terms of average ranking of misclassification, that is, for each data set ranking the misclassification rates from lowest to highest, SPAN was second only to polyclass logistic regression.

7.
The Cox proportional hazards model with time-dependent covariates (TDC) is now a part of the standard statistical analysis toolbox in medical research. As new methods involving more complex modeling of time-dependent variables are developed, simulations could often be used to systematically assess the performance of these models. Yet, generating event times conditional on TDC requires well-designed and efficient algorithms. We compare two classes of such algorithms: permutational algorithms (PAs) and algorithms based on a binomial model. We also propose a modification of the PA to incorporate a rejection sampler. We performed a simulation study to assess the accuracy, stability, and speed of these algorithms in several scenarios. Both classes of algorithms generated data sets that, once analyzed, provided virtually unbiased estimates with comparable variances. In terms of computational efficiency, the PA with the rejection sampler reduced the time necessary to generate data by more than 50 per cent relative to alternative methods. The PAs also allowed more flexibility in the specification of the marginal distributions of event times and required less calibration.
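The binomial-model class of algorithms mentioned in entry 7 can be sketched by discretizing time and drawing a Bernoulli event in each short interval with probability hazard × dt. The hazard form, parameters, and names below are illustrative assumptions, not the authors' algorithm:

```python
import math
import random

def simulate_event_time(x_path, beta, base_hazard, dt=0.01, t_max=10.0,
                        rng=random):
    """One event time under hazard(t) = base_hazard * exp(beta * x_path(t)),
    via the binomial (discrete-time) approximation: in each interval of
    length dt the event fires with probability hazard(t) * dt."""
    t = 0.0
    while t < t_max:
        p = min(1.0, base_hazard * math.exp(beta * x_path(t)) * dt)
        if rng.random() < p:
            return t
        t += dt
    return t_max  # administratively censored at t_max

# sanity check: with beta = 0 the hazard is constant at 1, so the
# generated times should look roughly exponential with mean near 1
rng = random.Random(3)
times = [simulate_event_time(lambda t: 0.0, 0.0, 1.0, rng=rng)
         for _ in range(2000)]
```

Passing a genuinely time-varying x_path (e.g., a treatment switch at a given time) makes the hazard change mid-follow-up, which is the situation the paper's algorithms are designed for.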

8.

Objective

To summarize the experience, performance and scientific output of long-running telemedicine networks delivering humanitarian services.

Methods

Nine long-running networks – those operating for five years or more – were identified and seven provided detailed information about their activities, including performance and scientific output. Information was extracted from peer-reviewed papers describing the networks’ study design, effectiveness, quality, economics, provision of access to care and sustainability. The strength of the evidence was scored as none, poor, average or good.

Findings

The seven networks had been operating for a median of 11 years (range: 5–15). All networks provided clinical tele-consultations for humanitarian purposes using store-and-forward methods and five were also involved in some form of education. The smallest network had 15 experts and the largest had more than 500. The clinical caseload was 50 to 500 cases a year. A total of 59 papers had been published by the networks, and 44 were listed in Medline. Based on study design, the strength of the evidence was generally poor by conventional standards (e.g. 29 papers described non-controlled clinical series). Over half of the papers provided evidence of sustainability and improved access to care. Uncertain funding was a common risk factor.

Conclusion

Improved collaboration between networks could help attenuate the lack of resources reported by some networks and improve sustainability. Although the evidence base is weak, the networks appear to offer sustainable and clinically useful services. These findings may interest decision-makers in developing countries considering starting, supporting or joining similar telemedicine networks.

9.
10.
This paper evaluates the performance of four variable selection methods suitable for case-control studies. Two of the methods are logistic regression and the rank transformed version of it which uses the ranks of the explanatory variables in place of the original observations. The third method is based on Kendall's τb correlations. I propose a fourth method, a sign score regression model to select variables. To evaluate these four methods, I generate many data sets for a case group and a control group with the use of several different distributions and covariance matrices. I evaluate the methods on their ability to select correctly the variables related to case-control status while not selecting the unrelated variables. Using this criterion, the sign score regression method and the τb method are more effective than the other two methods with uncorrelated or weakly correlated variables. The sign score regression method is more effective than the τb method for all simulations that use normal variables and for some that use log-normal variables. Overall, the sign score regression method is the most effective variable selection method for data sets that have low or moderate correlations between variables.
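Kendall's τb, on which the third method in entry 10 is based, can be computed directly from concordant, discordant, and tied pairs. A minimal sketch (pure Python, O(n²), illustrative data):

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((n0 - n1)(n0 - n2)), where n1 and
    n2 count the pairs tied on x and on y respectively."""
    n = len(x)
    c = d = n1 = n2 = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0:
                n1 += 1
            if dy == 0:
                n2 += 1
            if dx * dy > 0:          # concordant pair
                c += 1
            elif dx * dy < 0:        # discordant pair
                d += 1
    n0 = n * (n - 1) / 2
    return (c - d) / math.sqrt((n0 - n1) * (n0 - n2))

print(kendall_tau_b([1, 1, 2, 3], [1, 2, 2, 3]))  # → 0.8
```

Without ties the denominator reduces to n0 and the statistic becomes the ordinary Kendall tau.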

11.
Inference based on large sample results can be highly inaccurate if applied to logistic regression with small data sets. Furthermore, maximum likelihood estimates for the regression parameters will on occasion not exist, and large sample results will be invalid. Exact conditional logistic regression is an alternative that can be used whether or not maximum likelihood estimates exist, but can be overly conservative. This approach also requires grouping the values of continuous variables corresponding to nuisance parameters, and inference can depend on how this is done. A simple permutation test of the hypothesis that a regression parameter is zero can overcome these limitations. The variable of interest is replaced by the residuals from a linear regression of it on all other independent variables. Logistic regressions are then done for permutations of these residuals, and a p-value is computed by comparing the resulting likelihood ratio statistics to the original observed value. Simulations of binary outcome data with two independent variables that have binary or lognormal distributions yield the following results: (a) in small data sets consisting of 20 observations, type I error is well-controlled by the permutation test, but poorly controlled by the asymptotic likelihood ratio test; (b) in large data sets consisting of 1000 observations, performance of the permutation test appears equivalent to that of the asymptotic test; and (c) in small data sets, the p-value for the permutation test is usually similar to the mid-p-value for exact conditional logistic regression.
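The permutation test in entry 11 residualizes the variable of interest on the other covariates, permutes the residuals, and recomputes a test statistic. A minimal sketch of those mechanics; note that a simple score-type statistic stands in for the paper's likelihood ratio statistic, and the single covariate and all data below are illustrative assumptions:

```python
import random

def residualize(x, z):
    """Residuals of x after simple linear regression on a single covariate
    z (the paper regresses on all other independent variables; one
    covariate keeps the sketch short)."""
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    beta = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
            / sum((b - mz) ** 2 for b in z))
    return [a - mx - beta * (b - mz) for a, b in zip(x, z)]

def perm_pvalue(y, x, z, n_perm=2000, seed=1):
    """Permutation p-value for 'x has no effect on the binary outcome y
    after adjusting for z'. The statistic |sum(y_i * r_i)| is a crude
    stand-in for the likelihood ratio statistic used in the paper."""
    rng = random.Random(seed)
    r = residualize(x, z)
    obs = abs(sum(a * b for a, b in zip(y, r)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(r)
        if abs(sum(a * b for a, b in zip(y, r))) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# illustrative data: x clearly separates the two outcome groups
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
x = [0.1, 0.4, 0.2, 0.3, 0.0, 0.9, 0.7, 0.8, 1.0, 0.6]
z = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
p = perm_pvalue(y, x, z)   # small p-value expected here
```

In the paper's version, each permuted residual vector is refitted with logistic regression and the likelihood ratio statistics are compared; the permutation bookkeeping is identical.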

12.
The statistical classification problem motivates the search for an analytical procedure capable of classifying observations accurately into one of two or more groups on the basis of information with respect to one or more attributes, and constitutes a fundamental challenge for all scientific disciplines. Although there are many classification methodologies, only optimal discriminant analysis (ODA) explicitly guarantees that the discriminant classifier will maximize classification accuracy in the training sample. This paper presents the first example of multivariable ODA (MultiODA) in medicine, for an application in which we employ three attributes (age and two measures of heart rate variability) to predict susceptibility to sudden cardiac death for a sample of 45 patients. MultiODA outperformed logistic regression analysis on every classification performance index (overall accuracy, sensitivity, specificity, and positive and negative predictive values). In fact, the worst performance result achieved by MultiODA (in total sample or leave-one-out validity analysis) exceeded the best performance achieved by logistic regression analysis. We conclude that ODA offers promise as a methodology capable of improving the classification performance achieved by medical researchers, and one that clearly merits investigation in future research.
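The five performance indices on which MultiODA and logistic regression are compared in entry 12 all derive from a 2×2 confusion matrix. A minimal sketch with hypothetical labels (not the study's data):

```python
def performance(actual, predicted, positive=1):
    """Overall accuracy, sensitivity, specificity, PPV and NPV from
    paired actual/predicted labels."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return {
        "accuracy":    (tp + tn) / len(actual),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv":         tp / (tp + fp),
        "npv":         tn / (tn + fn),
    }

# hypothetical outcomes (1 = susceptible) and one classifier's calls
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(performance(actual, predicted))
```

With these labels the classifier has 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, so accuracy is 0.7, sensitivity 0.75, specificity 2/3, PPV 0.6, and NPV 0.8.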

13.
Reproducible epidemiologic research
The replication of important findings by multiple independent investigators is fundamental to the accumulation of scientific evidence. Researchers in the biologic and physical sciences expect results to be replicated by independent data, analytical methods, laboratories, and instruments. Epidemiologic studies are commonly used to quantify small health effects of important, but subtle, risk factors, and replication is of critical importance where results can inform substantial policy decisions. However, because of the time, expense, and opportunism of many current epidemiologic studies, it is often impossible to fully replicate their findings. An attainable minimum standard is "reproducibility," which calls for data sets and software to be made available for verifying published findings and conducting alternative analyses. The authors outline a standard for reproducibility and evaluate the reproducibility of current epidemiologic research. They also propose methods for reproducible research and implement them by use of a case study in air pollution and health.

14.
Bayesian methods are proposed for analysing matched case-control studies in which a binary exposure variable is sometimes measured with error, but whose correct values have been validated for a random sample of the matched case-control sets. Three models are considered. Model 1 makes few assumptions other than randomness and independence between matched sets, while Models 2 and 3 are logistic models, with Model 3 making additional distributional assumptions about the variation between matched sets. With Models 1 and 2 the data are examined in two stages. The first stage analyses data from the validation sample and is easy to perform; the second stage analyses the main body of data and requires MCMC methods. All relevant information is transferred between the stages by using the posterior distributions from the first stage as the prior distributions for the second stage. With Model 3, a hierarchical structure is used to model the relationship between the exposure probabilities of the matched sets, which gives the potential to extract more information from the data. All the methods that are proposed are generalized to studies in which there is more than one control for each case. The Bayesian methods and a maximum likelihood method are applied to a data set for which the exposure of every patient was measured using both an imperfect measure that is subject to misclassification, and a much better measure whose classifications may be treated as correct. To test methods, the latter information was suppressed for all but a random sample of matched sets.

15.
The quality of the analyzed data has a major impact on the reliability of the results. Statistical methods can shorten several stages of the chemist's work, for example the classification of large data sets, and they are applied in the preliminary evaluation of data quality. Here it is necessary to verify that the raw database does not contain gross errors or outliers that could influence the result of the experiment. Data analysis performed with chemometric techniques relies on finding the most correlated attributes. Chemometrics is used to build a mathematical model of the relation between the analyzed property and numerous sets of descriptive variables (parameters that affect the measurement). Modeling requires calculations for model identification, checking its relevance, evaluating its adequacy, and determining its prognostic ability. The resulting model of the relation can be used to optimize the system in a technological process, to forecast values conditional on known descriptive values, and to control the analytical system. Statistical methods are applied in chemical studies to collect and analyze data on chemical compounds and to manage the flow of information more efficiently. They make it possible to predict the physical and biological properties of chemical compounds. Statistical methods are also applied to quality management in the chemical analysis of contaminants, including pesticide residues in foodstuffs.

16.
Turnover among hospital nurses has traditionally been explained in terms of personal attributes of the nurse and extrinsic rewards such as pay and fringe benefits. However, turnover and its determinants may be viewed in the context of a structural model, operating primarily at the level of the hospital patient care unit. Four sets of organizational variables were analyzed to assess their independent and combined effects on nursing turnover rates in hospitals.

17.
Examining the influence of environmental exposures on various health indices is a critical component of the planned National Children's Study (NCS). An ideal strategy for the exposure monitoring component of the NCS is to measure indoor and outdoor concentrations and personal exposures of children to a variety of pollutants, including ambient particulate and gaseous pollutants, biologic agents, persistent organics, nonpersistent organics (e.g., pesticides), inorganic chemicals (e.g., metals), and others. However, because of the large sample size of the study (approximately 100,000 children), it is not feasible to assess every possible exposure of each child. We envision that cost-effective strategies for gathering the necessary exposure-related information with minimum burden to participants, such as broad administration of product-use questionnaires and diaries, would likely be considered in designing the exposure component of the NCS. In general a biologic (e.g., blood, urine, hair, saliva) measure could be the dosimeter of choice for many of the persistent and for some of the nonpersistent organic pollutants. Biologic specimens, such as blood, can also indicate long-term internal dose to various metals, including lead and mercury. Environmental measures, on the other hand, provide pathway/source-specific exposure estimates to many of the environmental agents, including those where biologic measurements are not currently feasible (e.g., for particulate matter and for some gaseous criteria pollutants). However, these may be burdensome and costly to either collect or analyze and may not actually indicate the absorbed dose. Thus, an important technical and logistical challenge for the NCS is to develop an appropriate study design with adequate statistical power that will permit detection of exposure-related health effects, based on an optimum set of exposure measurement methods. 
We anticipate that low-cost, low-burden methods such as questionnaires and screening type assessments of environmental and biologic samples could be employed, when exposures at different critical life stages of vulnerability can be reliably estimated by these simpler methods. However, when reliability and statistical power considerations dictate the need for collecting more specific exposure information, more extensive environmental, biologic, and personal exposure measurements should be obtained from various "validation" subsets of the NCS population that include children who are in different life stages. This strategy of differential exposure measurement design may allow the exposure-response relationships to be tested on the whole cohort by incorporating the information on the relationship between different types of exposure measures (i.e., ranging from simple to more complex) derived from the detailed validation subsamples.

18.
On the distribution of rank sums, interval estimation and hypothesis testing
Objective: To explore the distribution theory of the rank-sum ratio and to establish interval-estimation and hypothesis-testing methods. Methods: Assuming the variables are mutually independent and the value taken by a variable for an observation unit is random, the distribution of its rank is uniform, and the rank sum is distributed as the sum of the ranks of m independent, identically distributed variables. Results: When m and n are small, e.g. m < 3, the distribution of the rank sum is unimodal and symmetric; when m and n are not too small, the distribution of the rank sum rapidly approaches a normal distribution. Conclusion: Normal-distribution theory can be applied to the rank sum and the rank-sum ratio for interval estimation and hypothesis testing.
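The convergence of the rank-sum distribution to normality discussed in entry 18 can be examined by exact enumeration: convolve m copies of a uniform distribution on the ranks 1..n. A minimal sketch (the values of m and n are illustrative):

```python
def rank_sum_pmf(m, n):
    """Exact pmf of the sum of m independent ranks, each uniform on
    1..n, computed by repeated convolution. Returns {sum: probability}."""
    pmf = {0: 1.0}
    for _ in range(m):
        nxt = {}
        for s, prob in pmf.items():
            for r in range(1, n + 1):
                nxt[s + r] = nxt.get(s + r, 0.0) + prob / n
        pmf = nxt
    return pmf

pmf = rank_sum_pmf(5, 10)
mean = sum(s * prob for s, prob in pmf.items())
print(round(mean, 6))  # → 27.5  (theory: m * (n + 1) / 2)
```

Plotting the pmf for increasing m shows the unimodal, symmetric shape flattening into the familiar bell curve, which is what justifies the normal-theory intervals and tests proposed in the paper.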

19.
A comparison of several tests for censored paired data.
This paper reviews several tests for censored paired data and uses Monte Carlo methods to evaluate their sizes and powers. Each of the tests maintains its size. The Akritas test and the paired Prentice-Wilcoxon test are somewhat more powerful than the Gehan statistic or the generalized signed rank test for most of the distributions studied. If the survival times follow an exponential distribution, then the Gehan and the generalized signed rank tests are more powerful against a location shift alternative.

20.
This article presents methods to analyze global spatial relationships between two variables in two different sets of fixed points. Analysis of spatial relationships between two phenomena is of great interest in health geography and epidemiology, especially to highlight competing interest between phenomena or evidence of a common environmental factor. Our general approach extends the Moran and Pearson indices to the bivariate case in two different sets of points. The case where the variables are Boolean is treated separately through methods using nearest neighbors distances. All tests use Monte-Carlo simulations to estimate their probability distributions, with options to distinguish spatial and no spatial correlation in the special case of identical sets analysis. Implementation in a Geographic Information System (SavGIS) and real examples are used to illustrate these spatial indices and methods in epidemiology.
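The Monte-Carlo approach in entry 20 can be sketched for a Moran-style bivariate index: compute the index, then permute one variable over the locations to build its null distribution. The index normalization and all data below are illustrative assumptions, not SavGIS's implementation:

```python
import random

def bivariate_moran(x, y, w):
    """Moran-style cross index between variable x at site i and variable y
    at neighbouring sites j; w is an n x n spatial-weight matrix. The
    normalization by s0 and the two standard deviations is one common
    choice."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5
    sy = (sum((b - my) ** 2 for b in y) / n) ** 0.5
    s0 = sum(map(sum, w))
    num = sum(w[i][j] * (x[i] - mx) * (y[j] - my)
              for i in range(n) for j in range(n))
    return num / (s0 * sx * sy)

def mc_pvalue(x, y, w, n_sim=999, seed=7):
    """Two-sided Monte-Carlo p-value: permute y over the sites and compare
    the permuted indices with the observed one."""
    rng = random.Random(seed)
    obs = abs(bivariate_moran(x, y, w))
    y = list(y)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(y)
        if abs(bivariate_moran(x, y, w)) >= obs:
            hits += 1
    return (hits + 1) / (n_sim + 1)

# four sites on a line; neighbours share an edge
w = [[1 if abs(i - j) == 1 else 0 for j in range(4)] for i in range(4)]
x, y = [1, 2, 3, 4], [4, 3, 2, 1]
p = mc_pvalue(x, y, w)
```

With only four sites the permutation space is tiny, so this example only illustrates the mechanics; real analyses permute over hundreds of locations and, as the abstract notes, may need to separate spatial from non-spatial correlation when the two point sets coincide.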


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)