Similar Articles (20 results)
1.
Integration of data of disparate types has become increasingly important to enhancing the power for new discoveries by combining complementary strengths of multiple types of data. One application is to uncover tumor subtypes in human cancer research in which multiple types of genomic data are integrated, including gene expression, DNA copy number, and DNA methylation data. In spite of their successes, existing approaches based on joint latent variable models require stringent distributional assumptions and may suffer from unbalanced scales (or units) of different types of data and non‐scalability of the corresponding algorithms. In this paper, we propose an alternative based on integrative and regularized principal component analysis, which is distribution‐free, computationally efficient, and robust against unbalanced scales. The new method performs dimension reduction simultaneously on multiple types of data, seeking data‐adaptive sparsity and scaling. As a result, in addition to feature selection for each type of data, integrative clustering is achieved. Numerically, the proposed method compares favorably against its competitors in terms of accuracy (in identifying hidden clusters), computational efficiency, and robustness against unbalanced scales. In particular, compared with a popular method, the new method was competitive in identifying tumor subtypes associated with distinct patient survival patterns when applied to a combined analysis of DNA copy number, mRNA expression, and DNA methylation data in a glioblastoma multiforme study. Copyright © 2016 John Wiley & Sons, Ltd.
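The paper's integrative, regularized PCA is not reproduced here, but the general workflow it refines — put each data type on a comparable scale, reduce dimension with a sparsity-inducing PCA, and cluster samples in the combined low-dimensional space — can be sketched as follows. This is a minimal illustration using scikit-learn; the matrices, component numbers, and penalty values are hypothetical placeholders, not the authors' algorithm or settings.

```python
# Minimal sketch of integrative dimension reduction + clustering on multi-omics data.
# Not the authors' method; per-type scaling, sparse PCA, and k-means stand in for the
# data-adaptive sparsity/scaling and integrative clustering described in the abstract.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import SparsePCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n = 100                                    # samples (e.g., tumors)
expr = rng.normal(size=(n, 500))           # hypothetical mRNA expression matrix
cnv  = rng.normal(size=(n, 300))           # hypothetical DNA copy number matrix
meth = rng.normal(size=(n, 400))           # hypothetical DNA methylation matrix

blocks = []
for X in (expr, cnv, meth):
    Xs = StandardScaler().fit_transform(X)     # put each data type on a comparable scale
    Z = SparsePCA(n_components=3, alpha=1.0,   # sparse loadings -> implicit feature selection
                  random_state=0).fit_transform(Xs)
    blocks.append(Z)

joint = np.hstack(blocks)                      # shared low-dimensional representation
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(joint)
print(np.bincount(labels))                     # cluster sizes = candidate tumor subtypes
```

In the published method, by contrast, the sparsity and scaling are chosen data-adaptively and the dimension reduction is carried out simultaneously across the data types rather than block by block.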

2.
Methods for sample size calculations in ROC studies often assume independent normal distributions for test scores among the diseased and nondiseased populations. We consider sample size requirements under the default two-group normal model when the data distribution for the diseased population is either skewed or multimodal. For these two common scenarios we investigate the potential for robustness of calculated sample sizes under the mis-specified normal model and we compare to sample sizes calculated under a more flexible nonparametric Dirichlet process mixture model. We also highlight the utility of flexible models for ROC data analysis and their importance to study design. When nonstandard distributional shapes are anticipated, our Bayesian nonparametric approach allows investigators to determine a sample size based on the use of more appropriate distributional assumptions than are generally applied. The method also provides researchers a tool to conduct a sensitivity analysis to sample size calculations that are based on a two-group normal model. We extend the proposed approach to comparative studies involving two continuous tests. Our simulation-based procedure is implemented using the WinBUGS and R software packages and example code is made available.
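The Bayesian nonparametric machinery is beyond a short example, but the underlying concern — that sample sizes derived under a two-group normal model may be misleading when the diseased scores are, say, bimodal — can be probed with a small simulation: draw repeated studies of a candidate size from a non-normal diseased distribution, estimate the AUC by the Mann–Whitney statistic, and inspect the spread of the estimates. Everything below (distributions, mixture weights, candidate n) is a hypothetical illustration, not the paper's WinBUGS/R procedure.

```python
# Simulation sketch: precision of the empirical AUC when the diseased test scores
# are bimodal rather than normal. Not the paper's Bayesian nonparametric procedure.
import numpy as np

rng = np.random.default_rng(1)

def empirical_auc(x_nondis, x_dis):
    """Mann-Whitney estimate of AUC: P(diseased score > nondiseased score)."""
    diff = x_dis[:, None] - x_nondis[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def draw_diseased(n):
    """Hypothetical bimodal diseased population: mixture of two normals."""
    comp = rng.random(n) < 0.6
    return np.where(comp, rng.normal(1.0, 0.5, n), rng.normal(3.0, 0.5, n))

n_per_group = 80                                 # candidate sample size per group
aucs = []
for _ in range(2000):
    x0 = rng.normal(0.0, 1.0, n_per_group)       # nondiseased scores
    x1 = draw_diseased(n_per_group)              # diseased scores
    aucs.append(empirical_auc(x0, x1))

aucs = np.array(aucs)
lo, hi = np.percentile(aucs, [2.5, 97.5])
print(f"mean AUC {aucs.mean():.3f}, 95% range of estimates ({lo:.3f}, {hi:.3f})")
# If the spread is wider than the precision the study needs, increase n and rerun.
```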

3.
The ‘gold standard’ design for three‐arm trials refers to trials with an active control and a placebo control in addition to the experimental treatment group. This trial design is recommended when ethically justifiable, as it allows the simultaneous comparison of experimental treatment, active control, and placebo. Parametric testing methods have been studied extensively in recent years. However, these methods often tend to be liberal or conservative when distributional assumptions are not met, particularly with small sample sizes. In this article, we introduce a studentized permutation test for testing non‐inferiority and superiority of the experimental treatment compared with the active control in three‐arm trials in the ‘gold standard’ design. The performance of the studentized permutation test for finite sample sizes is assessed in a Monte Carlo simulation study under various parameter constellations. Emphasis is put on whether the studentized permutation test meets the target significance level. For comparison purposes, commonly used Wald‐type tests, which do not make any distributional assumptions, are included in the simulation study. The simulation study shows that, for count data, the presented studentized permutation test for assessing non‐inferiority in three‐arm trials in the ‘gold standard’ design outperforms its competitors, for instance the test based on a quasi‐Poisson model. The methods discussed in this paper are implemented in the R package ThreeArmedTrials, which is available on the Comprehensive R Archive Network (CRAN). Copyright © 2016 John Wiley & Sons, Ltd.
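The R package ThreeArmedTrials implements the full three-arm procedure; the core mechanics of a studentized permutation test are easier to see in the simpler two-sample comparison of experimental treatment versus active control: studentize the observed difference in means by its estimated standard error, then recompute the same studentized statistic over random relabelings of the pooled data. The data, group sizes, and two-sided hypothesis below are hypothetical simplifications — the actual method also handles the non-inferiority margin and the placebo arm.

```python
# Sketch of a studentized permutation test for the difference in means between an
# experimental arm and an active control, on hypothetical two-arm data.
import numpy as np

rng = np.random.default_rng(2)

def t_stat(x, y):
    """Welch-type studentized difference in means."""
    return (x.mean() - y.mean()) / np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

x_exp = rng.exponential(scale=1.2, size=30)   # hypothetical experimental-arm outcomes
x_ctr = rng.exponential(scale=1.0, size=30)   # hypothetical active-control outcomes

t_obs = t_stat(x_exp, x_ctr)
pooled = np.concatenate([x_exp, x_ctr])
n_exp = len(x_exp)

perm_ts = np.empty(10_000)
for b in range(perm_ts.size):
    perm = rng.permutation(pooled)            # relabel arms at random
    perm_ts[b] = t_stat(perm[:n_exp], perm[n_exp:])

p_value = np.mean(np.abs(perm_ts) >= abs(t_obs))   # two-sided permutation p-value
print(f"studentized statistic {t_obs:.3f}, permutation p-value {p_value:.4f}")
```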

4.
Zero‐inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero‐inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non‐risk group in the population, the ZIP (ZINB) models a two‐component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at‐risk group. Overdispersion arising in practice all too often does not follow the NB, and applications of ZINB to such data yield invalid inference. If sources of overdispersion are known, other parametric models may be used to directly model the overdispersion. Such models, too, are subject to assumed distributions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution‐free alternative and compare its performance with these popular parametric models as well as a moment‐based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390–2405]. Like generalized estimating equations, the proposed approach requires no elaborate distribution assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero‐inflated responses. We illustrate our approach with both simulated and real study data. Copyright © 2015 John Wiley & Sons, Ltd.
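For readers who want to see the two parametric models side by side, here is a minimal statsmodels sketch that fits ZIP and ZINB regressions to simulated overdispersed, zero-inflated counts and compares them by AIC. The simulated data, covariate, and inflation structure are hypothetical; the distribution-free method proposed in the paper is not shown.

```python
# Sketch: fitting ZIP and ZINB regressions to simulated zero-inflated, overdispersed
# counts with statsmodels, then comparing them by AIC. Hypothetical data only.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import (ZeroInflatedPoisson,
                                              ZeroInflatedNegativeBinomialP)

rng = np.random.default_rng(3)
n = 2000
x = rng.normal(size=n)
X = sm.add_constant(x)                          # count-part design matrix (intercept + x)

mu = np.exp(0.5 + 0.6 * x)                      # mean of the at-risk (count) component
lam = rng.gamma(shape=1.0, scale=mu)            # gamma mixing -> NB-type overdispersion
at_risk = rng.random(n) > 0.3                   # ~30% structural (non-risk) zeros
y = np.where(at_risk, rng.poisson(lam), 0)

infl = np.ones((n, 1))                          # intercept-only zero-inflation part
zip_res = ZeroInflatedPoisson(y, X, exog_infl=infl).fit(method="bfgs", maxiter=500, disp=False)
zinb_res = ZeroInflatedNegativeBinomialP(y, X, exog_infl=infl, p=2).fit(
    method="bfgs", maxiter=500, disp=False)

print("ZIP  AIC:", round(zip_res.aic, 1))
print("ZINB AIC:", round(zinb_res.aic, 1))      # expected to be lower: counts are overdispersed
```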

5.
A limiting feature of previous work on growth mixture modeling is the assumption of normally distributed variables within each latent class. With strongly non‐normal outcomes, this means that several latent classes are required to capture the observed variable distributions. Being able to relax the assumption of within‐class normality has the advantage that a non‐normal observed distribution does not necessitate using more than one class to fit the distribution. It is valuable to add parameters representing the skewness and the thickness of the tails. A new growth mixture model of this kind is proposed drawing on recent work in a series of papers using the skew‐t distribution. The new method is illustrated using the longitudinal development of body mass index in two data sets. The first data set is from the National Longitudinal Survey of Youth covering ages 12–23 years. Here, the development is related to an antecedent measuring socioeconomic background. The second data set is from the Framingham Heart Study covering ages 25–65 years. Here, the development is related to the concurrent event of treatment for hypertension using a joint growth mixture‐survival model. Copyright © 2014 John Wiley & Sons, Ltd.

6.
In many practical applications, count data exhibit greater or less variability than allowed by the equality of mean and variance, referred to as overdispersion or underdispersion; several mechanisms, such as zero inflation and mixture, can produce it. Dispersion also arises when the counts follow a generalized Poisson or negative binomial distribution that accommodates extra variation not explained by a simple Poisson or binomial model. In this paper, we deal with a class of two‐component zero‐inflated generalized Poisson mixture regression models to fit such data and propose a local influence procedure for model comparison and statistical diagnostics. We first formally develop a general model framework that unifies zero inflation, mixture, and overdispersion/underdispersion simultaneously, and then investigate two types of perturbation schemes, global and individual, for perturbing various model assumptions and detecting influential observations, obtaining the corresponding local influence measures. The method is novel for count data analysis and can be used to explore the essential issues of zero inflation, mixture, and dispersion in zero‐inflated generalized Poisson mixture models. On the basis of the model comparison results, sensitivity analyses of the perturbations and hypothesis tests can then be conducted with greater accuracy. Finally, a simulation study and a real example illustrate the proposed local influence measures. Copyright © 2012 John Wiley & Sons, Ltd.

7.
A number of mixture modeling approaches assume both normality and independent observations. However, these two assumptions are at odds with the reality of many data sets, which are often characterized by an abundance of zero‐valued or highly skewed observations as well as observations from biologically related (i.e., non‐independent) subjects. We present here a finite mixture model with a zero‐inflated Poisson regression component that may be applied to both types of data. This flexible approach allows the use of covariates to model both the Poisson mean and rate of zero inflation and can incorporate random effects to accommodate non‐independent observations. We demonstrate the utility of this approach by applying these models to a candidate endophenotype for schizophrenia, but the same methods are applicable to other types of data characterized by zero inflation and non‐independence. Copyright © 2014 John Wiley & Sons, Ltd.

8.
The objective of this research was to determine minimal inhibitory concentration (MIC) population distributions for colistin for Salmonella at the subtype level. Furthermore, we wanted to determine if differences in MIC for colistin could be explained by mutations in pmrA or pmrB encoding proteins involved in processes that influence the binding of colistin to the cell membrane. During 2008–2011, 6,583 Salmonella enterica subsp. enterica isolates of human origin and 1,931 isolates of animal/meat origin were collected. The isolates were serotyped, and susceptibility to colistin was tested (range 1–16 mg/L). Moreover, 37 isolates were tested for mutations in pmrA and pmrB by polymerase chain reaction (PCR) and DNA sequencing. The MIC distribution for colistin at the serotype level showed that Salmonella Dublin (n=198), followed by Salmonella Enteritidis (n=1,247), were less susceptible than "other" Salmonella serotypes originating from humans (n=5,274) and Salmonella Typhimurium of animal/meat origin (n=1,794). MIC was ≤1 mg/L for 98.9% of "other" Salmonella serotypes originating from humans, 99.4% of Salmonella Typhimurium, 61.3% of Salmonella Enteritidis, and 12.1% of Salmonella Dublin isolates. Interestingly, Salmonella Dublin and Salmonella Enteritidis belong to the same O-group (O:1,9,12), suggesting that surface lipopolysaccharides (LPS) of the cell (O-antigen) play a role in colistin susceptibility. The epidemiological cut-off value of >2 mg/L for colistin suggested by the European Committee on Antimicrobial Susceptibility Testing (EUCAST) falls inside the distribution for both Salmonella Dublin and Salmonella Enteritidis. All tested Salmonella Dublin isolates, regardless of colistin MIC value, had identical pmrA and pmrB sequences. Missense mutations were found only in pmrA in one Salmonella Reading isolate and in pmrB in one Salmonella Concord isolate, both with MIC of ≤1 mg/L for colistin. In conclusion, our study indicates that missense mutations are not necessarily involved in increased MICs for colistin. Increased MICs for colistin seemed to be linked to specific serotypes (Salmonella Dublin and Salmonella Enteritidis). We recommend that Salmonella with MIC of >2 mg/L for colistin be evaluated at the serovar level.

9.
Cure rate estimation is an important issue in clinical trials for diseases such as lymphoma and breast cancer, and mixture models are the main statistical methods. In the last decade, mixture models under different distributions, such as the exponential, Weibull, log-normal, and Gompertz, have been discussed and used. However, these models involve stronger distributional assumptions than is desirable, and inferences may not be robust to departures from these assumptions. In this paper, a mixture model is proposed using the generalized F distribution family. Although this family is seldom used because of computational difficulties, it has the advantage of being very flexible and of including many commonly used distributions as special cases. The generalized F mixture model relaxes the usual stronger distributional assumptions and allows the analyst to uncover structure in the data that might otherwise have been missed. This is illustrated by fitting the model to data from large-scale clinical trials with long follow-up of lymphoma patients. Computational problems with the model and model selection methods are discussed. Comparisons of maximum likelihood estimates with those obtained from mixture models under other distributions are included. © 1998 John Wiley & Sons, Ltd.
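For concreteness, the mixture cure structure underlying this kind of model can be written down. With $\pi$ the cured fraction and $S_u(t \mid \boldsymbol\theta)$ the survival function of the uncured component (here drawn from the generalized F family), the population survival function is

$$
S(t) \;=\; \pi \;+\; (1 - \pi)\, S_u(t \mid \boldsymbol\theta), \qquad 0 \le \pi \le 1,
$$

so that $S(t) \to \pi$ as $t \to \infty$; the flexibility of the generalized F enters entirely through $S_u$. This is the standard two-component mixture cure formulation, stated here only as background; the paper's specific parameterization of the generalized F is not reproduced.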

10.
Modern medical treatments have substantially improved survival rates for many chronic diseases and have generated considerable interest in developing cure fraction models for survival data with a non‐ignorable cured proportion. Statistical analysis of such data may be further complicated by competing risks that involve multiple types of endpoints. Regression analysis of competing risks is typically undertaken via a proportional hazards model adapted on cause‐specific hazard or subdistribution hazard. In this article, we propose an alternative approach that treats competing events as distinct outcomes in a mixture. We consider semiparametric accelerated failure time models for the cause‐conditional survival function that are combined through a multinomial logistic model within the cure‐mixture modeling framework. The cure‐mixture approach to competing risks provides a means to determine the overall effect of a treatment and insights into how this treatment modifies the components of the mixture in the presence of a cure fraction. The regression and nonparametric parameters are estimated by a nonparametric kernel‐based maximum likelihood estimation method. Variance estimation is achieved through resampling methods for the kernel‐smoothed likelihood function. Simulation studies show that the procedures work well in practical settings. Application to a sarcoma study demonstrates the use of the proposed method for competing risk data with a cure fraction.

11.
Regression models with mixture (random) components are proposed for the statistical analysis of recurrent events when waiting times between successive events are unknown. These models allow adjustment of parameter estimates for unobserved heterogeneity in the population (due for example to missing covariates) or overdispersion resulting from inexact distributional assumptions. The models are illustrated by a study of recurrence rates of superficial bladder cancer in men.

12.
Objective: To establish a clinical resistance breakpoint for moxifloxacin by comparing the moxifloxacin minimal inhibitory concentrations (MICs) of Mycobacterium tuberculosis isolates from new patients and from retreated patients. Methods: Based on treatment history, 140 isolates were selected: 109 from new pulmonary tuberculosis patients and 31 from retreated patients who had previously received moxifloxacin. MICs were determined on drug-containing Löwenstein–Jensen medium, the MIC distributions were analyzed, and the cumulative MIC percentages of the two groups were compared; the MIC at which the difference between the cumulative percentages of new-patient and retreated-patient isolates was greatest was taken as the resistance breakpoint. Results: Among clinical isolates from new patients never treated with moxifloxacin, 94.50% had MIC < 0.4 μg/ml; among clinical isolates from patients previously treated with moxifloxacin, 93.55% had MIC > 1.0 μg/ml. Conclusion: Given the MIC distributions and the difference in moxifloxacin MICs between isolates from the two groups of patients, 1 μg/ml is an appropriate resistance breakpoint for moxifloxacin susceptibility testing on Löwenstein–Jensen medium; 6.45% of isolates from retreated patients remained susceptible to moxifloxacin.
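The breakpoint rule described in the Methods — take the MIC at which the cumulative percentages of new-patient and retreated-patient isolates differ the most — is simple to compute; the sketch below uses hypothetical MIC data (roughly shaped like the reported distributions), not the study's isolates.

```python
# Sketch of the breakpoint rule described above: for each MIC level, compute the
# cumulative percentage of isolates at or below that level in each patient group,
# and take the MIC with the largest between-group difference as the breakpoint.
import numpy as np

mic_levels = np.array([0.1, 0.2, 0.4, 1.0, 2.0, 4.0])   # tested concentrations (ug/ml)

# hypothetical MICs for isolates from new and previously treated (retreated) patients
mic_new = np.random.default_rng(4).choice(
    mic_levels, size=109, p=[0.35, 0.35, 0.25, 0.03, 0.01, 0.01])
mic_retreated = np.random.default_rng(5).choice(
    mic_levels, size=31, p=[0.00, 0.03, 0.03, 0.10, 0.44, 0.40])

def cumulative_percent(mics, levels):
    return np.array([np.mean(mics <= level) * 100 for level in levels])

cum_new = cumulative_percent(mic_new, mic_levels)
cum_ret = cumulative_percent(mic_retreated, mic_levels)
gap = cum_new - cum_ret

for level, a, b in zip(mic_levels, cum_new, cum_ret):
    print(f"MIC <= {level:>4} ug/ml: new {a:5.1f}%  retreated {b:5.1f}%")
print("suggested breakpoint:", mic_levels[np.argmax(gap)], "ug/ml")
```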

13.
In health services research, it is common to encounter semicontinuous data characterized by a point mass at zero followed by a right‐skewed continuous distribution with positive support. Examples include health expenditures, in which the zeros represent a subpopulation of patients who do not use health services, while the continuous distribution describes the level of expenditures among health services users. Semicontinuous data are typically analyzed using two‐part mixture models that separately model the probability of health services use and the distribution of positive expenditures among users. However, because the second part conditions on a non‐zero response, conventional two‐part models do not provide a marginal interpretation of covariate effects on the overall population of health service users and non‐users, even though this is often of greatest interest to investigators. Here, we propose a marginalized two‐part model that yields more interpretable effect estimates in two‐part models by parameterizing the model in terms of the marginal mean. This model maintains many of the important features of conventional two‐part models, such as capturing zero‐inflation and skewness, but allows investigators to examine covariate effects on the overall marginal mean, a target of primary interest in many applications. Using a simulation study, we examine properties of the maximum likelihood estimates from this model. We illustrate the approach by evaluating the effect of a behavioral weight loss intervention on health‐care expenditures in the Veterans Affairs health‐care system. Copyright © 2014 John Wiley & Sons, Ltd.
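To make the contrast with conventional two-part models concrete, write $\pi_i = \Pr(Y_i > 0)$ and $\mu_i = E(Y_i \mid Y_i > 0)$. A conventional two-part model typically specifies

$$
\operatorname{logit}(\pi_i) = \mathbf{z}_i^{\top}\boldsymbol\alpha, \qquad \log(\mu_i) = \mathbf{x}_i^{\top}\boldsymbol\gamma,
$$

so $\boldsymbol\gamma$ describes effects only among users, while the overall mean is the product $E(Y_i) = \pi_i\,\mu_i$. The marginalized two-part model instead places the regression directly on that overall mean,

$$
\log E(Y_i) = \log(\pi_i\,\mu_i) = \mathbf{x}_i^{\top}\boldsymbol\beta,
$$

so that $\exp(\beta_k)$ has a multiplicative interpretation on mean expenditures across the whole population of users and non-users. The logit and log links shown here are the common choices, used only for illustration; the paper's exact specification of the positive-part distribution is not reproduced.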

14.
In tailored drug development, the patient population is thought of as a mixture of two or more subgroups that may derive differential treatment efficacy. To find the right patient population for the drug to target, it is necessary to infer treatment efficacy in subgroups and combinations of subgroups. A fundamental consideration in this inference process is that the logical relationships between treatment efficacy in subgroups and their combinations should be respected (otherwise the assessment of treatment efficacy may become paradoxical). We show that some commonly used efficacy measures are not suitable for a mixture population. We also show that the current practice of naively extending the least squares means concept to estimate efficacy in a mixture population is inappropriate. Proposing a new principle called subgroup mixable estimation, we establish the logical relationships among parameters that represent efficacy and develop a simultaneous inference procedure to confidently infer efficacy in subgroups and their combinations. Using oncology studies with time‐to‐event outcomes as an example, we show that the hazard ratio is not suitable for measuring treatment efficacy in a mixture population and provide appropriate efficacy measures with a rigorous inference procedure. Copyright © 2015 John Wiley & Sons, Ltd.

15.
Mohnarin 2008: composition and antimicrobial resistance of pathogens causing bloodstream infections
Objective: To investigate the distribution and antimicrobial resistance of bacteria isolated from patients with bloodstream infections in China. Methods: Antimicrobial susceptibility was determined by disk diffusion, MIC, or E-test methods and analyzed with WHONET 5.4 software; isolates from blood and bone marrow cultures collected from January 1 to December 31, 2008 at the 89 tertiary-care (grade 3A) hospitals of the Ministry of Health national antimicrobial resistance surveillance network (Mohnarin) were analyzed. Results: A total of 10,519 pathogens were isolated, comprising 5,554 (52.8%) Gram-positive bacteria, 4,929 (46.9%) Gram-negative bacteria, and 36 (0.3%) other organisms. Coagulase-negative staphylococci were the most frequent isolates (3,215 strains, 30.6%), followed by Escherichia coli (1,849, 17.6%), Staphylococcus aureus (958, 9.0%), Klebsiella spp. (931, 8.9%), and Enterococcus spp. (729, 6.9%). Methicillin resistance was detected in 66.2% of S. aureus and 83.5% of coagulase-negative staphylococci; no vancomycin-resistant S. aureus or coagulase-negative staphylococci were found. Among Enterococcus faecalis and Enterococcus faecium, 3.5% and 5.0% were resistant to vancomycin and 2.6% and 5.8% to teicoplanin, respectively. ESBL-positive rates in E. coli and Klebsiella pneumoniae were 38.6% and 27.5%, respectively, and resistance to ceftazidime rose markedly. Resistance of staphylococci to aminoglycosides and quinolones was markedly higher in adults than in children, whereas resistance of E. coli to aminoglycosides and quinolones in children showed an upward trend. Conclusion: Staphylococci, E. coli, and Klebsiella spp. are the most common causes of bloodstream and bone marrow infections in China; Gram-positive bacteria account for a larger proportion of such infections in children than in adults; the rates of MRSA and MRCNS in blood and bone marrow cultures and the resistance of E. coli to cephalosporins are considerable; and quinolone resistance among isolates from children is increasing year by year.

16.
A number of new study designs have appeared in which the exposure distribution of a case series is compared to an exposure distribution representing a complete theoretical population or distribution. These designs include the case‐genotype study, the case‐cross‐over study, and the case‐specular study. This paper describes a unified likelihood‐based approach to the analysis of such studies, and discusses extensions of these methods when a control group is available. The approach clarifies certain assumptions implicit in the methods, and helps contrast these assumptions to those underlying ordinary case‐control studies. There are several reasons to expect discrepancies between ordinary case‐control estimates and case‐distribution estimates; for example, case‐distribution estimates can be more sensitive to exposure misclassification. Some discrepancies are illustrated in an application to case‐specular data on wire codes and childhood cancer. Copyright © 1999 John Wiley & Sons, Ltd.

17.
Powerful array‐based single‐nucleotide polymorphism‐typing platforms have recently heralded a new era in which genome‐wide studies are conducted with increasing frequency. A genetic polymorphism associated with population pharmacokinetics (PK) is typically analyzed using nonlinear mixed‐effect models (NLMM). Applying NLMM to large‐scale data, such as those generated by genome‐wide studies, raises several issues related to the assumption of random effects, as follows: (i) computation time: it takes a long time to compute the marginal likelihood; (ii) convergence of iterative calculation: an adaptive Gauss–Hermite quadrature is generally used to estimate NLMM; however, iterative calculations may not converge in complex models; and (iii) random‐effects misspecification leads to slightly inflated type‐I error rates. As an effective alternative for resolving these issues, in this article we propose a generalized estimating equation (GEE) approach for analyzing population PK data. In general, GEE analysis does not account for interindividual variability in PK parameters; therefore, the usual GEE estimators cannot be interpreted straightforwardly, and their validity has not been justified. Here, we propose valid inference methods for using GEE even under conditions of interindividual variability and provide theoretical justifications of the proposed GEE estimators for population PK data. In numerical evaluations by simulations, the proposed GEE approach exhibited high computational speed and stability relative to the NLMM approach. Furthermore, the NLMM analysis was sensitive to the misspecification of the random‐effects distribution, and the proposed GEE inference is valid for any distributional form. We provided an illustration by using data from a genome‐wide pharmacogenomic study of an anticancer drug. Copyright © 2013 John Wiley & Sons, Ltd.
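For intuition about what a GEE analysis of clustered PK data looks like in code, here is a minimal statsmodels sketch: repeated log-concentration measurements per subject, a working exchangeable correlation to absorb within-subject dependence, and robust (sandwich) standard errors. The linear mean model, the simulated genotype effect, and all variable names are hypothetical simplifications — real population PK models have nonlinear mean structures, and the paper's specific estimating equations are not reproduced.

```python
# Minimal GEE sketch for clustered (repeated-measures) data with statsmodels.
# A real population PK model has a nonlinear mean (e.g., one-compartment kinetics);
# a linear model for log-concentration stands in here, purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n_subj, n_obs = 200, 4
subj = np.repeat(np.arange(n_subj), n_obs)
genotype = np.repeat(rng.integers(0, 3, n_subj), n_obs)    # 0/1/2 variant alleles (hypothetical)
time = np.tile(np.array([0.5, 1.0, 2.0, 4.0]), n_subj)     # sampling times (hours)

subj_eff = np.repeat(rng.normal(0, 0.3, n_subj), n_obs)    # interindividual variability
logc = 2.0 - 0.25 * time - 0.2 * genotype + subj_eff + rng.normal(0, 0.2, subj.size)

df = pd.DataFrame({"logc": logc, "time": time, "genotype": genotype, "subj": subj})
model = smf.gee("logc ~ time + genotype", groups="subj", data=df,
                cov_struct=sm.cov_struct.Exchangeable(),
                family=sm.families.Gaussian())
result = model.fit()
print(result.summary())   # robust SEs remain valid even if the working correlation is wrong
```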

18.
Mohnarin 2008: surveillance of antimicrobial resistance among bacteria isolated in intensive care units
Objective: To investigate the distribution of pathogens isolated from intensive care unit (ICU) patients in China and their resistance to antimicrobial agents. Methods: Antimicrobial susceptibility was determined by disk diffusion, MIC, or E-test methods and analyzed with WHONET 5.4 software; isolates explicitly reported as being of ICU origin in data submitted from January 1 to December 31, 2008 by the 89 tertiary-care (grade 3A) hospitals of the Ministry of Health national antimicrobial resistance surveillance network (Mohnarin) were analyzed. Results: A total of 6,115 pathogens were isolated, comprising 4,378 (71.6%) Gram-negative and 1,737 (28.4%) Gram-positive bacteria. The five most frequent species were Acinetobacter baumannii (933 strains, 15.3%), Pseudomonas aeruginosa (850, 13.9%), Staphylococcus aureus (765, 12.5%), Klebsiella pneumoniae (669, 10.9%), and Escherichia coli (559, 9.1%). Cefoxitin resistance was detected in 84.8% of S. aureus, 93.0% of Staphylococcus epidermidis, and 95.0% of Staphylococcus haemolyticus; no staphylococci resistant to vancomycin or linezolid were found. Among Enterococcus faecalis and Enterococcus faecium, 1.5% and 6.7% were resistant to vancomycin and 3.6% and 1.2% to teicoplanin, respectively. Resistance of E. coli to third-generation cephalosporins (including ceftazidime), cefepime, and aztreonam was 70.0%, as was its resistance to quinolones. ESBL-production rates in E. coli and K. pneumoniae were 74.4% and 73.1%, respectively, higher than the overall surveillance levels for the same period. Carbapenem resistance in P. aeruginosa and A. baumannii rose further, reaching 60.0% in A. baumannii and exceeding that of P. aeruginosa. Conclusion: Bacteria isolated from ICUs in China remain dominated by non-fermenters, staphylococci, K. pneumoniae, and E. coli; antimicrobial resistance has deteriorated further and warrants close attention.

19.
Many published scale validation studies determine inter‐rater reliability using the intra‐class correlation coefficient (ICC). However, the use of this statistic must consider its advantages, limitations, and applicability. This paper evaluates how the interaction of subject distribution, sample size, and level of rater disagreement affects the ICC and provides an approach for obtaining relevant ICC estimates under suboptimal conditions. Simulation results suggest that for a fixed number of subjects, the ICC from a convex distribution is smaller than the ICC for a uniform distribution, which in turn is smaller than the ICC for a concave distribution. The variance component estimates also show that the dissimilarity of the ICC among distributions is attributed to the study design (i.e., distribution of subjects) component of subject variability and not the scale quality component of rater error variability. The dependency of the ICC on the distribution of subjects makes it difficult to compare results across reliability studies. Hence, it is proposed that reliability studies should be designed using a uniform distribution of subjects because of the standardization it provides for representing objective disagreement. In the absence of a uniform distribution, a sampling method is proposed to reduce the non‐uniformity. In addition, as expected, high levels of disagreement result in a low ICC, and when the type of distribution is fixed, any increase in the number of subjects beyond a moderately large specification such as n = 80 does not have a major impact on the ICC.
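For reference, the one-way random-effects ICC that such reliability studies report can be computed directly from ANOVA mean squares. The sketch below implements ICC(1,1) on a hypothetical ratings matrix; two-way ICC forms, which model rater effects explicitly, and the paper's subject-distribution simulations are not reproduced.

```python
# Minimal computation of the one-way random-effects ICC(1,1) from an
# (n_subjects x k_raters) matrix of ratings. Hypothetical data only.
import numpy as np

def icc1(ratings):
    n, k = ratings.shape
    grand = ratings.mean()
    subj_means = ratings.mean(axis=1)
    msb = k * np.sum((subj_means - grand) ** 2) / (n - 1)               # between-subject mean square
    msw = np.sum((ratings - subj_means[:, None]) ** 2) / (n * (k - 1))  # within-subject mean square
    return (msb - msw) / (msb + (k - 1) * msw)

rng = np.random.default_rng(7)
true_scores = rng.uniform(0, 10, size=80)                       # 80 subjects, uniformly spread
ratings = true_scores[:, None] + rng.normal(0, 1.0, (80, 3))    # 3 raters with independent error
print(f"ICC(1,1) = {icc1(ratings):.3f}")
```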

20.
Standard linear regression is commonly used for genetic association studies of quantitative traits. This approach may not be appropriate if the trait, on its original or transformed scales, does not follow a normal distribution. A rank‐based nonparametric approach that does not rely on any distributional assumptions can be an attractive alternative. Although several nonparametric tests exist in the literature, their performance in the genetic association setting is not well studied. We evaluate various nonparametric tests for the analysis of quantitative traits and propose a new class of nonparametric tests that have robust performance for traits with various distributions and under different genetic models. We demonstrate the advantage of our proposed methods through simulation study and real data applications.
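As a simple point of reference for the rank-based idea (not the new class of tests proposed here), one can test trait–genotype association with a Spearman rank correlation on allele dosage or compare genotype groups with a Kruskal–Wallis test; neither requires normality of the trait. The simulated genotypes and heavy-tailed trait below are hypothetical.

```python
# Two basic rank-based association tests for a quantitative trait and a SNP coded
# as allele dosage (0/1/2): Spearman rank correlation (trend-type test) and the
# Kruskal-Wallis test across genotype groups. Reference methods only, not the
# new tests proposed in the abstract.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n = 500
geno = rng.binomial(2, 0.3, size=n)                     # SNP dosage under HWE, MAF = 0.3
trait = 0.2 * geno + rng.standard_t(df=3, size=n)       # heavy-tailed (non-normal) trait

rho, p_spearman = stats.spearmanr(geno, trait)          # rank-based trend test
groups = [trait[geno == g] for g in (0, 1, 2)]
h, p_kw = stats.kruskal(*groups)                        # rank-based k-sample test

print(f"Spearman rho = {rho:.3f}, p = {p_spearman:.2e}")
print(f"Kruskal-Wallis H = {h:.2f}, p = {p_kw:.2e}")
```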
