Similar literature
1.
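Multiple imputation of missing data in resident health survey data   Total citations: 3 (self-citations: 1; by others: 2)
Objective: To address missing data in resident health survey data, make full use of the data collected, and obtain more valid statistical inferences. Methods: Multiple imputation based on Markov chain Monte Carlo (MCMC) methods was used to replace the missing data, producing several complete datasets that were then used for combined statistical inference. Results: The information loss caused by missing data was recovered and the quality of statistical inference improved. Conclusion: Multiple imputation is an effective tool for handling missing data in social survey data.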
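The abstract does not state the imputation model. As a minimal sketch, assuming a single incomplete variable y that is linearly related to a fully observed x (normal errors, missing at random, prior p(b, sigma^2) proportional to 1/sigma^2), MCMC data augmentation alternates a P-step (draw the parameters given the completed data) and an I-step (redraw the missing values given the parameters). Completed datasets saved at widely spaced iterations serve as the multiple imputations. The function name and its `burn_in`/`thin` defaults are illustrative choices, not taken from the study.

```python
import numpy as np


def data_augmentation_mi(x, y, m=5, burn_in=200, thin=100, rng=None):
    """Multiply impute y with an MCMC data-augmentation sampler.

    Assumed model: y | x ~ N(b0 + b1 * x, sigma^2), prior p(b, sigma^2) ∝ 1/sigma^2,
    x fully observed, y missing at random (np.nan marks a missing value).
    Returns m completed copies of y.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    miss = np.isnan(y)
    X = np.column_stack([np.ones_like(x), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    n = len(y)

    y[miss] = np.nanmean(y)  # crude starting values for the chain
    imputations = []
    for it in range(1, burn_in + m * thin + 1):
        # P-step: draw (sigma^2, b) from their posterior given the completed data.
        b_hat = XtX_inv @ X.T @ y
        resid = y - X @ b_hat
        sigma2 = resid @ resid / rng.chisquare(n - 2)
        b = rng.multivariate_normal(b_hat, sigma2 * XtX_inv)
        # I-step: redraw the missing y values given the current parameters.
        y[miss] = X[miss] @ b + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
        # After burn-in, keep every `thin`-th state as one completed dataset.
        if it > burn_in and (it - burn_in) % thin == 0:
            imputations.append(y.copy())
    return imputations
```

Each returned array is one completed dataset; the analysis of interest is run on each one and the m results are then combined.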

2.
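Multiple imputation and analysis of a 2×2 crossover design with missing data   Total citations: 1 (self-citations: 0; by others: 1)
Objective: To explore the use of multiple imputation (MI), as proposed by Rubin, for data from a 2×2 crossover design with missing values, so as to avoid the difficulties in statistical analysis caused by the missing observations that frequently occur in medical research. Methods: Missing data were filled in with MI, each completed dataset was analyzed with standard statistical procedures, and the results from the individual datasets were finally combined by multiple-imputation analysis. Results: Multiple imputation can be used to fill in missing data in crossover designs and yields valid statistical inference. Conclusion: Multiple imputation together with multiple-imputation analysis provides an effective strategy for handling data with missing values.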
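The abstract does not detail the "standard statistical procedure" applied to each completed dataset. For a 2×2 (AB/BA) crossover, one common choice is the analysis of within-subject period differences, sketched below; it returns an estimate and a variance per completed dataset, which would then be combined across the m datasets with Rubin's rules. The function name and the data values are hypothetical.

```python
import numpy as np


def crossover_effect(period1, period2, sequence):
    """A-minus-B treatment effect and its variance for one completed 2x2 (AB/BA) crossover dataset.

    Uses within-subject period differences d = period1 - period2. In sequence "AB",
    d estimates (A - B) plus the period effect; in "BA" it estimates (B - A) plus the
    period effect, so half the difference of the two sequence means cancels the period effect.
    """
    d = np.asarray(period1, dtype=float) - np.asarray(period2, dtype=float)
    seq = np.asarray(sequence)
    d_ab, d_ba = d[seq == "AB"], d[seq == "BA"]
    n1, n2 = len(d_ab), len(d_ba)
    estimate = (d_ab.mean() - d_ba.mean()) / 2
    # Pooled within-sequence variance of the differences (two-sample t-test form).
    s2 = ((n1 - 1) * d_ab.var(ddof=1) + (n2 - 1) * d_ba.var(ddof=1)) / (n1 + n2 - 2)
    variance = s2 / 4 * (1 / n1 + 1 / n2)
    return estimate, variance


# Usage on a tiny hypothetical completed dataset.
est, var = crossover_effect([10, 12, 11, 8, 9, 7], [8, 9, 9, 10, 11, 10],
                            ["AB", "AB", "AB", "BA", "BA", "BA"])
print(round(est, 3), round(var, 4))  # 2.333 0.0556
```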

3.
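Comparison of multiple imputation and ad hoc methods for handling missing values in simulated longitudinal datasets   Total citations: 3 (self-citations: 0; by others: 3)
Objective: To handle missing values in simulated longitudinal datasets with multiple imputation (MI) and with ad hoc methods, compare the strengths and weaknesses of the two approaches, and explore their applicability. Methods: Using SAS 9.0 and data simulation, a complete longitudinal dataset and missing-at-random datasets with various degrees of missingness were generated; each incomplete dataset was handled with MI and with ad hoc methods, and the results were compared and analyzed. Results: When the missing rate was ≤%, ad hoc methods had some advantage; when the missing rate was 20%–40%, the results after MI were closer to the "true" values; when the missing rate was ≥50%, neither method was effective. Conclusion: For datasets with different missing rates, MI and ad hoc methods each have strengths and weaknesses in handling missing values.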
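The abstract does not name the ad hoc methods compared. Last observation carried forward (LOCF) is a common ad hoc method for longitudinal data and is sketched here only as an example; like other single-value fill-ins, it treats the filled values as if they had been observed and therefore understates variability.

```python
import numpy as np


def locf(data):
    """Last observation carried forward for a subjects x visits array (np.nan = missing).

    Values missing at the first visit stay missing, because there is nothing to carry forward.
    """
    out = np.array(data, dtype=float)  # copy, so the input is not modified
    for j in range(1, out.shape[1]):
        gap = np.isnan(out[:, j])
        out[gap, j] = out[gap, j - 1]  # column j-1 has already been filled
    return out


nan = np.nan
print(locf([[1.0, nan, nan, 4.0], [nan, 2.0, nan, nan]]))
```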

4.
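Methods of multiple imputation and the principles of its statistical inference   Total citations: 6 (self-citations: 0; by others: 6)
Objective: To describe the characteristics and patterns of missing data; to discuss the basic concepts of multiple imputation (MI), first proposed by Rubin, the methods for imputing and analyzing missing data, and combined statistical inference; and to analyze the features and limitations of MI and the points that require attention when applying it to incomplete datasets. Methods: Through computer simulation, MI was used to replace each missing value with a set of plausible values; the resulting imputed datasets were then analyzed with conventional complete-data statistical methods and the results were combined. Results: The multiple imputations reflected the uncertainty due to the missing data and made full use of the available data, giving more accurate estimates of population parameters. Conclusion: MI provides a useful strategy for handling datasets with missing values and is applicable to many missing-data situations.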
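The combining step described here can be sketched with Rubin's (1987) rules: the pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). The function name and the input numbers are hypothetical.

```python
import math


def rubin_pool(estimates, variances):
    """Combine m complete-data results with Rubin's (1987) rules.

    estimates: point estimates Q_1..Q_m; variances: their squared standard errors U_1..U_m.
    Returns the pooled estimate, its total variance and the reference t degrees of freedom.
    """
    m = len(estimates)
    q_bar = sum(estimates) / m                                 # pooled point estimate
    u_bar = sum(variances) / m                                 # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)     # between-imputation variance
    total = u_bar + (1 + 1 / m) * b                            # total variance
    if b == 0:
        return q_bar, total, math.inf
    r = (1 + 1 / m) * b / u_bar                                # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, df


q, t, df = rubin_pool([1.0, 1.2, 0.8], [0.04, 0.05, 0.06])
print(round(q, 3), round(t, 4), round(df, 2))  # 1.0 0.1033 7.51
```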

5.
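Objective: To explore the estimation of gene–environment interaction in an incomplete case-control study in which genetic information is partly missing for controls. Methods: Data with different proportions of missing genetic information were simulated with the Monte Carlo method in Stata 9.0, and the results of analysis after hot-deck multiple imputation were compared with those of analysis after deletion of missing values. Results: When less than 50% of the data were missing, the coefficients for the environmental main effect, the genetic main effect, and the gene–environment interaction estimated after hot-deck multiple imputation and after deletion were close to those from the complete data. As the missing proportion increased, the variance of the estimates increased for both methods, but the variance after hot-deck multiple imputation was smaller than that after deletion. Conclusion: In incomplete case-control studies in which less than 50% of the controls' genetic information is missing, hot-deck imputation can make full use of the available information to estimate gene–environment interaction and improve estimation precision.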
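The abstract does not describe how the Stata hot-deck program generates its imputations. One standard way to make hot-deck imputation "proper" for multiple imputation is the approximate Bayesian bootstrap (ABB) of Rubin and Schenker, sketched below under the assumption that imputation classes (for example, strata defined by case status and exposure) are fixed beforehand. The function name, class labels, and values are hypothetical.

```python
import numpy as np


def abb_hotdeck(values, classes, m=5, rng=None):
    """Multiple hot-deck imputation via the approximate Bayesian bootstrap (ABB).

    Within each imputation class, the observed values (donors) are first resampled
    with replacement; each missing value is then drawn from that bootstrap sample.
    The extra bootstrap step propagates uncertainty about the donor distribution.
    """
    rng = np.random.default_rng(rng)
    values = np.asarray(values, dtype=float)
    classes = np.asarray(classes)
    missing = np.isnan(values)
    completed = []
    for _ in range(m):
        filled = values.copy()
        for c in np.unique(classes):
            in_c = classes == c
            donors = values[in_c & ~missing]
            recipients = np.flatnonzero(in_c & missing)
            if recipients.size == 0:
                continue
            if donors.size == 0:
                raise ValueError(f"class {c!r} has no observed donors")
            boot = rng.choice(donors, size=donors.size, replace=True)
            filled[recipients] = rng.choice(boot, size=recipients.size, replace=True)
        completed.append(filled)
    return completed


# Hypothetical genotype codes (0/1/2) with two missing values in two classes.
print(abb_hotdeck([0, 1, np.nan, 2, np.nan, 1], ["a", "a", "a", "b", "b", "b"], m=3, rng=0))
```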

6.
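Objective: To explore the application of bootstrap-based EM estimation in multiple imputation of missing data, and missing-data analysis in R. Methods: Missing data from a men's health survey were analyzed with the R packages epicalc and Amelia II. The bootstrap was used to resample with replacement, the EM algorithm was run iteratively on each of the m resulting bootstrap samples, and the R functions "plot" and "disperse" were then used to examine the distributions of observed and imputed values and the convergence from different starting values. Results: With m = 5 (the number of bootstrap samples, i.e., imputed datasets), the distributions of imputed and observed values in the men's health data agreed most closely, and the iterations converged from all starting values. Conclusion: Multiply imputed datasets obtained with the bootstrap-based EM algorithm are representative of the observed dataset and can be used to predict the missing data.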
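The EMB approach bootstraps the data, runs EM on each bootstrap sample, and imputes the original missing values from each set of estimates. The sketch below shows that idea only in the simplest case, assuming two jointly normal variables with x fully observed and y incomplete; Amelia II itself handles many variables and arbitrary missingness patterns, and the function names here are hypothetical.

```python
import numpy as np


def em_bivariate(x, y, max_iter=500, tol=1e-10):
    """EM estimates of the mean vector and covariance matrix of (x, y) ~ bivariate normal,
    with x fully observed and y partly missing (np.nan)."""
    miss = np.isnan(y)
    mx, sxx = x.mean(), x.var()
    my, syy, sxy = np.nanmean(y), np.nanvar(y), 0.0
    for _ in range(max_iter):
        beta = sxy / sxx
        # E-step: expected sufficient statistics for the missing y values.
        y_hat = np.where(miss, my + beta * (x - mx), y)  # E[y | x]
        v = np.where(miss, syy - beta * sxy, 0.0)         # Var[y | x]
        # M-step: update the parameters from the expected statistics.
        new_my = y_hat.mean()
        new_syy = np.mean(y_hat ** 2 + v) - new_my ** 2
        new_sxy = np.mean(x * y_hat) - mx * new_my
        change = max(abs(new_my - my), abs(new_syy - syy), abs(new_sxy - sxy))
        my, syy, sxy = new_my, new_syy, new_sxy
        if change < tol:
            break
    return np.array([mx, my]), np.array([[sxx, sxy], [sxy, syy]])


def emb_impute(x, y, m=5, rng=None):
    """Run EM on m bootstrap resamples, then impute the original missing y values
    from the conditional normal distribution implied by each bootstrap's estimates."""
    rng = np.random.default_rng(rng)
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    miss = np.isnan(y)
    n = len(y)
    completed = []
    for _ in range(m):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        mu, S = em_bivariate(x[idx], y[idx])
        beta = S[0, 1] / S[0, 0]
        cond_sd = np.sqrt(max(S[1, 1] - beta * S[0, 1], 0.0))
        filled = y.copy()
        filled[miss] = (mu[1] + beta * (x[miss] - mu[0])
                        + rng.normal(0.0, cond_sd, miss.sum()))
        completed.append(filled)
    return completed
```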

7.
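Objective: To briefly introduce the use of MICE (Multivariate Imputation by Chained Equations) in R and to evaluate its imputation performance. Methods: The imputation workflow was illustrated with real data, and MICE was compared with common missing-data methods (deletion, mean/mode substitution, and regression imputation). Results: At a missing rate of 10%, MICE and the common methods gave no obvious differences, and the relative errors of the regression coefficients for the 3 variables were about 10% for all methods. As the missing rate increased (20%, 40%), the relative errors of all methods increased, but those of MICE for the 3 variables remained stable at about 10%–20%; MICE performed better and more stably than the other methods, followed by regression imputation, with deletion and mean/mode substitution performing worst. At a missing rate of 50%, the errors for all 3 types of variables were already large and no method performed well. Conclusion: MICE is simpler to use than other multiple-imputation software and, compared with common missing-data methods, makes fuller use of the information in incomplete records and reflects the true situation of the survey more accurately; it merits wider use in practice.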
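MICE imputes each incomplete variable in turn from a regression on the others, cycles until the imputations stabilize, and repeats the process to produce m datasets. The sketch below is restricted to continuous variables and linear models (an assumption; the R `mice` package chooses a method per variable type) and draws the regression parameters from their posterior before each imputation, similar in spirit to the package's Bayesian linear-regression method. The function and parameter names are hypothetical.

```python
import numpy as np


def mice(data, m=5, n_iter=10, rng=None):
    """Chained-equations imputation for an n x p array of continuous variables (np.nan = missing).

    Each incomplete column is imputed from a linear regression on all other columns,
    with (beta, sigma^2) drawn from their posterior under a flat prior.
    """
    rng = np.random.default_rng(rng)
    data = np.asarray(data, dtype=float)
    miss = np.isnan(data)
    n, p = data.shape
    completed = []
    for _ in range(m):
        filled = np.where(miss, np.nanmean(data, axis=0), data)  # start from column means
        for _ in range(n_iter):
            for j in range(p):
                mj = miss[:, j]
                if not mj.any():
                    continue
                X = np.column_stack([np.ones(n), np.delete(filled, j, axis=1)])
                Xo, yo = X[~mj], filled[~mj, j]  # rows where column j was observed
                XtX_inv = np.linalg.inv(Xo.T @ Xo)
                beta_hat = XtX_inv @ Xo.T @ yo
                resid = yo - Xo @ beta_hat
                dof = len(yo) - X.shape[1]
                sigma2 = resid @ resid / rng.chisquare(dof)
                beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
                filled[mj, j] = X[mj] @ beta + rng.normal(0.0, np.sqrt(sigma2), mj.sum())
        completed.append(filled)
    return completed
```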

8.
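Health Statistics
An unconditional exact test for comparing two proportions in a completely randomized design; Comparison of SAM and RVM for identifying differentially expressed genes; Multiple imputation of missing data; Application of correspondence analysis to exploring the relationship between the row and column variables of cross-tabulations; Multiple imputation and analysis of a 2×2 crossover design with missing data; ROC analysis with the proportional odds model and its application prospects; A discussion of hierarchical clustering in mathematical statistics. [Editor's note]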

9.
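Objective: To study the performance of multiple imputation based on the bootstrap expectation-maximization algorithm (EMB) for missing quantitative variables in cross-sectional health-examination data, and to provide evidence for choosing an appropriate multiple-imputation method for such data. Methods: Based on real cross-sectional population health-examination data, EMB multiple imputation was performed with the Amelia II package in R 3.5.0 on the data of 1,634 employees who underwent routine health examinations at the Health Examination Center of Xijing Hospital, Xi'an, Shaanxi Province, from January to December 2013. Results: For cross-sectional quantitative health-examination data, under three missing-at-random scenarios with single-variable missing rates of 10%, 20%, and 70%, EMB multiple imputation reduced estimation error relative to listwise deletion. For the same data, different numbers of imputations gave different results, and a suitable number for these data was m = 10. Probability-density plots before and after imputation showed that with m = 10 the densities of the imputed and observed values agreed well. Overimputation diagnostic plots further showed that with m = 10 the 90% CI of most observations of each variable contained the best-fit line, and the intervals were narrow. The multivariable regression models built on the listwise-deletion dataset and on the EMB-imputed dataset included different variables. Conclusion: For quantitative variables missing at random at various rates, EMB multiple imputation outperformed listwise deletion; the optimal number of imputations differs between datasets.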
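The study chose m = 10 from density and overimputation plots. A complementary, standard yardstick for the number of imputations is Rubin's large-sample relative efficiency (1 + gamma/m)^-1, where gamma is the fraction of missing information. Gamma is not reported for these data, so the value 0.5 below is purely illustrative.

```python
def relative_efficiency(gamma, m):
    """Large-sample efficiency of an estimator based on m imputations relative to
    infinitely many imputations (Rubin, 1987); gamma = fraction of missing information."""
    return 1.0 / (1.0 + gamma / m)


for m in (5, 10, 20):
    print(m, round(relative_efficiency(0.5, m), 3))
# 5 0.909
# 10 0.952
# 20 0.976
```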

10.
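Objective: To systematically analyze the characteristics of current health-examination data and to implement data preprocessing with Excel and SAS macros. Methods: Using the health-examination data on the examination data platform of a prefecture-level Grade III Class A (tertiary) hospital from October 2017 to December 2020, the characteristics of current examination data were summarized through data review, corresponding preprocessing rules were formulated, and a concrete preprocessing scheme, workflow, and macro code were developed in Excel and SAS. Results: Column names were converted in batch with Excel and SAS so that they conform to SAS variable-naming rules; multiple datasets with different structures were merged without truncated values, ensuring the completeness of the database; and, by deleting missing variables and observations, merging duplicate variables, and identifying duplicate observations, followed by manual review, preprocessing of the examination data was completed, producing a health-examination database that researchers can use further. SAS macros written during processing made the preprocessing code modular. Conclusion: Excel and SAS enable efficient preprocessing of health-examination data, improve data quality, and increase data usability, laying a foundation for use and analysis of the database and making multicenter research with health-examination data feasible; the approach has some value for wider application.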
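The batch column-name conversion can be sketched as follows, assuming SAS's standard (VALIDVARNAME=V7) rules: at most 32 characters, starting with a letter or underscore, followed by letters, digits, or underscores, and case-insensitive uniqueness. The study's own Excel/SAS procedure is not reproduced; in practice a lookup table would map Chinese column names to meaningful abbreviations, and this hypothetical function is only a fallback.

```python
import re


def to_sas_names(columns, prefix="V"):
    """Convert raw column names into unique names that satisfy SAS V7 naming rules.

    Non-conforming characters (e.g. Chinese text, spaces, parentheses) become underscores;
    names that would start with a digit get a positional prefix; duplicates get a suffix.
    """
    used, result = set(), []
    for i, col in enumerate(columns, start=1):
        name = re.sub(r"[^A-Za-z0-9_]", "_", str(col)).strip("_")
        if not name or not name[0].isalpha():
            name = f"{prefix}{i}_{name}" if name else f"{prefix}{i}"
        name = name[:32]
        base, k = name, 1
        while name.upper() in used:  # SAS variable names are case-insensitive
            suffix = f"_{k}"
            name = base[: 32 - len(suffix)] + suffix
            k += 1
        used.add(name.upper())
        result.append(name)
    return result


print(to_sas_names(["身高(cm)", "身高(cm)", "2h glucose", "BMI", "bmi"]))
# ['cm', 'cm_1', 'V3_2h_glucose', 'BMI', 'bmi_1']
```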

11.
This study compared 3 d of carbohydrate loading (CHOL; 8.4 g·kg⁻¹·d⁻¹ carbohydrate) in female eumenorrheic athletes with 3 d of an isoenergetic normal diet (NORM; 5.2 g·kg⁻¹·d⁻¹ carbohydrate) and examined the effect of menstrual-cycle phase on performance, muscle-glycogen concentration [glyc], and substrate utilization. Nine moderately trained eumenorrheic women cycled in an intermittent protocol varying in intensity from 45% to 75% VO₂max for 75 min, followed by a 16-km time trial, at the midfollicular (MF) and midluteal (ML) phases of the menstrual cycle on NORM and CHOL. Time-trial performance was not affected by diet (CHOL 26.10 ± 1.04 min, NORM 26.16 ± 1.35 min; P = 0.494) or menstrual-cycle phase (MF 26.05 ± 1.10 min, ML 26.23 ± 1.33 min; P = 0.370). Resting [glyc] was lowest in the MF phase after NORM (575 ± 145 mmol·kg⁻¹ dw), compared with the MF phase after CHOL (728 mmol·kg⁻¹ dw) and the ML phase after CHOL and NORM (756 and 771 mmol·kg⁻¹ dw, respectively). No effect of phase on substrate utilization during exercise was observed. These data support previous observations of greater resting [glyc] in the ML than in the MF phase of the menstrual cycle and suggest that lower glycogen storage in the MF phase can be overcome by carbohydrate loading.

12.
Fisher RP, Easty DB. Health Physics, 2003, 84(4): 518-525
Field surveys were carried out at sixteen pulp and paper mills in the United States (seven kraft process, two sulfite process, and seven recycling process mills) for the presence of naturally occurring radioactive material (NORM) in precipitates and scales. NORM was detected at three of the kraft mills, one sulfite mill, and none of the recycling mills. At one of the kraft mills, the NORM was associated with a commercial aluminum sulfate ("alum") slurry used on the paper machines and in intake water treatment. The maximum activity level of this alum scale was 252,000 Bq kg⁻¹ (6,800 pCi g⁻¹) ²²⁸Ra. At two kraft mills, NORM was associated with precipitates in the bleach plant. The measured NORM activity in samples of these scales was approximately 44,400 Bq kg⁻¹ (1,200 pCi g⁻¹) ²²⁶Ra. Where NORM was detected at a sulfite mill, the deposits were found adhering tightly to the surfaces of brownstock washers. Although samples were not removed for radionuclide analysis, survey readings at the drum surface were 26 nC kg⁻¹ h⁻¹ (100 µR h⁻¹) with a scintillation counter and 2,200 cpm with a Geiger-Müller counter. At all mills, exposure-rate measurements and risk-assessment calculations indicated that it would be highly unlikely for any worker's annual exposure to exceed 1 mSv (100 mrem), the Nuclear Regulatory Commission limit for untrained workers, as a result of exposure to these materials.
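The abstract reports each quantity in SI units with conventional units in parentheses. The short sketch below, using the standard definitions 1 pCi = 0.037 Bq, 1 R = 2.58 × 10⁻⁴ C kg⁻¹ and 1 rem = 0.01 Sv, reproduces those pairs; for example, 6,800 pCi g⁻¹ is 251,600 Bq kg⁻¹, consistent with the rounded 252,000 Bq kg⁻¹. The helper names are hypothetical.

```python
PCI_TO_BQ = 0.037        # 1 pCi = 0.037 Bq (1 Ci = 3.7e10 Bq)
R_TO_C_PER_KG = 2.58e-4  # 1 R = 2.58e-4 C/kg
MREM_TO_MSV = 0.01       # 1 rem = 0.01 Sv, so 1 mrem = 0.01 mSv


def pci_per_g_to_bq_per_kg(activity):
    """Specific activity: pCi per gram -> Bq per kilogram."""
    return activity * PCI_TO_BQ * 1000


def microR_to_nC_per_kg(exposure):
    """Exposure: microroentgen -> nanocoulomb per kilogram (same time base, e.g. per hour)."""
    return exposure * 1e-6 * R_TO_C_PER_KG * 1e9


def mrem_to_msv(dose):
    """Dose equivalent: millirem -> millisievert."""
    return dose * MREM_TO_MSV


print(round(pci_per_g_to_bq_per_kg(6800)))    # 251600
print(round(pci_per_g_to_bq_per_kg(1200)))    # 44400
print(round(microR_to_nC_per_kg(100), 1))     # 25.8
print(mrem_to_msv(100))                       # 1.0
```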

13.
Elevated concentrations of naturally occurring radioactive material (NORM), including ²³⁸U, ²³²Th, and their progeny found in underground geologic deposits, are often encountered during crude oil recovery. Radium, the predominant radionuclide brought to the surface with the crude oil and produced water, co-precipitates with barium in the form of complex compounds of sulfates, carbonates, and silicates found in sludge and scale. These NORM deposits are highly stable and very insoluble under ambient conditions at the earth's surface. However, the co-precipitated radium matrix is not thermodynamically stable under reducing conditions, which may allow a fraction of the radium to be released eventually to the environment. Although the fate of radium in uranium mill tailings has been studied extensively, the leachability of radium from crude oil NORM deposits exposed to acid rain and other aging processes is generally unknown. The leachability of radium from NORM-contaminated soil collected at a contaminated oil field in eastern Kentucky was determined using extraction fluids with a wide range of pH reflecting different extreme environmental conditions. The average ²²⁶Ra concentration in the soil samples subjected to leachability testing was 32.56 ± 0.34 Bq g⁻¹. The average leaching potential of ²²⁶Ra observed in these NORM-contaminated soil samples was 1.3% ± 0.46% and was independent of the extraction fluid. Risk assessment calculations using the family farm scenario show that the annual dose to a person living and working on this NORM-contaminated soil is mainly due to external gamma exposure and radon inhalation. However, waterborne pathways make a non-negligible contribution to the dose for the actual resident families living on farmland with this type of residual NORM contamination from crude oil recovery operations.

14.
Generalized estimating equations are well established for drawing inference on the marginal mean from follow-up data. Many studies suffer from missing data, which may result in biased parameter estimates if the data are not missing completely at random. Robins and co-workers proposed using weighted estimating equations (WEE) to estimate the mean structure when drop-out is missing at random. We illustrate the differences between the WEE and the commonly applied available-case analysis in a simulation study. We apply the WEE to reanalyse data from a longitudinal study of pregnancy and human papilloma virus (HPV) infection. We estimate the response probabilities and demonstrate that the data are not missing completely at random. Using the WEE, we show that pregnant women have increased odds of HPV infection compared with non-pregnant women after delivery (p = 0.027). We conclude that the WEE are useful for dealing with monotone missing data due to drop-out in follow-up data.
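In the simplest case, an intercept-only model for the marginal mean, the weighted estimating equation reduces to an inverse-probability-weighted mean of the observed outcomes. The sketch assumes that response probabilities are estimated as the observed proportion within strata of a fully observed covariate; in the study they would come from a fitted response model, and the full WEE also includes covariates and a GEE working correlation. Function name and data are hypothetical.

```python
import numpy as np


def ipw_mean(y, observed, strata):
    """Inverse-probability-weighted mean of an outcome subject to drop-out.

    Solves sum_i (R_i / p_i) * (y_i - mu) = 0, with p_i estimated as the
    observed proportion in subject i's stratum.
    """
    y = np.asarray(y, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    strata = np.asarray(strata)
    p = np.empty(len(y))
    for s in np.unique(strata):
        in_s = strata == s
        p[in_s] = observed[in_s].mean()  # estimated P(observed | stratum)
    w = 1.0 / p[observed]                # weight each observed outcome by 1 / P(observed)
    return float(np.sum(w * y[observed]) / np.sum(w))


y = [1, 1, np.nan, np.nan, 0, 0, 0, 0]
observed = [1, 1, 0, 0, 1, 1, 1, 1]
strata = ["A"] * 4 + ["B"] * 4
print(ipw_mean(y, observed, strata))  # 0.5 (the available-case mean would be 1/3)
```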

15.
The true missing data mechanism is never known in practice. We present a method for generating multiple imputations for binary variables, which formally incorporates missing data mechanism uncertainty. Imputations are generated from a distribution of imputation models rather than a single model, with the distribution reflecting subjective notions of missing data mechanism uncertainty. Parameter estimates and standard errors are obtained using rules for nested multiple imputation. Using simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal smoking cessation trial where nonignorably missing data were a concern. Our method provides a simple approach for formalizing subjective notions regarding nonresponse and can be implemented using existing imputation software. Copyright © 2014 John Wiley & Sons, Ltd.
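Nested multiple imputation draws M imputation models (nests) and N imputations under each. A sketch of the usual nested combining rules for the point estimate and total variance (degrees of freedom omitted) follows; the between-nest variance B is inflated by (1 + 1/M) and the within-nest variance W by (1 - 1/N). The function name and numbers are hypothetical.

```python
import numpy as np


def nested_pool(Q, U):
    """Combine estimates from nested multiple imputation.

    Q, U: M x N arrays of point estimates and their variances
    (M nests = drawn imputation models, N imputations within each nest).
    """
    Q = np.asarray(Q, dtype=float)
    U = np.asarray(U, dtype=float)
    M, N = Q.shape
    nest_means = Q.mean(axis=1)
    q_bar = Q.mean()                                             # pooled point estimate
    u_bar = U.mean()                                             # average complete-data variance
    B = nest_means.var(ddof=1)                                   # between-nest variance
    W = ((Q - nest_means[:, None]) ** 2).sum() / (M * (N - 1))   # within-nest variance
    T = u_bar + (1 + 1 / M) * B + (1 - 1 / N) * W                # total variance
    return float(q_bar), float(T)


print(nested_pool([[1, 3], [2, 4]], [[1, 1], [1, 1]]))  # (2.5, 2.75)
```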

16.
We extend the pattern-mixture approach to handle missing continuous outcome data in longitudinal cluster randomized trials, which randomize groups of individuals to treatment arms, rather than the individuals themselves. Individuals who drop out at the same time point are grouped into the same dropout pattern. We approach extrapolation of the pattern-mixture model by applying multilevel multiple imputation, which imputes missing values while appropriately accounting for the hierarchical data structure found in cluster randomized trials. To assess parameters of interest under various missing data assumptions, imputed values are multiplied by a sensitivity parameter, k, which increases or decreases imputed values. Using simulated data, we show that estimates of parameters of interest can vary widely under differing missing data assumptions. We conduct a sensitivity analysis using real data from a cluster randomized trial by increasing k until the treatment effect inference changes. By performing a sensitivity analysis for missing data, researchers can assess whether certain missing data assumptions are reasonable for their cluster randomized trial.
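The role of k can be sketched on a single completed dataset. This is a deliberately simplified illustration: it assumes k is applied only to imputed values in the treatment arm (the abstract does not say which values are scaled) and ignores clustering and the pooling across imputations that the actual method performs. Function name and data are hypothetical.

```python
import numpy as np


def k_sensitivity(y, imputed, treated, ks):
    """Treatment-minus-control difference in means after multiplying the imputed
    outcome values in the treatment arm by each sensitivity parameter k."""
    y = np.asarray(y, dtype=float)
    imputed = np.asarray(imputed, dtype=bool)
    treated = np.asarray(treated, dtype=bool)
    effects = []
    for k in ks:
        adjusted = np.where(imputed & treated, k * y, y)  # k = 1 reproduces the unadjusted result
        effects.append(float(adjusted[treated].mean() - adjusted[~treated].mean()))
    return effects


print(k_sensitivity([10, 12, 8, 9], [False, True, False, False],
                    [True, True, False, False], [1.0, 0.5]))  # [2.5, -0.5]
```

Scanning k over a grid and noting where the inference changes gives the tipping point described in the abstract.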

17.
BACKGROUND: In longitudinal studies, it is extremely rare that all the planned measurements are actually performed. Missing data often result from drop-out, but may also be intermittent. In both cases, the analysis of incomplete data necessarily requires assumptions that are generally unverifiable, and the need for sensitivity analyses has been advocated over the past few years. This article focuses on longitudinal binary data. METHODS: A method based on a log-linear model is proposed. A sensitivity parameter is introduced that represents the relationship between the response mechanism and the missing data mechanism. Rather than estimating this parameter, it is recommended to consider a range of plausible values and to estimate the parameters of interest conditionally on these values. This allows one to assess the sensitivity of a study's conclusion to various assumptions about the missing data mechanism. RESULTS: The method was applied to a randomized clinical trial comparing the efficacy of two treatment regimens in patients with persistent asthma. The sensitivity analysis showed that the conclusion of this study was robust to missing data.
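The abstract does not specify the log-linear model. As a simplified illustration of the same idea for a single binary outcome: in the 2×2 table of outcome by response status, the log-linear interaction term corresponds to the log odds ratio between outcome and response, which the observed data cannot identify. Fixing it at a range of plausible values and recomputing the marginal success rate shows how conclusions move. The function name and counts are hypothetical.

```python
import math


def marginal_rate(n_success, n_observed, n_missing, log_or):
    """Marginal probability of a binary outcome with n_missing unobserved responses,
    assuming odds(Y=1 | missing) = odds(Y=1 | observed) * exp(-log_or).
    log_or = 0 corresponds to ignorable (missing at random) non-response."""
    p_obs = n_success / n_observed
    odds_mis = p_obs / (1 - p_obs) * math.exp(-log_or)
    p_mis = odds_mis / (1 + odds_mis)
    return (n_observed * p_obs + n_missing * p_mis) / (n_observed + n_missing)


for psi in (-1.0, 0.0, 1.0):  # plausible values of the sensitivity parameter
    print(psi, round(marginal_rate(30, 60, 40, psi), 3))
```

In a trial, the same calculation would be repeated per arm over the grid of values to see whether the comparison changes.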

18.
We studied bias due to missing exposure data in the proportional hazards regression model when using complete-case analysis (CCA). Eleven missing data scenarios were considered: one with missing completely at random (MCAR), four missing at random (MAR), and six non-ignorable missingness scenarios, with a variety of hazard ratios, censoring fractions, missingness fractions and sample sizes. When missingness was MCAR or dependent only on the exposure, there was negligible bias (2-3 per cent) that was similar to the difference between the estimate in the full data set with no missing data and the true parameter. In contrast, substantial bias occurred when missingness was dependent on outcome or both outcome and exposure. For models with hazard ratio of 3.5, a sample size of 400, 20 per cent censoring and 40 per cent missing data, the relative bias for the hazard ratio ranged between 7 per cent and 64 per cent. We observed important differences in the direction and magnitude of biases under the various missing data mechanisms. For example, in scenarios where missingness was associated with longer or shorter follow-up, the biases were notably different, although both mechanisms are MAR. The hazard ratio was underestimated (with larger bias) when missingness was associated with longer follow-up and overestimated (with smaller bias) when associated with shorter follow-up. If it is known that missingness is associated with a less frequently observed outcome or with both the outcome and exposure, CCA may result in an invalid inference and other methods for handling missing data should be considered.

19.
Missing responses for health-related quality of life (HRQL) outcomes are common in clinical trials and may introduce bias, as such data are often not missing at random. To evaluate the missingness (dropout) effect when comparing two treatment groups in a longitudinal randomized trial, we analyzed the change in the Functional Assessment of Cancer Therapy Trial Outcome Index (TOI) over 12 months for newly diagnosed patients with chronic myeloid leukemia. HRQL assessments were scheduled at baseline and at months 1, 2, 3, 4, 5, 6, 9 and 12. We defined completers as those with baseline and month-12 TOI scores, and dropouts as all others who had a baseline score. We defined censoring time as the interval between baseline and the scheduled month-12 visit date, and approximate time to dropout as the interval from baseline to the midpoint between the date of the last reported TOI and the next scheduled visit date. A mixed-effects model was first built to assess the treatment effect; a pattern-mixture model and a joint model were then built to account for non-ignorable dropout. Intermittent missing data were assumed to be missing at random. A square-root transformation of the TOI scores was used to satisfy the assumptions of normality and homogeneity of variance at each time point in all the models. The mixed-effects model revealed significant (P < 0.001) between-group differences at each visit except baseline. The joint model generated parameter estimates similar to those of the separate longitudinal and survival sub-models, with a significant association parameter (P = 0.039) indicating a negative association between the slope of TOI and the hazard of dropout, and thus non-ignorable dropout. The pattern-mixture model parameter estimates were fairly similar to those generated from the joint model. When non-ignorable missing data exist in longitudinal studies, a joint model is useful for quantifying the relationship between dropout and outcome. In addition, it is important to examine the underlying assumptions and to use multiple missing data models, including the pattern-mixture model, to assess the sensitivity of model-based inference to assumptions about missing data mechanisms.
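The completer, censoring, and time-to-dropout definitions above can be sketched directly. This version works in scheduled months rather than calendar dates (an assumption; the study used actual visit dates), and the function and constant names are hypothetical.

```python
SCHEDULE = (0, 1, 2, 3, 4, 5, 6, 9, 12)  # scheduled HRQL assessments, months from baseline


def dropout_time(observed_months):
    """Return (time in months, dropped_out) for one patient under the abstract's definitions.

    Completers (baseline and month-12 TOI observed) are censored at the scheduled month-12
    visit; otherwise the approximate time to dropout is the midpoint between the last
    reported TOI and the next scheduled visit. Intermittent gaps before the last
    reported visit do not affect the result (they are treated as missing at random).
    """
    months = sorted(set(observed_months))
    if 0 not in months:
        raise ValueError("patients without a baseline TOI score are excluded")
    if 12 in months:
        return 12.0, False
    last = months[-1]
    next_visit = min(t for t in SCHEDULE if t > last)
    return (last + next_visit) / 2, True


print(dropout_time([0, 1, 3]))   # (3.5, True)
print(dropout_time([0, 6]))      # (7.5, True)
print(dropout_time([0, 2, 12]))  # (12.0, False)
```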


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417