Similar Articles
1.
The need for resource-intensive laboratory assays to assess exposures in many epidemiologic studies provides ample motivation to consider study designs that incorporate pooled samples. In this paper, we consider the case in which specimens are combined for the purpose of determining the presence or absence of a pool-wise exposure, in lieu of assessing the actual binary exposure status for each member of the pool. We presume a primary logistic regression model for an observed binary outcome, together with a secondary regression model for exposure. We facilitate maximum likelihood analysis by complete enumeration of the possible implications of a positive pool, and we discuss the applicability of this approach under both cross-sectional and case-control sampling. We also provide a maximum likelihood approach for longitudinal or repeated measures studies where the binary outcome and exposure are assessed on multiple occasions and within-subject pooling is conducted for exposure assessment. Simulation studies illustrate the performance of the proposed approaches along with their computational feasibility using widely available software. We apply the methods to investigate gene-disease association in a population-based case-control study of colorectal cancer. Copyright © 2012 John Wiley & Sons, Ltd.
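The enumeration device in this abstract is concrete enough to sketch. Below is a minimal illustration, assuming a cross-sectional design, a known pool composition, and a constant exposure prevalence standing in for the paper's secondary exposure model (the simplified setup and names are ours, not the authors'): the likelihood contribution of a pool sums the joint probability of the observed outcomes over every exposure configuration consistent with the pooled assay result.

```python
import itertools
import numpy as np

def pool_loglik(beta0, beta1, p_exposure, outcomes, pool_positive):
    """Log-likelihood contribution of one pool.

    outcomes      : list of binary outcomes, one per pool member
    pool_positive : True if the pooled assay detected exposure (>=1 exposed)
    p_exposure    : assumed constant exposure prevalence (secondary model)
    beta0, beta1  : primary logistic model logit P(Y=1|X) = beta0 + beta1*X
    """
    lik = 0.0
    # complete enumeration of exposure configurations for the pool members
    for x in itertools.product([0, 1], repeat=len(outcomes)):
        if pool_positive != (sum(x) > 0):   # inconsistent with assay result
            continue
        pr = 1.0
        for xi, yi in zip(x, outcomes):
            pr *= p_exposure if xi else 1 - p_exposure         # exposure model
            py1 = 1.0 / (1.0 + np.exp(-(beta0 + beta1 * xi)))  # outcome model
            pr *= py1 if yi else 1 - py1
        lik += pr
    return np.log(lik)

# one pool of three subjects whose pooled specimen tested positive:
print(pool_loglik(-1.0, 0.7, 0.2, [1, 0, 1], True))
```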

2.
Case-control studies are prone to low power for testing gene-environment interactions (GXE), given the need for a sufficient number of individuals in each stratum of disease, gene, and environment. We propose a new study design to increase power by strategically pooling biospecimens. Pooling biospecimens allows us to increase the number of subjects significantly, thereby providing a substantial increase in power. We focus on a special, although realistic, case where disease and environmental statuses are binary and gene status is ordinal, with each individual having 0, 1, or 2 minor alleles. Through pooling, we obtain an allele frequency for each level of disease and environmental status. Using the allele frequencies, we develop a new methodology for estimating and testing GXE that is comparable to the situation where we have complete data on gene status for each individual. We also explore the measurement process and its effect on the GXE estimator. Using an illustration, we show the effectiveness of pooling with an epidemiologic study that tests an interaction between fiber and paraoxonase on anovulation. Through simulation, we show that taking 12 pooled measurements from 1000 individuals achieves more power than individually genotyping 500 individuals. Our findings suggest that strategic pooling should be considered when an investigator designs a pilot study to test for a GXE. Published 2012. This article is a US Government work and is in the public domain in the USA.
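The abstract does not give the estimator, so the sketch below is a crude stand-in rather than the authors' method: it treats GXE as a difference-in-differences of logit-transformed pooled allele frequencies across the four disease-by-environment strata, with a binomial variance approximation. All names and inputs are hypothetical.

```python
import numpy as np
from scipy import stats

def gxe_from_pooled_freqs(freq, n_alleles):
    """Difference-in-differences of logit allele frequencies across the
    four (disease d, environment e) strata, as a rough GXE measure.
    freq[d, e]      : pooled minor-allele frequency in stratum (d, e)
    n_alleles[d, e] : 2 * number of individuals pooled in stratum (d, e)
    """
    logit = lambda p: np.log(p / (1 - p))
    var = lambda d, e: 1.0 / (n_alleles[d, e] * freq[d, e] * (1 - freq[d, e]))
    est = (logit(freq[1, 1]) - logit(freq[1, 0])) - \
          (logit(freq[0, 1]) - logit(freq[0, 0]))
    se = np.sqrt(sum(var(d, e) for d in (0, 1) for e in (0, 1)))
    return est, se, 2 * stats.norm.sf(abs(est) / se)

freq = {(0, 0): 0.20, (0, 1): 0.22, (1, 0): 0.24, (1, 1): 0.35}
n_alleles = {k: 500 for k in freq}
print(gxe_from_pooled_freqs(freq, n_alleles))
```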

3.
The genetic dissection of complex human diseases requires large-scale association studies which explore the population associations between genetic variants and disease phenotypes. DNA pooling can substantially reduce the cost of genotyping assays in these studies, and thus enables one to examine a large number of genetic variants on a large number of subjects. The availability of pooled genotype data instead of individual data poses considerable challenges for statistical inference, especially for haplotype-based analysis, because of increased phase uncertainty. Here we present a general likelihood-based approach to making inferences about haplotype-disease associations based on possibly pooled DNA data. We consider cohort and case-control studies of unrelated subjects, and allow arbitrary and unequal pool sizes. The phenotype can be discrete or continuous, univariate or multivariate. The effects of haplotypes on disease phenotypes are formulated through flexible regression models, which allow a variety of genetic hypotheses and gene-environment interactions. We construct appropriate likelihood functions for various designs and phenotypes, accommodating Hardy-Weinberg disequilibrium. The corresponding maximum likelihood estimators are approximately unbiased, normally distributed, and statistically efficient. We develop simple and efficient numerical algorithms for calculating the maximum likelihood estimators and their variances, and implement these algorithms in a freely available computer program. We assess the performance of the proposed methods through simulation studies, and provide an application to the Finland-United States Investigation of NIDDM Genetics Study. The results show that DNA pooling is highly efficient in studying haplotype-disease associations. As a by-product, this work provides valid and efficient methods for estimating haplotype-disease associations with unpooled DNA samples.
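Phase uncertainty is the crux of haplotype-based likelihoods. As a toy version of that core idea (not the paper's full pooled-DNA likelihood), here is a minimal EM algorithm for two-SNP haplotype frequencies from unphased individual genotypes, where only the double heterozygote is phase-ambiguous.

```python
import numpy as np
from collections import Counter

def haplotype_em(genotypes, n_iter=200):
    """EM estimate of two-SNP haplotype frequencies (00, 01, 10, 11)
    from unphased minor-allele counts; only the double heterozygote
    (1, 1) has ambiguous phase."""
    p = np.full(4, 0.25)
    counts = Counter(map(tuple, genotypes))
    n_hap = 2 * sum(counts.values())
    for _ in range(n_iter):
        e = np.zeros(4)
        for (g1, g2), n in counts.items():
            if (g1, g2) == (1, 1):
                # P(phase is {00, 11}) under current frequencies
                w = p[0] * p[3] / (p[0] * p[3] + p[1] * p[2])
                e += n * np.array([w, 1 - w, 1 - w, w])
            else:                       # phase is fully determined
                a = [0, 1] if g1 == 1 else [g1 // 2] * 2
                b = [0, 1] if g2 == 1 else [g2 // 2] * 2
                for i, j in zip(a, b):
                    e[2 * i + j] += n
        p = e / n_hap
    return p

geno = [(0, 0)] * 40 + [(1, 1)] * 30 + [(2, 2)] * 10 + [(1, 0)] * 20
print(haplotype_em(geno))
```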

4.
Pooling-based strategies that combine samples from multiple participants for laboratory assays have been proposed for epidemiologic investigations of biomarkers to address issues of cost, efficiency, and detection, and situations where minimal sample volume is available. A modification of the standard logistic regression model has previously been described to allow use with pooled data; however, this model makes assumptions regarding the exposure distribution and logit-linearity of risk (i.e., a constant odds ratio) that can be violated in practice. We were motivated by a nested case-control study of miscarriage and inflammatory factors with highly skewed distributions to develop a more flexible model for the analysis of pooled data. Using characteristics of the gamma distribution and the relation between models of binary outcome conditional on exposure and of exposure conditional on outcome, we use a modified logistic regression to accommodate the nonlinearity that arises from unequal shape parameters in gamma-distributed exposure for cases and controls. Using simulations, we compare our approach with existing methods for logistic regression with pooled data, considering: (1) constant and dose-dependent effects; (2) gamma and log-normally distributed exposure; (3) effect size; and (4) the proportion of biospecimens pooled. We show that our approach allows estimation of odds ratios that vary with exposure level, yet has minimal loss of efficiency compared with existing approaches when exposure effects are dose-invariant. Our model performed similarly to a maximum likelihood estimation approach in terms of bias and efficiency, and provides an easily implemented approach for estimation with pooled biomarker data when effects may not be constant across exposure. Copyright © 2012 John Wiley & Sons, Ltd.
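The gamma machinery behind the modified logistic regression can be made explicit. If exposure is Gamma(k1, s1) in cases and Gamma(k0, s0) in controls, Bayes' rule gives logit P(Y=1|X=x) = c + (k1 - k0) log x + (1/s0 - 1/s1) x, so unequal shapes enter as a log-exposure term and the odds ratio varies with x. A minimal individual-level illustration of that induced form (the paper adapts it to pooled measurements):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = np.concatenate([rng.gamma(2.0, 1.0, 500),    # controls: Gamma(k0=2.0, s0=1.0)
                    rng.gamma(3.5, 1.2, 500)])   # cases:    Gamma(k1=3.5, s1=1.2)
y = np.repeat([0, 1], 500)

# Including both x and log(x) captures the gamma-induced logit form;
# the log(x) coefficient estimates k1 - k0 (here 1.5).
X = sm.add_constant(np.column_stack([x, np.log(x)]))
fit = sm.Logit(y, X).fit(disp=0)
print(fit.params)
```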

5.
Pooling biospecimens prior to performing lab assays can help reduce lab costs, preserve specimens, and reduce information loss when subject to a limit of detection. Because many biomarkers measured in epidemiological studies are positive and right-skewed, proper analysis of pooled specimens requires special methods. In this paper, we develop and compare parametric regression models for skewed outcome data subject to pooling, including a novel parameterization of the gamma distribution that takes full advantage of the gamma summation property. We also develop a Monte Carlo approximation of Akaike's Information Criterion applied to pooled data in order to guide model selection. Simulation studies and analysis of motivating data from the Collaborative Perinatal Project suggest that using Akaike's Information Criterion to select the best parametric model can help ensure valid inference and promote estimate precision. Copyright © 2015 John Wiley & Sons, Ltd.
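The gamma summation property the abstract leans on is that a sum of g independent Gamma(shape k, scale s) variables is Gamma(shape g*k, scale s), so pool totals keep a closed-form likelihood. A minimal sketch, with simulated pools of varying size and a generic optimizer in place of whatever algorithm the paper uses:

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(1)
pool_sizes = rng.integers(1, 5, size=200)       # pools of 1 to 4 specimens
# observed pool total = pool size * measured pool mean; the total of g
# iid Gamma(k, s) specimens is Gamma(g * k, s):
totals = np.array([rng.gamma(2.5 * g, 1.3) for g in pool_sizes])

def negloglik(params):
    k, s = np.exp(params)                       # log scale keeps k, s positive
    return -stats.gamma.logpdf(totals, a=k * pool_sizes, scale=s).sum()

res = optimize.minimize(negloglik, x0=[0.0, 0.0])
print(np.exp(res.x))                            # should be near (2.5, 1.3)
```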

6.
Pooled analysis is a method frequently used in epidemiology when individual studies are too small to allow any definite conclusion. Several guidelines have been published on the pooling of classical epidemiological studies, but no information is available on pooling studies that involve biological markers. We have used information from two recently started pooled analyses, one on cytogenetic damage and the other on genetic susceptibility to environmental carcinogens, to draw inferences about the applicability of pooled analysis to molecular epidemiological studies. Issues in pooling data from epidemiological studies involving molecular markers are described here, including the choice of study design, planning and conduct of the study, data request, collection and storage, and costs. Some practical indications on the conduct of a pooled analysis in molecular epidemiology are given.

7.
The simple pooling of data is often used to provide an overall summary of subgroup data or data from a number of related studies. In simple pooling, data are combined without being weighted. Therefore, the analysis is performed as if the data were derived from a single sample. This kind of analysis ignores characteristics of the subgroups or individual studies being pooled and can yield spurious or counterintuitive results. In meta-analysis, data from subgroups or individual studies are weighted first, then combined, thereby avoiding some of the problems of simple pooling. The purpose of this article is to describe how simple pooling differs from meta-analysis, provide a detailed analysis of why simple pooling can be a poor procedure, and show that combining by meta-analytic methods avoids such problems.
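A classic numeric illustration of why unweighted pooling misleads: in the kidney-stone example below, treatment A is better within every subgroup, yet naively pooling the raw counts reverses the comparison (Simpson's paradox). Weighting within-subgroup estimates, as meta-analysis does, avoids the reversal.

```python
# Kidney-stone data: (successes, patients) for treatments A and B
# in two severity subgroups.
a = [(81, 87), (192, 263)]
b = [(234, 270), (55, 80)]

for (sa, na), (sb, nb) in zip(a, b):
    print("within subgroup: A", sa / na, "> B", sb / nb)

sa, na = map(sum, zip(*a))
sb, nb = map(sum, zip(*b))
print("naively pooled:  A", sa / na, "< B", sb / nb)   # comparison reverses
```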

8.
Epidemiologic studies of occupational cohorts have played a major role in the quantitative assessment of risks associated with several carcinogenic hazards and are likely to play an increasingly important role in this area. Relatively little attention has been given in either the epidemiologic or the risk assessment literature to the development of appropriate methods for modeling epidemiologic data for quantitative risk assessment (QRA). The purpose of this paper is to review currently available methods for modeling epidemiologic data for risk assessment. The focus is on methods for use with retrospective cohort mortality studies of occupational groups for estimating cancer risk, since these are the data most commonly used when epidemiologic information is used for QRA. Both empirical models (e.g., Poisson regression and the Cox proportional hazards model) and biologically based models (e.g., two-stage models) are considered. Analyses of a study of lung cancer among workers exposed to cadmium are used to illustrate these modeling methods. Based on this example, it is demonstrated that the selection of a particular model may have a large influence on the resulting estimates of risk.
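On the empirical side, the workhorse is Poisson regression of death counts with log person-years as an offset. A minimal sketch with invented cells and covariates (the paper's cadmium analysis is of this general form, but these numbers are illustrative only):

```python
import numpy as np
import statsmodels.api as sm

# Deaths in exposure-by-age cells, modeled with log person-years as an
# offset; exponentiated coefficients are rate ratios.
deaths = np.array([4, 9, 12, 25])
pyears = np.array([1200.0, 1500.0, 800.0, 900.0])
exposure = np.array([0.0, 1.0, 0.0, 1.0])   # e.g. high cumulative exposure
age = np.array([0.0, 0.0, 1.0, 1.0])        # older age stratum indicator
X = sm.add_constant(np.column_stack([exposure, age]))
fit = sm.GLM(deaths, X, family=sm.families.Poisson(),
             offset=np.log(pyears)).fit()
print(np.exp(fit.params[1]))                # rate ratio for exposure
```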

9.
A method for meta-analysis of molecular association studies
Although population-based molecular association studies are becoming increasingly popular, methodology for the meta-analysis of these studies has been neglected, particularly with regard to two issues: testing Hardy-Weinberg equilibrium (HWE), and pooling results in a manner that reflects a biological model of gene effect. We propose a process for pooling results from population-based molecular association studies which consists of the following steps: (1) checking HWE using chi-square goodness of fit; we suggest performing sensitivity analysis with and without studies that are in HWE. (2) Heterogeneity is then checked, and if present, possible causes are explored. (3) If no heterogeneity is present, regression analysis is used to pool data and to determine the gene effect. (4) If there is a significant gene effect, pairwise group differences are analysed and these data are allowed to 'dictate' the best genetic model. (5) Data may then be pooled using this model. This method is easily performed using standard software, and has the advantage of not assuming an a priori genetic model.
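Step (1) is standard enough to show directly: a 1-df chi-square goodness-of-fit test of HWE from genotype counts.

```python
from scipy import stats

def hwe_chisq(n_aa, n_ab, n_bb):
    """1-df chi-square goodness-of-fit test of HWE from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)              # frequency of allele A
    expected = [n * p**2, 2 * n * p * (1 - p), n * (1 - p)**2]
    chi2 = sum((o - e) ** 2 / e
               for o, e in zip([n_aa, n_ab, n_bb], expected))
    return chi2, stats.chi2.sf(chi2, df=1)

print(hwe_chisq(1000, 480, 70))
```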

10.
Evaluating biomarkers in epidemiological studies can be expensive and time consuming. Many investigators use techniques such as random sampling or pooling biospecimens in order to cut costs and save time on experiments. Commonly, analyses based on pooled data are strongly restricted by distributional assumptions that are challenging to validate because the biospecimens are pooled. Random sampling provides data that can be easily analyzed; however, random sampling methods are not optimal cost-efficient designs for estimating means. We propose and examine a cost-efficient hybrid design that involves taking a sample of both pooled and unpooled data in an optimal proportion in order to efficiently estimate the unknown parameters of the biomarker distribution. In addition, we find that this design can be used to estimate and account for different types of measurement and pooling error, without the need to collect validation data or repeated measurements. We show an example where application of the hybrid design leads to minimization of a given loss function based on the variances of the estimators of the unknown parameters. Monte Carlo simulation and biomarker data from a study on coronary heart disease are used to demonstrate the proposed methodology. Published in 2010 by John Wiley & Sons, Ltd.
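The abstract does not give the optimal-proportion calculation, so the sketch below only conveys the flavor by Monte Carlo: with a fixed assay budget, pooled assays estimate the mean more precisely, while some unpooled assays are what let you separate biological variance from measurement error; a hybrid splits the budget. All parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n_assays, pool_size, mu, sigma, sigma_me = 100, 5, 10.0, 2.0, 0.5

def mean_se(frac_pooled, reps=2000):
    """Monte Carlo SE of the overall mean when a fraction of the assay
    budget goes to pools of `pool_size` specimens each."""
    n_p = int(frac_pooled * n_assays)
    n_u = n_assays - n_p
    ests = []
    for _ in range(reps):
        pooled = rng.normal(mu, sigma / np.sqrt(pool_size), n_p) \
               + rng.normal(0, sigma_me, n_p)          # pool mean + assay error
        unpooled = rng.normal(mu, sigma, n_u) + rng.normal(0, sigma_me, n_u)
        ests.append(np.concatenate([pooled, unpooled]).mean())
    return np.std(ests)

for f in (0.0, 0.5, 1.0):
    print(f, mean_se(f))
```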

11.
Methods for the meta-analysis of results from randomized controlled trials are well established. However, there are currently no methods for the meta-analysis of method comparison studies. Here the combination of results from studies comparing two methods of measurement on the same unit of observation is required. We compare standard methods for the pooling of k samples from the same Normal population to those for pooling parameter estimates, in order to estimate the pooled mean difference and 95 per cent limits of agreement. Methods for investigating heterogeneity across studies and for calculating random effects estimates are proposed. We postulate that for published studies either the estimated mean or variance of the difference between measurements will tend to be smaller than for unpublished studies and investigate the evidence for the existence of such publication bias. The methods are illustrated with an example evaluating the accuracy of temperature measured at the axilla compared to the rectum in children.
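One fixed-effect version of the pooling target is easy to state: combine per-study mean differences by inverse variance, and pool the within-study variances of a single difference to form overall 95 per cent limits of agreement. The numbers below are illustrative; the paper's heterogeneity checks and random-effects versions are not shown.

```python
import numpy as np

mean_diff = np.array([0.25, 0.31, 0.19])   # mean (axilla - rectum), per study
sd_diff = np.array([0.40, 0.55, 0.35])     # SD of differences, per study
n = np.array([120, 80, 150])               # pairs per study

w = n / sd_diff**2                          # 1 / Var(mean difference)
d_pooled = (w * mean_diff).sum() / w.sum()
# pooled SD of a single difference (k-sample pooled variance):
sd_pooled = np.sqrt(((n - 1) * sd_diff**2).sum() / (n - 1).sum())
print(d_pooled, d_pooled - 1.96 * sd_pooled, d_pooled + 1.96 * sd_pooled)
```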

12.
Meta-analysis is a collection of quantitative methods for combining summary information from related but independent studies. Because research reports usually present only data reductions and summary statistics rather than detailed data, the reviewer must often resort to rather crude methods for constructing a summary effect estimate suitable for meta-analytic pooling. When the studies involve a binary variable, both the number of events and the sample sizes are required to compute a pooled estimate and its confidence interval. Sometimes only summary statistics and related confidence intervals are provided in the publication. Although it is possible to estimate the standard error of each study's effect measure using the confidence interval from each study, this lack of detailed data compels reviewers to use the inverse variance method to perform the meta-analysis, or to exclude the works with incomplete data. This paper shows three methods for reconstructing four-fold tables when summary measures for binary data, related confidence intervals, and sample sizes are provided. The methods are discussed through a worked application example to assess the reconstruction precision and the impact of using reconstructed data on meta-analysis results. These methods seem to yield a correct reconstruction if the original measures are reported to at least two decimal places. Meta-analysis results do not seem seriously affected by the use of reconstructed data. These methods allow the reviewer to use the full meta-analysis statistical toolkit, instead of the simple inverse variance method, and can greatly contribute to the completeness of systematic reviews.
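For the odds ratio case, reconstruction reduces to two equations in two unknowns: the reported OR fixes log(ad/bc), and the CI width fixes the Woolf variance 1/a + 1/b + 1/c + 1/d. A sketch with a numeric root-finder (the abstract does not say whether the paper solves it this way):

```python
import numpy as np
from scipy.optimize import fsolve

def rebuild_2x2(or_est, ci_low, ci_high, n_cases, n_controls):
    """Reconstruct a four-fold table (a, b / c, d) from an odds ratio,
    its 95% CI, and the two group sizes."""
    se2 = ((np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)) ** 2

    def eqs(z):
        a, c = z
        b, d = n_cases - a, n_controls - c
        return [np.log(a * d / (b * c)) - np.log(or_est),
                1/a + 1/b + 1/c + 1/d - se2]

    a, c = fsolve(eqs, [n_cases / 2, n_controls / 2])
    return a, n_cases - a, c, n_controls - c

print(np.round(rebuild_2x2(2.0, 1.2, 3.3, 200, 300)))
```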

13.
Epidemiologic studies of disease often produce inconclusive or contradictory results due to small sample sizes or regional variations in the disease incidence or the exposures. To clarify these issues, researchers occasionally pool and reanalyse original data from several large studies. In this paper we explore the use of a two-stage random-effects model for analysing pooled case-control studies and undertake a thorough examination of bias in the pooled estimator under various conditions. The two-stage model analyses each study using the model appropriate to the design with study-specific confounders, and combines the individual study-specific adjusted log-odds ratios using a linear mixed-effects model; it is computationally simple and can incorporate study-level covariates and random effects. Simulations indicate that when the individual studies are large, two-stage methods produce nearly unbiased exposure estimates and standard errors of the exposure estimates from a generalized linear mixed model. By contrast, joint fixed-effects logistic regression produces attenuated exposure estimates and underestimates the standard error when heterogeneity is present. While bias in the pooled regression coefficient increases with interstudy heterogeneity for both models, it is much smaller using the two-stage model. In pooled analyses, where covariates may not be uniformly defined and coded across studies, and occasionally not measured in all studies, a joint model is often not feasible. The two-stage method is shown to be a simple, valid and practical method for the analysis of pooled binary data. The results are applied to a study of reproductive history and cutaneous melanoma risk in women using data from ten large case-control studies.
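Stage two is a random-effects combination of the study-specific adjusted log-odds ratios. As a stand-in for the paper's linear mixed-effects fit, here is the method-of-moments (DerSimonian-Laird) version:

```python
import numpy as np

def dersimonian_laird(log_or, var):
    """Random-effects pooling of study-specific log odds ratios,
    with the DerSimonian-Laird method-of-moments tau^2."""
    log_or, var = np.asarray(log_or), np.asarray(var)
    w = 1 / var
    fixed = (w * log_or).sum() / w.sum()
    q = (w * (log_or - fixed) ** 2).sum()               # Cochran's Q
    tau2 = max(0.0, (q - (len(log_or) - 1)) /
                    (w.sum() - (w ** 2).sum() / w.sum()))
    w_re = 1 / (var + tau2)
    est = (w_re * log_or).sum() / w_re.sum()
    se = np.sqrt(1 / w_re.sum())
    return np.exp([est, est - 1.96 * se, est + 1.96 * se])  # OR, 95% CI

print(dersimonian_laird([0.5, 0.9, 0.2, 0.7], [0.04, 0.09, 0.05, 0.06]))
```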

14.
Additive measurement error and pooling designs are, on the face of it, two different issues, and they have been dealt with separately and extensively in the biostatistics literature. However, both topics reduce to the problem of reconstructing the distribution of a summand (the biomarker) from the distribution of the convolved observations. We therefore combine the two issues into one stated problem. The integrated approach creates an opportunity to investigate new questions, e.g. pooling error, that is, pooled data that are themselves affected by measurement error. Specifically, we consider the stated problem in the context of receiver operating characteristic (ROC) curve analysis, the well-accepted tool for evaluating the ability of a biomarker to discriminate between two populations. The present paper considers a wide family of biospecimen distributions, and the assumptions placed on the biomarker distribution functions are mainly conditioned by the reconstruction problem. We propose and examine maximum likelihood techniques based on the following data: a biomarker with measurement error; pooled samples; and pooled samples with measurement error. The obtained methods are illustrated by applications to real data studies.
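As a toy instance of the reconstruction idea (a normal special case with known measurement-error SD and equal pool sizes, so not the paper's wide distribution family): fit the group means and SDs by maximum likelihood from pooled, error-contaminated measurements, then compute the binormal AUC.

```python
import numpy as np
from scipy import stats, optimize

# A pool of g specimens from group y measures N(mu_y, sigma_y^2/g + sigma_me^2).
rng = np.random.default_rng(4)
g, sigma_me = 4, 0.3
pools0 = rng.normal(1.0, 1.0 / np.sqrt(g), 150) + rng.normal(0, sigma_me, 150)
pools1 = rng.normal(2.0, 1.2 / np.sqrt(g), 150) + rng.normal(0, sigma_me, 150)

def negloglik(params):
    m0, s0 = params[0], np.exp(params[1])
    m1, s1 = params[2], np.exp(params[3])
    ll0 = stats.norm.logpdf(pools0, m0, np.sqrt(s0**2 / g + sigma_me**2)).sum()
    ll1 = stats.norm.logpdf(pools1, m1, np.sqrt(s1**2 / g + sigma_me**2)).sum()
    return -(ll0 + ll1)

res = optimize.minimize(negloglik, x0=[0.0, 0.0, 1.0, 0.0])
m0, s0, m1, s1 = res.x[0], np.exp(res.x[1]), res.x[2], np.exp(res.x[3])
print("AUC:", stats.norm.cdf((m1 - m0) / np.sqrt(s0**2 + s1**2)))  # ~0.74
```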

15.
We describe ratio estimation methods for multivariately analysing incidence densities from prospective epidemiologic studies. Commonly used in survey data analysis, these ratio methods require minimal distributional assumptions and take into account the random variability in the at-risk periods. We illustrate their application with data from a study of lower respiratory illness (LRI) in children during the first year of life. One question of interest is whether children with passive exposure to tobacco smoke have a higher rate of LRI, on average, than those with no exposure, in a setting where the child's age and season are taken into account. A second question is whether the relationship persists after adjusting for background variables such as the family's socioeconomic status, crowding in the home, race, and type of feeding. The basic strategy consists of a two-step process in which we first estimate subgroup-specific incidence densities and their covariance matrix via a first-order Taylor series approximation. These estimates are used to test for differences in marginal rates of LRI between children exposed to tobacco smoke and those not exposed. We then fit a log-linear model to the estimated ratios in order to test for significant covariate effects. The ability to produce direct estimates of adjusted incidence density ratios for risk factors of interest is an important advantage of this approach. For comparison purposes, and to address the limitations of the ratio method with respect to the number of covariates that can be controlled simultaneously, we consider survey logistic regression methods for the example data as well as logistic and Poisson regression models fitted via generalized estimating equation methods. Although the analysis strategy is illustrated with illness data from an epidemiologic study, the context of application is broader and includes, for example, data on adverse events from a clinical trial.
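The crude version of the key quantity is simple: an incidence density ratio with a Poisson-type log-scale variance. The paper's ratio estimators add Taylor-series variances that account for random at-risk periods and covariate adjustment, none of which is shown here; the numbers are illustrative.

```python
import numpy as np

def incidence_density_ratio(events_exp, pt_exp, events_unexp, pt_unexp):
    """Crude incidence density ratio with a Poisson-based 95% CI
    (log-scale variance 1/e1 + 1/e0)."""
    idr = (events_exp / pt_exp) / (events_unexp / pt_unexp)
    se = np.sqrt(1 / events_exp + 1 / events_unexp)
    lo, hi = np.exp(np.log(idr) + np.array([-1.96, 1.96]) * se)
    return idr, lo, hi

# LRI episodes per child-year, smoke-exposed vs unexposed:
print(incidence_density_ratio(60, 400.0, 35, 420.0))
```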

16.
Meta-analysis has been little explored as a means of making an overall assessment of linkage from different studies. In practice, it is likely that published linkage studies will report only p-values. We compared the performance of the widely used Fisher method for combining p-values with that of pooling raw data. More loci were consistently found by pooling raw data. In the absence of further information, combining p-values can provide an overall, but limited, assessment of different linkage studies. However, meta-analysis would be better viewed as a preliminary step toward the goal of analyzing the pooled raw data.
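Fisher's method itself is one line: under the global null, -2 times the sum of log p-values is chi-square with 2k degrees of freedom.

```python
import numpy as np
from scipy import stats

def fisher_combine(pvals):
    """Fisher's method: -2 * sum(log p) ~ chi2(2k) under the global null."""
    stat = -2 * np.log(pvals).sum()
    return stat, stats.chi2.sf(stat, df=2 * len(pvals))

print(fisher_combine([0.04, 0.20, 0.11]))
```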

17.
Traditional reviews, meta-analyses and pooled analyses in epidemiology
BACKGROUND: The use of review articles and meta-analysis has become an important part of epidemiological research, mainly for reconciling previously conducted studies with inconsistent results. Numerous methodologic issues, particularly with respect to biases and the use of meta-analysis, are still controversial. METHODS: Four methods for summarizing data from epidemiological studies are described. The rationale for meta-analysis and the statistical methods used are outlined. The strengths and limitations of these methods are compared, particularly with respect to their ability to investigate heterogeneity between studies and to provide quantitative risk estimation. RESULTS: Meta-analyses of published data are in general insufficient for calculating a pooled estimate, since published estimates are based on heterogeneous populations, different study designs and, above all, different statistical models. More reliable results can be expected if individual data are available for a pooled analysis, although some heterogeneity still remains. A large, prospectively planned meta-analysis of multicentre studies would be preferable for investigating small risk factors; however, this type of meta-analysis is expensive and time-consuming. CONCLUSION: For a full assessment of risk factors with a high prevalence in the general population, pooling of data will become increasingly important. Future research needs to focus on the deficiencies of review methods, in particular the errors and biases that can be produced when studies that have used different designs, methods and analytic models are combined.

18.
We investigated a variety of methods for pooling data from eight data sets (n = 5,424 subjects) to validate evidence for linkage of markers in the cytokine cluster on chromosome 5q31–33 to asthma and asthma-associated phenotypes. Chromosome 5 markers were integrated into current genetic linkage and physical maps, and a consensus map was constructed to facilitate effective data pooling. To provide more informative phenotypes with better distributional properties, variance component models were fitted using Gibbs sampling methods in order to generate residual additive genetic effects, or sigma-squared-A-random-effects (SSARs), which were used as derived phenotypes in subsequent linkage analyses. Multipoint estimates of alleles shared identically by descent (IBD) were computed for all full sibling pairs. Linkage analyses were performed with a new Haseman-Elston method that uses generalized least squares and a weighted combination of the mean-corrected trait-sum squared and trait-difference squared as the dependent variable. Analyses were performed with all data sets pooled together, and also separately, with the resulting linkage statistics pooled by several meta-analytic methods. Our results provide no significant evidence that loci conferring susceptibility to asthma affection or atopy, as measured by total serum IgE levels, are present in the 5q31–33 region. This study has provided a clearer understanding of the significance, or lack of significance, of the 5q31–33 region in asthma genetics for the phenotypes studied. © 2001 Wiley-Liss, Inc.
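For orientation, here is the classic Haseman-Elston regression (squared sib-pair trait difference regressed on IBD sharing), not the generalized-least-squares variant the paper uses; the simulation is ours.

```python
import numpy as np
import statsmodels.api as sm

# Under linkage, sib pairs sharing more alleles IBD at the marker have
# more similar traits, so the squared trait difference has negative
# slope on IBD.
rng = np.random.default_rng(3)
n = 400
ibd = rng.choice([0.0, 0.5, 1.0], size=n, p=[0.25, 0.5, 0.25])
a1 = rng.normal(0, 1, n)                                     # sib 1 QTL value
a2 = ibd * a1 + np.sqrt(1 - ibd**2) * rng.normal(0, 1, n)    # corr = IBD
t1 = a1 + rng.normal(0, 1, n)
t2 = a2 + rng.normal(0, 1, n)
y = (t1 - t2) ** 2                                 # E[y] = 2 + 2 * (1 - IBD)
fit = sm.OLS(y, sm.add_constant(ibd)).fit()
print(fit.params[1], fit.pvalues[1])               # slope should be near -2
```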

19.
We review several issues of broad relevance to the interpretation of epidemiologic evidence concerning the toxicity of lead in adults, particularly regarding cognitive function and the cardiovascular system, which are the subjects of two systematic reviews that are also part of this mini-monograph. Chief among the recent methodologic advances has been the refinement of concepts and methods for measuring individual lead dose, in terms of appreciating distinctions between recent versus cumulative doses and the use of biological markers to measure these parameters in epidemiologic studies of chronic disease. Attention is focused particularly on bone lead levels measured by K-shell X-ray fluorescence as a relatively new biological marker of cumulative dose that has been used in many recent epidemiologic studies to generate insights into lead's impact on cognition and risk of hypertension, as well as the alternative method of estimating cumulative dose using available repeated measures of blood lead to calculate an individual's cumulative blood lead index. We review the relevance and interpretation of these lead biomarkers in the context of the toxicokinetics of lead. In addition, we discuss methodologic challenges that arise in studies of occupationally and environmentally exposed subjects, and those concerning race/ethnicity, socioeconomic status, and other important covariates.
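A cumulative blood lead index is just the area under an individual's repeated blood lead measurements over time. A minimal sketch, assuming measurements in ug/dL at known ages:

```python
import numpy as np

def cumulative_blood_lead_index(ages_years, blood_pb_ug_dl):
    """Area under repeated blood lead measurements (trapezoidal rule),
    in (ug/dL) * years, as a surrogate for cumulative dose."""
    return np.trapz(blood_pb_ug_dl, ages_years)

print(cumulative_blood_lead_index([30, 35, 42, 50], [12.0, 9.5, 7.0, 5.5]))
```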

20.
Study populations examined in epidemiologic investigations of occupational disease risks often are assembled by pooling employee data from several workplaces that share common exposure factors. The primary objectives of this approach are to enhance the representativeness of the overall study population and to obtain sufficient employee sample sizes in exposure subgroups of interest. Among the many epidemiologic aspects that must be considered carefully in such industry-wide studies is the inter- and intra-company or plant comparability of employee work history data. Currently, the literature is devoid of articles that specifically address the fundamental data reduction and statistical analysis issues related to merging work history data from several distinct cohorts. This paper describes the basic methodologic problems associated with the pooling of work history data and proposes a joint job title/job exposure-based uniform coding scheme (UCS) that facilitates the aggregation of data from similar or diverse cohorts in a variety of epidemiologic study settings. The utility of the UCS as both an efficient data coding structure and a flexible basis for statistical analysis is described within the context of a popular computer software program for analyzing occupational cohort data. The fundamental features of the UCS are illustrated using data from a recent industry-wide study of copper and zinc smelter workers.
