Similar articles
 Found 20 similar articles (search time: 46 ms)
1.
The creation of single nucleotide polymorphism (SNP) databases (such as NCBI dbSNP) has facilitated scientific research in many fields. SNP discovery and detection have improved to the extent that there are over 17 million human reference (rs) SNPs reported to date (Build 129 of dbSNP). SNP databases are unfortunately not always complete and/or accurate. In fact, half of the reported SNPs are still only candidate SNPs and are not validated in a population. We describe the identification of SNDs (single nucleotide differences) in humans that may contaminate the dbSNP database. These SNDs, reported as real SNPs in the database, do not exist as such, but are merely artifacts due to the presence of a paralogue (highly similar duplicated) sequence in the genome. Using sequencing we showed how SNDs could originate in two paralogous genes and evaluated samples from a population of 100 individuals for the presence/absence of SNPs. Moreover, using bioinformatics, we predicted as many as 8.32% of the biallelic, coding SNPs in the dbSNP database to be SNDs. Our identification of SNDs in the database will allow researchers to not only select truly informative SNPs for association studies, but also aid in determining accurate SNP genotypes and haplotypes. Hum Mutat 31:67–73, 2010. © 2009 Wiley‐Liss, Inc.

2.
Single-nucleotide polymorphisms (SNPs) located in coding regions (coding SNPs; cSNPs) with amino acid substitution can potentially alter protein function. Therefore, identification of the nonsynonymous cSNPs of the genes of common diseases is valuable in tests of association with phenotypes. In this study, we validated 525 candidate cSNPs from 179 hypertension candidate genes deposited in the publicly available database dbSNP by DNA sequencing of samples from 32 Japanese individuals. We identified a total of 143 SNPs (27%) in 93 hypertension candidate genes. We also identified 16 new SNPs, for a total of 159 SNPs. Of the 159 SNPs thus identified, 104 were nonsynonymous. We estimate that approximately 20% of the SNPs deposited in the dbSNP database showed a minor allele frequency of over 5%. The candidate SNPs for hypertension identified in this study would be valuable for association studies with hypertension to accelerate the identification of hypertension genes. Received: March 14, 2002 / Accepted: April 15, 2002

3.
We experimentally investigated more than 1,200 entries in dbSNP that would change amino acids (nsSNPs), using various subsets of DNA samples drawn from 18 global populations (approximately 1,000 subjects in total). First, we mined the data for any SNP features that correlated with a high validation rate. Useful predictors of valid SNPs included multiple submissions to dbSNP, having a dbSNP validation statement, and being present in a low number of ESTs. Together, these features improved validation rates by almost 10-fold. Higher-abundance SNPs (e.g., T/C variants) also validated more frequently. Second, we considered derived alleles and noted a considerably (approximately 10%) increased average derived allele frequency (DAF) in Europeans vs. Africans, plus a further increase in some other populations. This was not primarily due to an SNP ascertainment bias, nor to the effects of natural selection. Instead, it can be explained as a drift-based, progressive increase in DAF that occurs over many generations and becomes exaggerated during population bottlenecks. This observation could be used as the basis for novel DAF-based tests for comparing demographic histories. Finally, we considered individual marker patterns and identified 37 SNPs with allele frequency variance or FST values consistent with the effects of population-specific natural selection. Four particularly striking clusters of these markers were apparent, and three of these coincide with genes/regions from among only several dozen such domains previously suggested by others to carry signatures of selection.

4.
In the attempt to understand human variation and the genetic basis of complex disease, a tremendous number of single nucleotide polymorphisms (SNPs) have been discovered and deposited into NCBI's dbSNP public database. More than 2.7 million SNPs in the database have genotype information. These data provide an invaluable resource for understanding the structure of human variation and the design of genetic association studies. The genotypes deposited to dbSNP are unphased, and thus, the haplotype information is unknown. We applied the phasing method HAP to obtain the haplotype information, block partitions, and tag SNPs for all publicly available genotype data and deposited this information into the dbSNP database. We also deposited the orthologous chimpanzee reference sequence for each predicted haplotype block computed using the UCSC BLASTZ alignments of human and chimpanzee. Using dbSNP, researchers can now easily perform analyses using multiple genotype data sets from the same genomic regions. Dense and sparse genotype data sets from the same region were combined to show that the number of common haplotypes is significantly underestimated in whole genome data sets, while the predicted haplotypes over the common SNPs are consistent between studies. To validate the accuracy of the predictions, we benchmarked HAP's running time and phasing accuracy against PHASE. Although HAP is slightly less accurate than PHASE, HAP is over 1000 times faster than PHASE, making it suitable for application to the entire set of genotypes in dbSNP.

5.
To facilitate association studies in complex diseases characterized by hyperhomocysteinemia, we collected structural and frequency data on single-nucleotide polymorphisms (SNPs) in 24 genes relating to homocysteine metabolism. First, we scanned approximately 1.2 Mbp of sequence in the NCBI SNP database (dbSNP) build 110 and detected 1353 putative SNPs with an average in silico genic density of 1:683. Out of 112 putative SNPs in coding regions (cSNPs), we selected a subset of 42 cSNPs and assessed the applicability of the NCBI dbSNP to the Czech population, a typical representative of European Caucasians, by determining the frequency of the putative cSNPs experimentally by PCR-RFLP or ARMS-PCR in at least 110 control Czech chromosomes. As only 25 of the 42 analyzed cSNPs met the criterion of ≥1% frequency, the positive predictive value of the NCBI data set for our population reached 60%, which is similar to other studies. The correlation of SNP frequency between Czechs and other Caucasians (obtained from NCBI and/or literature) was stronger (r² = 0.90 for 20 cSNPs) than between Czechs and general NCBI database entries (r² = 0.73 for 27 cSNPs). Moreover, frequencies of all 20 putative cSNPs for which data in Caucasians were available were congruently below or above the 1% frequency criterion both in Czechs and in other Caucasians. In summary, our study shows that the NCBI dbSNP is a useful tool for selecting cSNPs for genetic studies of hyperhomocysteinemia in European populations, although experimental validation of SNPs should be performed, especially if the cSNP entry lacks any frequency data in Caucasians.

6.
Different strategies to search public single nucleotide polymorphism (SNP) databases for intragenic SNPs were evaluated. First, we assembled a strategy to annotate SNPs onto candidate genes based on a BLAST search of public SNP databases (Intragenic SNP Annotation by BLAST, ISAB). Only BLAST hits that complied with stringent criteria according to 1) percentage identity (minimum 98%), 2) BLAST hit length (the hit covers at least 98% of the length of the SNP entry in the database, or the hit is longer than 250 base pairs), and 3) location in non-repetitive DNA, were considered as valid SNPs. We assessed the intragenic context and redundancy of these SNPs, and demonstrated that the SNP contents of the dbSNP and HGBASE/HGVbase databases are highly complementary but also overlap significantly. Second, we assessed the validity of intragenic SNP annotation available on the dbSNP and HGVbase websites by comparison with the results of the ISAB strategy. Only a minority of all annotated SNPs was found in common between the respective public SNP database websites and the ISAB annotation strategy. A detailed analysis was performed aiming to explain this discrepancy. In conclusion, we recommend the application of an independent strategy (such as ISAB) to annotate intragenic SNPs, complementary to the annotation provided at the dbSNP and HGVbase websites. Such an approach might be useful in the selection process of intragenic SNPs for genotyping in genetic studies. Hum Mutat 20:162-173, 2002.
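
The three acceptance criteria listed in this abstract can be sketched as a simple filter. This is only an illustration under stated assumptions, not the authors' ISAB implementation: the `BlastHit` record, its field names, and the function `passes_isab_criteria` are hypothetical, and the thresholds are taken directly from the abstract (at least 98% identity; hit covering at least 98% of the SNP entry or longer than 250 base pairs; location in non-repetitive DNA).

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    """Hypothetical record for one BLAST hit of a SNP entry (field names are assumptions)."""
    percent_identity: float   # percent identity of the alignment, 0-100
    hit_length: int           # aligned length in base pairs
    snp_entry_length: int     # length of the SNP database entry in base pairs
    in_repetitive_dna: bool   # True if the hit lies in repetitive DNA


def passes_isab_criteria(hit: BlastHit) -> bool:
    """Apply the three criteria described in the abstract to a single hit."""
    # Criterion 1: minimum 98% identity.
    identity_ok = hit.percent_identity >= 98.0
    # Criterion 2: hit covers at least 98% of the SNP entry, or is longer than 250 bp.
    covers_entry = hit.hit_length >= 0.98 * hit.snp_entry_length
    length_ok = covers_entry or hit.hit_length > 250
    # Criterion 3: the hit must lie in non-repetitive DNA.
    return identity_ok and length_ok and not hit.in_repetitive_dna


if __name__ == "__main__":
    example = BlastHit(percent_identity=99.2, hit_length=300,
                       snp_entry_length=1000, in_repetitive_dna=False)
    print(passes_isab_criteria(example))
```

How repetitive DNA is detected is not stated in the abstract, so the sketch takes it as a precomputed flag.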

7.
The evolutionary and biomedical importance of differential mRNA splicing is well established. Numerous studies have assessed patterns of differential splicing in different genes and correlated these patterns to the genotypes for adjacent single‐nucleotide polymorphisms (SNPs). Here, we have chosen a reverse approach and screened dbSNP for common SNPs at either canonical splice sites or exonic splice enhancers (ESEs) that would be classified as putatively splicing‐relevant by bioinformatic tools. The 223 candidate SNPs retrieved from dbSNP were experimentally tested using a previously established panel of 92 matching DNAs and cDNAs. For each SNP, 16 cDNAs providing a balanced representation of the genotypes at the respective SNP were investigated by nested RT‐PCR and subsequent sequencing. Putative allele‐dependent splicing was verified by the cloning of PCR products. The positive predictive value of the bioinformatics tools turned out to be low, ranging from 0% for ESEfinder to 9% (in the case of acceptor‐site SNPs) for a recently reported neural network. The results highlight the need for a better understanding of the sequence characteristics of functional splice‐sites to improve our ability to predict in silico the splicing relevance of empirically observed DNA sequence variants. Hum Mutat 0, 1–9, 2009. © 2009 Wiley‐Liss, Inc.

8.
We have examined the patterns of DNA sequence variation in and around the genes coding for ICAM1 and TNF, which play functional and correlated roles in inflammatory processes and immune cell responses, in 12 diverse ethnic groups of India. We aimed to (a) quantify the nature and extent of the variation, and (b) analyse the observed patterns of variation in relation to population history and ethnic background. At the ICAM1 and TNF loci, respectively, the total numbers of SNPs that were detected were 28 and 12. Many of these SNPs are not shared across ethnic groups and are unreported in the dbSNP or TSC databases, including two fairly common non‐synonymous SNPs at positions 13487 and 13542 in the ICAM1 gene. Conversely, the TNF‐376A SNP that is reported to be associated with susceptibility to malaria was not found in our study populations, even though some of the populations inhabit malaria endemic areas. Wide between‐population variation in the frequencies of shared SNPs and coefficients of linkage disequilibrium have been observed. These findings have profound implications in case‐control association studies.

9.
Currently, >14,000,000 single nucleotide polymorphisms (SNPs) are reported. Identifying phenotype‐affecting SNPs among these many SNPs poses significant challenges. Although several Web resources are available that can inform about the functionality of SNPs, these resources are mainly annotation databases and are not very comprehensive. In this article, we present a comprehensive, well‐annotated, integrated pfSNP (potentially functional SNPs) Web resource ( http://pfs.nus.edu.sg/ ), which is aimed to facilitate better hypothesis generation through knowledge syntheses mediated by better data integration and a user‐friendly Web interface. pfSNP integrates >40 different algorithms/resources to interrogate >14,000,000 SNPs from the dbSNP database for SNPs of potential functional significance based on previous published reports, inferred potential functionality from genetic approaches as well as predicted potential functionality from sequence motifs. Its query interface has the user‐friendly “auto‐complete, prompt‐as‐you‐type” feature and is highly customizable, facilitating different combinations of queries using Boolean‐logic. Additionally, to facilitate better understanding of the results and aid in hypotheses generation, gene/pathway‐level information with text clouds highlighting enriched tissues/pathways as well as detailed‐related information are also provided on the results page. Hence, the pfSNP resource will be of great interest to scientists focusing on association studies as well as those interested to experimentally address the functionality of SNPs. Hum Mutat 32:19–24, 2011. © 2010 Wiley‐Liss, Inc.

10.
The immunoglobulin superfamily 6 gene (IGSF6) on chromosome 16p11‐p12 has been investigated as a positional and functional candidate for inflammatory bowel disease (IBD) susceptibility. Screening of the six exons of IGSF6 for single nucleotide polymorphisms (SNPs) detected four novel SNPs, and validated three of six SNPs listed in the international SNP database (dbSNP). The seven SNPs in IGSF6 formed five distinct linkage disequilibrium groups. There was no evidence for association of the common SNPs with disease in a large cohort of patients with IBD. The novel SNPs and the linkage disequilibrium map will be a useful resource for the analysis of IGSF6 in other immune disorders.

11.
dbSNP is a general catalog of genetic polymorphism maintained by NCBI, mainly collating information for single nucleotide variations, many of which will be single nucleotide polymorphisms (SNPs), but also including small indels. It takes submissions from many sources, now also including large numbers of sequence variants identified by next‐generation sequencing. A number of differently designed studies have attempted to estimate the error rates in data archived in dbSNP. Most recently, a study added to earlier studies identifying specific issues for duplicons and copy number variations (CNVs); earlier analyses have focused on stop codons, splice sites, and the general content of dbSNP. This article overviews dbSNP itself, these studies, and their implications. Hum Mutat 30:1–3, 2009. © 2009 Wiley‐Liss, Inc.

12.
Purpose: This study aimed to quantitatively assess the incidence of hearing loss in relation to age in individuals with biallelic p.V37I variant in GJB2. Methods: Population screening of the biallelic p.V37I variant was performed in 30,122 individuals aged between 0 and 97 years in Shanghai. Hearing thresholds of the biallelic p.V37I individuals and the controls were determined by click auditory brainstem response or pure tone audiometry. Results: Biallelic p.V37I was detected in 0.528% (159/30,122) of the subjects. Of the biallelic p.V37I newborns, 43.91% (18/41) passed their distortion-product otoacoustic emissions–based newborn hearing screening or had hearing thresholds lower than 20 decibels above normal hearing level. The older newborns had elevated hearing thresholds, with increasing incidence of 9.52%, 23.08%, 59.38%, and 80.00% for moderate or higher grade of hearing loss in age groups of 7 to 15 years, 20 to 40 years, 40 to 60 years, and 60 to 85 years, respectively. Their hearing deteriorated at a rate of 0.40 dB hearing level per year on average; males were more susceptible, and deterioration occurred preferentially at higher sound frequencies. Conclusion: The biallelic p.V37I variant is associated with steadily progressive hearing loss with increasing incidence over the course of life. Most of the biallelic p.V37I individuals may develop significant hearing loss in adulthood and can benefit from early diagnosis and intervention through widespread genetic screening.

13.
Large-scale discovery and validation of single-nucleotide polymorphisms (SNPs) facilitates indirect association mapping. It has recently been estimated that, in Europeans, 77% of all SNPs with frequency of 10% or more could be ascertained through linkage disequilibrium (LD) by genotyping variants in the database dbSNP. Using a sampling approach from 73 genes with near complete SNP maps, we show here the usefulness of SNP maps at different densities and the large variability of SNP coverage in different genomic regions. While even sparse SNP maps are of some value to genetic mapping, in order to undertake disease association studies providing at least 80% of SNPs in 90% of genes, much denser maps need to be constructed, at more than one SNP per kb in some regions.

14.
We report here three high-density maps of variations found among 48 Japanese individuals in three uridine diphosphate glycosyltransferase (UGT) genes, UGT2A1, UGT2B15, and UGT8. A total of 86 single-nucleotide polymorphisms (SNPs) were identified through systematic screening of genomic regions containing these genes: 8 in 5' flanking regions, 7 in coding regions, 67 in introns, 3 in 3' untranslated regions, and 1 in a 3' flanking region. We also discovered 14 variations of other types. Of the 86 SNPs, 63 (73%) were considered to be novel on the basis of comparison of our data with the Database of SNPs (dbSNP) of the National Center for Biotechnology Information. Among the seven SNPs identified in exonic sequences, five were non-synonymous changes that would result in amino-acid substitutions. The collection of SNPs derived from this study will serve as an additional resource for studies of complex genetic diseases and responsiveness to drug therapy.

15.
Most common human diseases are likely to have complex etiologies. Methods of analysis that allow for the phenomenon of epistasis are of growing interest in the genetic dissection of complex diseases. By allowing for epistatic interactions between potential disease loci, we may succeed in identifying genetic variants that might otherwise have remained undetected. Here we aimed to analyze the ability of logistic regression (LR) and two tree‐based supervised learning methods, classification and regression trees (CART) and random forest (RF), to detect epistasis. Multifactor‐dimensionality reduction (MDR) was also used for comparison. Our approach involves first the simulation of datasets of autosomal biallelic unphased and unlinked single nucleotide polymorphisms (SNPs), each containing a two‐loci interaction (causal SNPs) and 98 ‘noise’ SNPs. We modelled interactions under different scenarios of sample size, missing data, minor allele frequencies (MAF) and several penetrance models: three involving both (indistinguishable) marginal effects and interaction, and two simulating pure interaction effects. In total, we have simulated 99 different scenarios. Although CART, RF, and LR yield similar results in terms of detection of true association, CART and RF perform better than LR with respect to classification error. MAF, penetrance model, and sample size are greater determining factors than percentage of missing data in the ability of the different techniques to detect true association. In pure interaction models, only RF detects association. In conclusion, tree‐based methods and LR are important statistical tools for the detection of unknown interactions among true risk‐associated SNPs with marginal effects and in the presence of a significant number of noise SNPs. In pure interaction models, RF performs reasonably well in the presence of large sample sizes and low percentages of missing data. However, when the study design is suboptimal (unfavourable to detect interaction in terms of e.g. sample size and MAF) there is a high chance of detecting false, spurious associations.

16.
The human alcohol dehydrogenase 4 (ADH4) gene encodes the class II ADH4 pi subunit, which contributes to the metabolization of a wide variety of substrates, including ethanol, retinol, other aliphatic alcohols, hydroxysteroids, and lipid peroxidation products. Here we report the results of systematic screening for single-nucleotide polymorphisms (SNPs) in the ADH4 gene by means of direct sequencing combined with a polymerase chain reaction method. A total of 16 genetic variations including 13 SNPs were found; 4 in the 5′ flanking region, 4 in the 5′ untranslated region, and 8 within introns. No variation was found in coding, 3′ untranslated, or 3′ flanking regions. Eight of the 13 SNPs were not reported in the NCBI dbSNP database or any previous publications. Our SNP map presented here should provide tools to evaluate the role of ADH4 in complex genetic diseases and a variety of pharmacogenetic effects. Received: September 26, 2001 / Accepted: October 11, 2001

17.
The immunoglobulin superfamily 6 gene (IGSF6) on chromosome 16p11-p12 has been investigated as a positional and functional candidate for inflammatory bowel disease (IBD) susceptibility. Screening of the six exons of IGSF6 for single nucleotide polymorphisms (SNPs) detected four novel SNPs, and validated three of six SNPs listed in the international SNP database (dbSNP). The seven SNPs in IGSF6 formed five distinct linkage disequilibrium groups. There was no evidence for association of the common SNPs with disease in a large cohort of patients with IBD. The novel SNPs and the linkage disequilibrium map will be a useful resource for the analysis of IGSF6 in other immune disorders.

18.
We have been publishing a series of detailed maps of single-nucleotide polymorphisms (SNPs) detected within the genomic loci of 145 genes encoding drug-metabolizing enzymes and transporters. As an addition to the maps reported earlier, we provide here high-density SNP maps of 31 genes encoding various receptors and adhesion molecules of medical importance. By examining a total of approximately 382 kb of genomic DNA encompassing these 31 genes, we identified 668 SNPs among 48 healthy Japanese individuals: 86 in 5′ flanking regions, 27 in 5′ untranslated regions, 45 in coding regions, 399 in introns, 47 in 3′ untranslated regions, and 64 in 3′ flanking regions. We also discovered 113 variations of other types. Of the 668 SNPs, 371 (55.5%) appeared to be novel, on the basis of comparisons with the dbSNP database of the National Center for Biotechnology Information (US) or with previous publications. The maps constructed in this study will serve as an additional resource for studies of complex genetic diseases and drug-response phenotypes to be mapped by linkage-disequilibrium analyses. Received: November 6, 2002 / Accepted: November 7, 2002. Correspondence to: Y. Nakamura

19.
The dopamine receptor 5 gene (DRD5) holds much promise as a candidate locus for contributing to neuropsychiatric disorders and other diseases influenced by the dopaminergic system, as well as having potential to affect normal behavioral variation. However, detailed analyses of this gene have been complicated by its location within a segmentally duplicated chromosomal region. Microsatellites and SNPs upstream from the coding region have been used for association studies, but we find, using bioinformatics resources, that these markers all lie within a previously unrecognized second segmental duplication (SD). In order to accurately analyze the DRD5 locus for polymorphisms in the absence of contaminating pseudogene sequences, we developed a fast and reliable method for sequence analysis and genotyping within the DRD5 coding region. We employed restriction enzyme digestion of genomic DNA to eliminate the pseudogenes prior to PCR amplification of the functional gene. This approach allowed us to determine the DRD5 haplotype structure using 31 trios and to reveal additional rare variants in 171 unrelated individuals. We clarify the inconsistencies and errors of the recorded SNPs in dbSNP and HapMap and illustrate the importance of using caution when choosing SNPs in regions of suspected duplications. The simple and relatively inexpensive method presented herein allows for convenient analysis of sequence variation in DRD5 and can be easily adapted to other duplicated genomic regions in order to obtain good quality sequence data.

20.
We report here three high-density maps of variations found among 48 Japanese individuals in three uridine diphosphate glycosyltransferase (UGT) genes, UGT2A1, UGT2B15, and UGT8. A total of 86 single-nucleotide polymorphisms (SNPs) were identified through systematic screening of genomic regions containing these genes: 8 in 5′ flanking regions, 7 in coding regions, 67 in introns, 3 in 3′ untranslated regions, and 1 in a 3′ flanking region. We also discovered 14 variations of other types. Of the 86 SNPs, 63 (73%) were considered to be novel on the basis of comparison of our data with the Database of SNPs (dbSNP) of the National Center for Biotechnology Information. Among the seven SNPs identified in exonic sequences, five were non-synonymous changes that would result in amino-acid substitutions. The collection of SNPs derived from this study will serve as an additional resource for studies of complex genetic diseases and responsiveness to drug therapy. Received: June 12, 2002 / Accepted: June 13, 2002
