Similar Articles (20 results)
1.
Bioinformatics research relies heavily on the ability to discover and correlate data from various sources. The specialization of the life sciences over the past decade, coupled with an increasing number of biomedical datasets available through standardized interfaces, has created opportunities for new methods of biomedical discovery. Despite the popularity of semantic web technologies for tackling the integrative bioinformatics challenge, many obstacles stand in the way of their use by non-technical research audiences. In particular, fully exploiting integrated information requires improved interactive methods that are intuitive to biomedical experts. In this report we present ReVeaLD (a Real-time Visual Explorer and Aggregator of Linked Data), a user-centered visual analytics platform devised to increase intuitive interaction with data from distributed sources. ReVeaLD facilitates query formulation using a domain-specific language (DSL) identified by biomedical experts and mapped to a self-updated catalogue of elements from external sources. ReVeaLD was implemented in a cancer research setting; queries included retrieving data from in silico experiments, protein modeling, and gene expression. ReVeaLD was developed using Scalable Vector Graphics and JavaScript, and a demo with explanatory video is available at http://www.srvgal78.deri.ie:8080/explorer. A set of user-defined graphic rules controls the display of information through media-rich user interfaces. Evaluation of ReVeaLD was carried out as a game: biomedical researchers were asked to complete a set of 5 challenge questions while their time and interactions with the platform were recorded. Preliminary results indicate that untrained researchers could formulate complex queries in under two minutes. The results also indicate that supporting the identification of DSL elements significantly increased the intuitiveness of the platform and the usability of semantic web technologies for domain users.
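ReVeaLD's DSL queries ultimately resolve against linked-data sources; the abstract does not reproduce its query templates, so the following is only a minimal sketch of the kind of SPARQL lookup such a platform issues, written with the SPARQLWrapper Python client. The endpoint URL and the class name are hypothetical placeholders.

```python
# Minimal sketch of a linked-data lookup of the sort a platform like
# ReVeaLD generates from its DSL. The endpoint and schema names are
# hypothetical, not taken from the paper.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://example.org/sparql")  # hypothetical endpoint
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?gene ?label WHERE {
        ?gene a <http://example.org/schema/Gene> ;   # hypothetical class
              rdfs:label ?label .
        FILTER regex(?label, "TP53", "i")
    } LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for row in results["results"]["bindings"]:
    print(row["gene"]["value"], row["label"]["value"])
```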

2.
With the coming deluge of genome data, the need to store and process large-scale genome data, provide easy access to biomedical analysis tools, and share and retrieve data efficiently presents significant challenges. Variability in data volume translates into variable computing and storage requirements; biomedical researchers are therefore pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a Cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tool integration), automatic deployment on the Cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a Cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases, as well as a performance evaluation, are presented to validate the feasibility of the proposed approach.
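Since the platform builds on Galaxy, whose REST API is scriptable, a feel for the automation involved can be given with the community bioblend client. This is an illustration only: the server URL, API key, and IDs below are placeholders, and the Globus-enabled extensions described in the paper are not part of stock Galaxy or bioblend.

```python
# Sketch of driving a Galaxy server programmatically via bioblend.
# Server URL, API key, and workflow/dataset IDs are placeholders.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

history = gi.histories.create_history(name="ngs-run")   # container for outputs
wf_id = gi.workflows.get_workflows()[0]["id"]           # pick a stored workflow

# Map the workflow's input steps to an already-uploaded dataset.
inputs = {"0": {"src": "hda", "id": "DATASET_ID_PLACEHOLDER"}}
invocation = gi.workflows.invoke_workflow(wf_id, inputs=inputs,
                                          history_id=history["id"])
print("invocation state:", invocation["state"])
```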

3.
We present the first Korean individual genome sequence (SJK) and analysis results. The diploid genome of a Korean male was sequenced to 28.95-fold redundancy using the Illumina paired-end sequencing method. SJK covered 99.9% of the NCBI human reference genome. We identified 420,083 novel single nucleotide polymorphisms (SNPs) that are not in the dbSNP database. Despite a close similarity, significant differences were observed between the Chinese genome (YH), the only other Asian genome available, and SJK: (1) 39.87% (1,371,239 out of 3,439,107) of SNPs were SJK-specific (49.51% against Venter's, 46.94% against Watson's, and 44.17% against the Yoruba genomes); (2) 99.5% (22,495 out of 22,605) of short indels (<4 bp) discovered at the same loci had the same size and type as in YH; and (3) 11.3% (331 out of 2,920) of deletion structural variants were SJK-specific. Even after attempting to map unmapped SJK reads to unanchored NCBI scaffolds, HGSV, and available personal genomes, 5.77% of SJK reads still could not be mapped. All these findings indicate that the overall genetic differences among individuals from closely related ethnic groups may be significant. Hence, constructing reference genomes for minor socio-ethnic groups will be useful for massive individual genome sequencing.

In 1977, the first full viral genome sequence was published (Sanger et al. 1977), and three years later the same group sequenced the complete human mitochondrial genome (Anderson et al. 1981). These early and subsequent genome projects laid the foundation for sequencing the first human genome, completed in 2004 (Lander et al. 2001; Venter et al. 2001; International Human Genome Sequencing Consortium 2004). Since then, we have seen astounding progress in sequencing technology, which has opened the way for personal genomics (Church 2005; Shendure and Ji 2008; von Bubnoff 2008). The first personal genome (HuRef, Venter) was sequenced by the conventional Sanger dideoxy method, which is still the method of choice for de novo sequencing owing to its long read lengths of up to ∼1000 bp and per-base accuracies as high as 99.999% (Shendure and Ji 2008). Using this method, Levy et al. (2007) assembled diploid sequences with phase information, which has not been done for other published genomes. Despite limitations in read length, which is extremely important for the assembly of contigs and final genomes (Sundquist et al. 2007), it is next-generation sequencing (NGS) technology that has made personal genomics possible by dramatically reducing the cost and increasing the efficiency (Mardis 2008; Shendure and Ji 2008). To date, at least four individual genome sequences analyzed by NGS have been published (Bentley et al. 2008; Ley et al. 2008; Wang et al. 2008; Wheeler et al. 2008). Using NGS for resequencing, researchers can simply map short-read NGS data to known reference genomes, avoiding expensive and laborious long-fragment de novo assembly. As demonstrated by the large percentage of unmapped data in previous human genome resequencing projects, however, a resequenced genome may not fully reflect ethnic and individual genetic differences, because its assembly depends on the previously sequenced reference genome.

After the introduction of NGS, the bottleneck in sequencing a whole population is not the sequencing process itself, but the bioinformatics: fast and accurate mapping to known data, structural variation analysis, phylogenetic analysis, association studies, and application to phenotypes such as disease. The full analysis of a human genome is far from complete, in contrast to the case of phage phiX174 sequenced by the Sanger group in the 1970s. For example, the NCBI human reference genome, an essential tool for resequencing by NGS, does not reflect an ideal picture of a human genome in terms of the number of base pairs sequenced and of genes determined. Furthermore, a recent study reported that 13% of sequence reads could not be mapped to the NCBI reference genome (Wang et al. 2008). This is one of the reasons the Korean reference genome construction project was initiated. Koreans and Chinese are thought to have originated from the same ancestors and to have admixed for thousands of years. Comparing genome-scale variation between the two, in relation to other known individual genomes, has given us insight into how distinct they are from each other.

Here, we report SJK, the first full-length Korean individual genome sequence (SJK, from Seong-Jin Kim, the genome donor), accompanied by genotype information for the donor and his mother. The SJK sequence was first released in December 2008 as the result of the Korean reference genome construction project and has been freely available at ftp://ftp.kobic.kr/pub/KOBIC-KoreanGenome/.
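As a back-of-the-envelope check (ours, not the authors'), the classical Lander–Waterman model relates fold-redundancy c to the expected fraction of a genome left uncovered by random reads:

```latex
P(\text{base uncovered}) = e^{-c}, \qquad
e^{-28.95} \approx 2.7 \times 10^{-13}
```

At 28.95-fold redundancy, random sampling alone should leave essentially nothing uncovered; the 0.1% of the reference not covered, and the 5.77% of reads that map nowhere, therefore reflect sequence absent from or divergent from the reference rather than insufficient depth, which is precisely the authors' argument for population-specific reference genomes.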

4.
ClinSeq is a pilot project to investigate the use of whole-genome sequencing as a tool for clinical research. By piloting the acquisition of large amounts of DNA sequence data from individual human subjects, we are fostering the development of hypothesis-generating approaches for performing research in genomic medicine, including the exploration of issues related to the genetic architecture of disease, implementation of genomic technology, informed consent, disclosure of genetic information, and the archiving, analysis, and display of sequence data. In the initial phase of ClinSeq, we are enrolling roughly 1000 participants; the evaluation of each includes obtaining a detailed family and medical history, as well as a clinical evaluation. The participants are being consented broadly for research on many traits and for whole-genome sequencing. Initially, Sanger-based sequencing of 300–400 genes thought to be relevant to atherosclerosis is being performed, with the resulting data analyzed for rare, high-penetrance variants associated with specific clinical traits. The participants are also being consented to allow the contact of family members for additional studies of sequence variants to explore their potential association with specific phenotypes. Here, we present the general considerations in designing ClinSeq, preliminary results based on the generation of an initial 826 Mb of sequence data, the findings for several genes that serve as positive controls for the project, and our views about the potential implications of ClinSeq. The early experiences with ClinSeq illustrate how large-scale medical sequencing can be a practical, productive, and critical component of research in genomic medicine.

Elucidating the sequence of the human genome (International Human Genome Sequencing Consortium 2001, 2004) and subsequent advances in DNA sequencing technologies (Mardis 2008) have the potential to dramatically improve the delivery of health care through the acquisition of genomic information about individual patients. However, much research will be needed to develop medical applications of genomics; for example, little is known about how to organize and implement large-scale medical sequencing (LSMS; i.e., systematic resequencing of human DNA) in a clinical context. Other approaches for applying high-throughput genomics to health care (e.g., assaying single-nucleotide polymorphisms and establishing gene-expression profiles) offer diagnostic promise; these are not considered further here, as our focus is on LSMS for studying the relationship of germline genomic variation to health and disease. We recently launched ClinSeq (http://genome.gov/20519355), a project that aims to apply LSMS within a clinical research environment to answer questions about the genetic basis of health, disease, and drug response. The application of genomic approaches (in particular LSMS) in a clinical research context involves a number of considerations that define the key "dimensions" of any study: the number of subjects, the associated clinical data, and the breadth of genome covered (Fig. 1). Numerous detailed studies of single genes have been carried out; while often performed on many participants with significant amounts of phenotypic information, they are focused on a very small portion of the genome. The flurry of papers describing recently generated whole-genome sequences (Levy et al. 2007; Bentley et al. 2008; Wang et al. 2008; Wheeler et al. 2008) has provided the first true individual genome sequences, including a modest amount of associated clinical data; however, the number of examples is small to date. Greater numbers are promised by the 1000 Genomes Project (http://www.1000genomes.org/), although no phenotypic information will be available for the individuals being studied. ClinSeq aims to model a more ideal study with respect to these three dimensions (Fig. 1), with the potential to move further toward the ultimate ideal as technology advances.

Figure 1. A spatial conceptualization of research studies in genomic medicine. There are three key "dimensions" to consider when applying genomics to clinical research: genome breadth (the fraction of the genome that is interrogated), the number of subjects or participants, and the associated clinical data about those individuals (including its depth, breadth, and rigor). While the ideal study would acquire whole-genome sequences from large numbers of extensively phenotyped subjects, this is currently impractical. Single-gene studies can involve a few or numerous subjects and extensive clinical data, but by definition involve the examination of only a single gene and thus occupy one wall of this space. The individual genomes that have recently been sequenced (Levy et al. 2007; Bentley et al. 2008; Wang et al. 2008; Wheeler et al. 2008) provide nearly complete genome breadth, but with limited clinical data; further, their limited subject numbers place them on another wall of this space. The 1000 Genomes Project (http://www.1000genomes.org/) is providing large subject numbers and extensive genome breadth, but no clinical data, positioning it on the floor of this space. ClinSeq aims to reside in the center of this space, having attributes of substantial subject size (n = 1000 initially), moderate genome breadth (∼400 genes initially, with plans for expanding this breadth), and substantial clinical data.

The general aims of ClinSeq are to: (1) develop the infrastructure and approaches to acquire and analyze genome sequence from individual research participants; (2) pilot the use of LSMS to elucidate the genetic architecture underlying human traits; (3) provide an open, shared resource and environment for basic and clinical researchers to work collaboratively on research in genomic medicine; and (4) establish approaches for informed consent and the return of genetic information to subjects participating in LSMS studies. In pursuing these aims, our overriding goals include modeling whole-genome sequence acquisition in a manner that is practical for a clinical research setting, advancing our understanding of the genetic basis of important human diseases and traits, and establishing how to scale LSMS before the day when whole-genome sequencing becomes part of routine clinical practice. In this paper, we describe the ClinSeq study design, provide a snapshot of our very early data generation, and discuss the implications of this study for the nascent field of genomic medicine.

5.
Research in Microbiology, 2014, 165(10): 836–840
A new mazF-based strategy for large-scale and scarless genome rearrangements in Saccharomyces cerevisiae was developed. We applied this method to delete designed internal (26.5 kbp) and terminal (28.9 kbp) regions on the left arm of chromosome XI of S. cerevisiae BY4741. Using an in vivo assembled deletion cassette containing longer flanking homology, the number of transformants increased by one order of magnitude and about 90% of tested colonies were the desired integrants. Compared with the conventional URA3 marker, the new system generated 2- to 13-fold more colonies in the counter-selection process, while the proportion of deletants was simultaneously elevated by 20–24%.

6.
This study presents a Web platform (http://3dfd.ujaen.es) for computing and analyzing the 3D fractal dimension (3DFD) of volumetric data in an efficient, visual, and interactive way. The platform is specially designed for working with magnetic resonance images (MRIs) of the brain. The program estimates the 3DFD by 3D box-counting over the entire volume of the brain, and also over its 3D skeleton, in a graphical, fast, and optimized way using technologies such as CUDA and WebGL. The usefulness of the platform is demonstrated in a case study analyzing and characterizing groups of 3D MR images for three neurodegenerative diseases: Multiple Sclerosis, Intrauterine Growth Restriction, and Alzheimer's disease. To the best of our knowledge, this is the first Web platform that allows users to calculate, visualize, analyze, and compare the 3DFD of MR images in the cloud.
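The abstract's core algorithm, 3D box-counting, is simple to state: cover the volume with boxes of decreasing size and fit the slope of occupied-box count versus box size on a log-log scale. The paper's platform runs this on the GPU (CUDA/WebGL); the following NumPy sketch only illustrates the algorithm, and the box sizes and toy volume are our own choices.

```python
# Minimal sketch of 3D box-counting on a binary volume (e.g., a brain
# mask from MRI). CPU/NumPy only; the paper's implementation is GPU-based.
import numpy as np

def box_count_3d(volume: np.ndarray, sizes=(2, 4, 8, 16, 32)) -> float:
    """Estimate the 3D fractal dimension of a binary volume."""
    counts = []
    for s in sizes:
        # Trim so each axis is a multiple of the box size s.
        trimmed = volume[: volume.shape[0] // s * s,
                         : volume.shape[1] // s * s,
                         : volume.shape[2] // s * s]
        # Partition into s*s*s boxes and count boxes containing any voxel.
        boxes = trimmed.reshape(trimmed.shape[0] // s, s,
                                trimmed.shape[1] // s, s,
                                trimmed.shape[2] // s, s)
        counts.append(boxes.any(axis=(1, 3, 5)).sum())
    # Fractal dimension = negative slope of log(count) vs. log(box size).
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

# Toy usage: a solid cube should give a dimension near 3
# (boundary effects bias the estimate slightly).
vol = np.zeros((64, 64, 64), dtype=bool)
vol[8:56, 8:56, 8:56] = True
print(box_count_3d(vol))
```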

7.
8.
Bioinformatics is a dynamic research area in which a large number of algorithms and programs have been developed rapidly and independently, so far without much consideration of the need for standardization. The lack of common standards, combined with unfriendly interfaces, makes it difficult for biologists to learn how to use these tools and to translate data formats from one to another. Consequently, the construction of an integrative bioinformatics platform to facilitate biologists' research is an urgent and challenging task. KDE Bioscience is a Java-based software platform that collects a variety of bioinformatics tools and provides a workflow mechanism to integrate them. Nucleotide and protein sequences from local flat files, web sites, and relational databases can be entered, annotated, and aligned. Several home-grown and third-party viewers are built in to visualize annotations and alignments. KDE Bioscience can also be deployed in client-server mode, where simultaneous execution of the same workflow is supported for multiple users. Moreover, workflows can be published as web pages and executed from a web browser. The power of KDE Bioscience comes from its integrated algorithms and data sources. With its generic workflow mechanism, other novel calculations and simulations can be integrated to augment the current sequence analysis functions. Because of this flexible and extensible architecture, KDE Bioscience makes an ideal integrated informatics environment for future bioinformatics and systems biology research.
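KDE Bioscience itself is Java-based and its APIs are not described in the abstract; purely to illustrate the generic workflow pattern it implements (heterogeneous tools wrapped behind one interface and chained into a pipeline), here is a language-agnostic sketch in Python. The step names and data formats are hypothetical, not KDE Bioscience APIs.

```python
# Illustration of the generic workflow pattern the abstract describes.
# Step names and formats are hypothetical placeholders.
from typing import Callable, List

Step = Callable[[dict], dict]  # each step maps a context dict to an updated one

def run_workflow(steps: List[Step], context: dict) -> dict:
    for step in steps:
        context = step(context)   # output of one tool feeds the next
    return context

def fetch_sequence(ctx):          # e.g., pull a record from a flat file or DB
    ctx["sequence"] = "ATGGCC"    # placeholder sequence
    return ctx

def annotate(ctx):                # e.g., wrap a third-party annotation tool
    ctx["annotations"] = [("CDS", 0, 6)]
    return ctx

result = run_workflow([fetch_sequence, annotate], {"accession": "X12345"})
print(result["annotations"])
```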

9.
Objective: To establish a stable, large-scale platform for bidirectional sequencing-based typing of exons 2–4 of the MICA gene and to analyze its single nucleotide polymorphisms (SNPs). Methods: We designed our own primers for amplifying and sequencing exons 2–5 of the MICA gene and optimized the PCR and sequencing conditions. A commercial unidirectional MICA sequencing-based typing kit served as a parallel control; four samples carrying the MICA*010 allele were amplified with the in-house primers and subjected to molecular cloning and haploid sequencing. Results: The in-house bidirectional sequencing-based typing method confirmed the unidirectional typing results for 100 parallel control samples. Using this method, we identified 22 SNP loci in exons 2–4 of the MICA gene in a Chinese population. Two new alleles, MICA*065 and MICA*066, received official designations from the World Health Organization. A novel SNP site in intron 3 of a MICA allele was discovered for the first time, and the sequence was submitted to the IMGT/HLA database. Conclusion: The bidirectional MICA sequencing method established here can be applied on a large scale to studies of MICA polymorphism in the Chinese population, to tissue and organ transplantation matching, and to disease research.

10.
An interactive minicomputer system has been constructed for analyzing dynamic phenomena recorded on movie film in a developmental biology laboratory. The minicomputer interfaces with a stop-motion, variable-speed projector, a digitizing pen, and real-time graphics display equipment. An analyst uses the pen to digitize features in a film, e.g., by following a cell. A computer-generated animation portraying all entered data is superimposed on the film image and synchronized with it. Noteworthy system features include image overlays on a large screen, data entry with the projector running, large data capacity, computer control of the projector, and convenient data-entry tools.

11.
Epileptogenesis is a dynamic process that produces increased seizure susceptibility. Electroencephalography (EEG) data provide information critical to understanding the evolution of epileptiform changes throughout epileptic foci. We designed an algorithm to facilitate efficient large-scale EEG analysis via linked automation of multiple data processing steps. Using EEG recordings obtained from electrical stimulation studies, the following steps of EEG analysis were automated: (1) alignment and isolation of pre- and post-stimulation intervals, (2) generation of user-defined band frequency waveforms, (3) spike sorting, (4) quantification of spike and burst data, and (5) power spectral density analysis. This algorithm allows for quicker, more efficient EEG analysis.
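Two of the listed steps, band-limited waveform generation (step 2) and power spectral density analysis (step 5), map directly onto standard signal-processing primitives. The sketch below shows one way to chain them with SciPy; the sampling rate, band edges, and spike threshold are illustrative values, not the paper's parameters.

```python
# Sketch of automating band filtering and PSD estimation for EEG.
# fs, band edges, and the detection threshold are assumed values.
import numpy as np
from scipy.signal import butter, filtfilt, welch

fs = 1000.0                          # sampling rate in Hz (assumed)

def bandpass(signal, low, high, order=4):
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, signal)    # zero-phase band filtering

def detect_spikes(signal, k=5.0):
    """Crude threshold detector: samples exceeding k robust SDs."""
    mad = np.median(np.abs(signal - np.median(signal)))
    return np.where(np.abs(signal) > k * 1.4826 * mad)[0]

eeg = np.random.randn(10 * int(fs))          # stand-in for a recording
theta = bandpass(eeg, 4.0, 8.0)              # user-defined band waveform
spikes = detect_spikes(theta)
freqs, psd = welch(eeg, fs=fs, nperseg=2048) # power spectral density
print(len(spikes), freqs[np.argmax(psd)])
```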

12.
Evaluated a behavioral treatment program for 147 obese patients in a Weight Control Clinic. Weight losses during treatment averaged 11.01 pounds, with large inter-subject variability. Unlike in past studies, patients continued to lose weight during a 6-month follow-up period. Weight loss was associated with age and initial degree of obesity, but other demographic and psychological variables failed to predict success in treatment. A critical examination of the attrition problem was carried out to determine the relationship between patient variables and the propensity to terminate treatment prematurely. Results demonstrate the utility of behavioral treatment procedures for obesity, yet further research is needed to reduce attrition and to facilitate long-term maintenance of weight loss.

13.
How many species inhabit our immediate surroundings? A straightforward collection technique suitable for answering this question is known to anyone who has ever driven a car at highway speeds. The windshield of a moving vehicle is subjected to numerous insect strikes and can be used as a collection device for representative sampling. Unfortunately, the analysis of biological material collected in that manner, as with most metagenomic studies, proves rather demanding due to the large number of required tools and the considerable computational infrastructure. In this study, we use organic matter collected by a moving vehicle to design and test a comprehensive pipeline for phylogenetic profiling of metagenomic samples that includes all steps from the processing and quality control of data generated by next-generation sequencing technologies to statistical analyses and data visualization. To the best of our knowledge, this is also the first publication that features a live online supplement providing access to the exact analyses and workflows used in the article.

Metagenomics is often thought of as an exclusively microbial enterprise, as one of the field's seminal papers was titled "Metagenomics: application of genomics to uncultured microorganisms" (Handelsman 2004). Because we simply do not know the number of bacterial taxa, the major motivation behind metagenomic studies was the need to estimate the biodiversity of various environments by direct sampling of potentially unculturable organisms (Beja et al. 2000, 2001; Tyson et al. 2004; Venter et al. 2004; DeLong 2005; Tringe et al. 2005; Gill et al. 2006; Poinar et al. 2006; von Mering et al. 2007). However, our understanding of eukaryotic diversity may not be much more advanced. Although the number of distinct eukaryotic (and, in particular, insect) taxa is likely far below the microbial count, the existing confusion about the species number is just as striking. For example, Erwin (1982) obtained an estimate of 30 million insect species via extrapolation. This figure was fiercely debated, and the latest calculations converge on an educated guess on the order of 10 million (May 1988; Erwin 1991; Mayr 1998; Odegaard 2000). If these estimates are correct, then only a minute fraction of insect species have been described to date. For example, as of February 2009 the taxonomy database at the National Center for Biotechnology Information (NCBI) lists 318,068 species from all branches of life. In this study we apply existing metagenomic methodologies to directly determine the taxonomic composition of biological matter collected by the front end of a moving vehicle. Although our specimen collection strategy is straightforward, we set ourselves the nontrivial task of taxonomic identification of the collected species. Because morphological identification is precluded by the destructive nature of the collection procedure, only DNA sequence analysis is feasible, making this study de facto metagenomic.

Metagenomic methodology has evolved rapidly in the past five years and now includes a diverse array of approaches for the profiling (binning) of complex samples (for excellent reviews, see McHardy and Rigoutsos 2007; Raes et al. 2007; Kunin et al. 2008; Pop and Salzberg 2008). Classification procedures make use of multiple sequence features, including GC content (Foerstner et al. 2005), oligonucleotide composition (McHardy et al. 2007; McHardy and Rigoutsos 2007; Chatterji et al. 2008), and codon usage bias (Noguchi et al. 2006). Homology-based methods compare sequence reads against existing protein markers (Baldauf et al. 2000; Ludwig and Klenk 2001; Rusch et al. 2007; Wu and Eisen 2008) or genomic data (Angly et al. 2006; DeLong et al. 2006; Poinar et al. 2006; Huson et al. 2007). For our study (a eukaryotic metagenome survey), a homology-based approach is more suitable, as we do not expect compositional properties (e.g., GC content) to be informative for, say, a particular family of insects. In addition, because we expect high taxonomic complexity within our samples, the coverage of individual eukaryotic genomes will likely be small, rendering protein (gene)-based approaches useless. Hence our best chance for successful phylogenetic profiling of windshield samples is the approach used by Poinar et al. (2006) and Huson et al. (2007), which relies on the comparison of metagenomic reads against existing sequence databases.

Because metagenomics is such a recent addition to the life sciences, a well-designed software solution implementing the aforementioned methodologies is lacking, rendering metagenomic analyses too cumbersome for experimental biologists to perform. As an example, consider the homology-based binning approach exemplified by Poinar et al. (2006). While well-engineered systems such as CAMERA (Seshadri et al. 2007) and MG-RAST (Meyer et al. 2008) provide a powerful computational infrastructure, and visualization tools such as MEGAN (Huson et al. 2007) allow researchers to analyze the phylogenetic makeup of their metagenomic samples, metagenomic studies remain quite challenging. Indeed, homology searches (provided, e.g., by CAMERA) and taxonomic visualization are just two components of a multistep metagenomic pipeline. Suppose a researcher has generated a collection of short sequencing reads from two metagenomic samples and wants to identify the taxonomic representation of the reads and contrast species abundance between the samples. The starting point of this analysis is a collection of sequencing reads and associated base quality scores. Next, the researcher would like to do the following:
  1. Evaluate the quality of sequencing reads and select high-quality segments;
  2. Search for high-scoring hits in existing databases;
  3. Assign taxonomic labels to sequencing reads based on their database matches;
  4. Visualize the taxonomic composition of metagenomic samples; and
  5. Perform a comparison of taxonomic composition between the two samples.
Although this example outlines just one of many possible metagenomic analyses, even this simplified case can currently be handled only with a collection of disjoint resources. While excellent software tools for performing individual steps (CAMERA, MEGAN) exist and are freely available to the scientific community, we lack a truly integrated solution in which analyses can be easily concatenated, converted to workflows, shared among colleagues, and published in a readily reproducible form. Returning to our example analysis: step 1 presents a great difficulty for most experimentalists, as sequence and quality files are extremely large and exist in many different formats. Step 2 requires a powerful computational infrastructure that allows very large sequence data sets to be searched against even larger databases; CAMERA and MG-RAST provide a public BLAST search and annotation service, enabling large-scale comparisons against a predefined set of databases. Steps 3 and 4 can be performed with MEGAN, but because MEGAN is distributed as a standalone package, it may be challenging to process the results of large BLAST searches on a desktop computer. Step 5 may be performed with a spreadsheet application (provided no data set exceeds the row limit of popular spreadsheet applications), but comparison between samples will require novel statistical approaches such as tag counting (Robinson and Smyth 2007; Marioni et al. 2008) that have yet to be implemented in biologist-friendly applications.

We set out to address these challenges by implementing a homology-based workflow for the analysis of metagenomic samples. As our example data set, we used sequencing reads generated by the 454 Life Sciences (Roche) FLX instrument from DNA obtained from organic matter collected by the front end (windshield and bumper) of a moving vehicle at two geographic locations (see Methods). First, we built a complete pipeline in which a user uploads reads generated by the sequencing machine (alternatively, reads can be obtained directly from the sequencer), performs quality control (QC), generates alignments, and conducts a full taxonomic representation analysis entirely within a web browser. Because we designed our system on our existing Galaxy platform (http://galaxyproject.org), the analyses described here can be easily shared among colleagues or referenced in supplementary materials to publications in a way that is completely transparent and reproducible. Next, we use the pipeline to answer two questions: (1) Is modern technology combined with available sequence data sufficient to identify eukaryotic taxa from low-coverage random sequence samples? (2) Can "eukaryotic metagenomics" be used to contrast the species composition of distinct geographic locations?
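To make step 3 of the list above concrete, here is a minimal sketch of assigning taxonomic labels from BLAST tabular output and tallying composition per sample. It is not the Galaxy pipeline from the paper; the file name and the hit-to-taxon mapping are hypothetical placeholders.

```python
# Sketch of homology-based binning: take the best BLAST hit per read,
# map it to a taxon, and tally sample composition. File names and the
# subject-to-taxon mapping below are hypothetical.
from collections import Counter
import csv

def load_assignments(blast_tsv, hit_to_taxon):
    """Keep the first (best) hit per read from BLAST tabular output."""
    taxa = {}
    with open(blast_tsv) as fh:
        for row in csv.reader(fh, delimiter="\t"):
            read_id, subject_id = row[0], row[1]
            if read_id not in taxa and subject_id in hit_to_taxon:
                taxa[read_id] = hit_to_taxon[subject_id]
    return taxa

# Hypothetical mapping from database subject IDs to taxonomic groups.
hit_to_taxon = {"gi|12345": "Diptera", "gi|67890": "Lepidoptera"}
sample_a = load_assignments("sample_a_blast.tsv", hit_to_taxon)
composition = Counter(sample_a.values())
print(composition.most_common())
```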

14.
One approach to sequencing a large genome is (1) to sequence a collection of nonoverlapping "seeds" chosen from a genomic library of large-insert clones, such as bacterial artificial chromosomes (BACs), and then (2) to take successive "walking" steps by selecting and sequencing minimally overlapping clones, using information such as clone-end sequences to identify the overlaps. In this paper we analyze the strategic issues involved in using this approach. We derive formulas showing how two key factors, the initial density of seed clones and the depth of the genomic library used for walking, affect the cost and time of a sequencing project, that is, the amount of redundant sequencing and the number of steps needed to cover the vast majority of the genome. We also discuss a variant strategy in which a second genomic library, with clones of somewhat smaller insert size, is used to close gaps. This approach can dramatically decrease the amount of redundant sequencing without affecting the rate at which the genome is covered.
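The paper's own formulas are not reproduced in the abstract; as a textbook sketch of the quantities involved (not the authors' derivations), the classical Lander–Waterman model for N seed clones of insert length L on a genome of size G gives:

```latex
c = \frac{NL}{G}, \qquad
\mathbb{E}[\text{fraction uncovered}] = e^{-c}, \qquad
\mathbb{E}[\text{number of islands}] = N e^{-c}
```

Each island of seeds is then extended by walking steps whose net advance is roughly the insert length times one minus the fractional overlap per step, so denser initial seeding trades extra redundant sequencing for fewer and shorter walks; the trade-off the paper quantifies is of this kind.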

15.
Full genome screen for Alzheimer disease: stage II analysis
We performed a two-stage genome screen to search for novel risk factors for late-onset Alzheimer disease (AD). The first stage involved genotyping 292 affected sibling pairs using 237 markers spaced at approximately 20 cM intervals throughout the genome. In the second stage, we genotyped 451 affected sibling pairs (ASPs) with an additional 91 markers, in the 16 regions where the multipoint LOD score was greater than 1 in stage I. Ten regions maintained LOD scores in excess of 1 in stage II, on chromosomes 1 (peak B), 5, 6, 9 (peaks A and B), 10, 12, 19, 21, and X. Our strongest evidence for linkage was on chromosome 10, where we obtained a peak multipoint LOD score (MLS) of 3.9. The linked region on chromosome 10 spans approximately 44 cM from D10S1426 (59 cM) to D10S2327 (103 cM). To narrow this region, we tested for linkage disequilibrium with several of the stage II microsatellite markers. Of the seven markers we tested in family-based and case control samples, the only nominally positive association we found was with the 167 bp allele of marker D10S1217 (chi-square=7.11, P=0.045, df=1).
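For reference (a standard definition, not restated in the abstract), the LOD score underlying these results is the base-10 likelihood ratio for linkage at recombination fraction θ against free recombination:

```latex
\mathrm{LOD}(\theta) = \log_{10}\frac{L(\theta)}{L(\theta = 1/2)}
```

The peak MLS of 3.9 on chromosome 10 thus means the data are about 10^3.9 ≈ 7900 times more likely under linkage than under no linkage, above the classical genome-wide threshold of 3.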

16.
IDA is a general-purpose Interactive Data Analysis tool for small- to medium-sized databases, suitable for very small computers as well as larger ones. It is aimed at users with little or no computer experience. IDA combines facilities for interactive data entry and editing, selective retrieval, and data analysis and presentation. All operations are controlled by the user through a simple, highly interactive dialogue. The user can specify complex logical relationships and arithmetic operations, and can request graphs, histograms, lists, tables, and simple statistics.

17.
18.

19.
The draft Fugu rubripes genome was released in 2002, at which time relatively few cDNAs were available to aid in the annotation of genes. The data presented here describe the sequencing and analysis of 24,398 expressed sequence tags (ESTs) generated from 15 different adult and juvenile Fugu tissues, 74% of which matched protein database entries. Comparison of the EST data with the Fugu genome data predicts that approximately 10,116 gene tags have been generated, covering almost one-third of predicted Fugu genes, a remarkable economy of effort. Comparison with the Washington University zebrafish EST assemblies indicates strong conservation between the fish species, but significant differences remain; these potentially represent sequence divergence in the 5' terminal exons and UTRs between the two species, although complete EST data sets are clearly not available for either. This project provides new Fugu resources, and the analysis adds significant weight to the argument that EST programs remain an essential resource for genome exploitation and annotation. This is particularly timely given the increasing availability of draft genome sequences from different organisms and the mounting emphasis on gene function and regulation.

20.
There is substantial interest in implementing technologies that allow comparisons of the whole genomes of individuals, tissues, and cell populations. Restriction landmark genome scanning (RLGS) is a highly resolving gel-based technique in which several thousand fragments in genomic digests are visualized simultaneously and analyzed quantitatively. The widespread use of RLGS has been hampered by the difficulty of deriving sequence information for displayed fragments and by the lack of a whole-genome, sequence-based framework for interpreting RLGS patterns. We have developed informatics tools for comparing sample-derived RLGS patterns with patterns predicted from the human genome sequence and displayed as Virtual Genome Scans (VGS). These tools allow sequence prediction of fragments in RLGS patterns obtained with different restriction enzyme combinations. The utility of VGS is demonstrated by the identification of restriction fragment length polymorphisms; the detection of amplifications, deletions, and methylation changes in tumor-derived CpG islands; and the characterization of an amplified region in a breast tumor that spanned <230 kb on 17q23.
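The core of a "virtual genome scan" is predicting restriction fragments in silico from genome sequence. Real VGS uses enzyme combinations and two-dimensional separation; the toy single-enzyme digest below only shows the principle, and the cut-at-site-start simplification and toy sequence are our own.

```python
# Toy in-silico restriction digest: predict fragment lengths from the
# positions of an enzyme's recognition site in a sequence. Simplified
# relative to real VGS (single enzyme, cut taken at site start).
NOTI_SITE = "GCGGCCGC"  # NotI, a CpG-island-biased landmark enzyme

def virtual_digest(sequence: str, site: str = NOTI_SITE):
    """Return predicted fragment lengths from recognition-site positions."""
    cuts, start = [], 0
    while True:
        pos = sequence.find(site, start)
        if pos == -1:
            break
        cuts.append(pos)
        start = pos + 1
    boundaries = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(boundaries, boundaries[1:])]

toy = "AT" * 500 + "GCGGCCGC" + "GC" * 300 + "GCGGCCGC" + "TA" * 200
print(virtual_digest(toy))  # [1000, 608, 408]
```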
