Similar articles (20 found)
1.
Because of the time and cost associated with Sanger sequencing of complete human mtDNA genomes, practically all evolutionary studies have screened samples first to define haplogroups and then either selected a few samples from each haplogroup, or many samples from a particular haplogroup of interest, for complete mtDNA genome sequencing. Such biased sampling precludes many analyses of interest. Here, we used high-throughput sequencing platforms to generate, rapidly and inexpensively, 109 complete mtDNA genome sequences from random samples of individuals from three Filipino groups, including one Negrito group, the Mamanwa. We obtained on average ~55-fold coverage per sequence, with <1% missing data per sequence. Various analyses attest to the accuracy of the sequences, including comparison to sequences of the first hypervariable segment of the control region generated by Sanger sequencing; patterns of nucleotide substitution and the distribution of polymorphic sites across the genome; and the observed haplogroups. Bayesian skyline plots of population size change through time indicate similar patterns for all three Filipino groups, but sharply contrast with such plots previously constructed from biased sampling of complete mtDNA genomes, as well as with an artificially constructed sample of sequences that mimics the biased sampling. Our results clearly demonstrate that the high-throughput sequencing platforms are the methodology of choice for generating complete mtDNA genome sequences.
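The two quality figures quoted in this abstract (mean fold coverage and the fraction of missing data) can be illustrated with a minimal sketch. It assumes a per-position read-depth list is already available and treats a position with zero depth as missing; the function name and the toy depths are hypothetical and are not taken from the study.

```python
def coverage_summary(depths):
    """Return (mean fold coverage, fraction of positions with no reads).

    `depths` is a list with one read-depth value per genome position.
    Treating zero depth as "missing" is an assumption of this sketch.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    mean_coverage = sum(depths) / n
    missing_fraction = sum(1 for d in depths if d == 0) / n
    return mean_coverage, missing_fraction


# Toy example with four positions (a real mtDNA genome has ~16.6 kb).
mean_cov, missing = coverage_summary([0, 50, 60, 110])
print(mean_cov, missing)
```

A real pipeline would derive the depth list from an alignment file; that step is deliberately left out here.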

2.
Copy number variations (CNVs) affect a wide range of phenotypic traits; however, CNVs in or near segmental duplication regions are often intractable. Using a read depth approach based on next-generation sequencing, we examined genome-wide copy number differences among five taurine (three Angus, one Holstein, and one Hereford) and one indicine (Nelore) cattle. Within mapped chromosomal sequence, we identified 1265 CNV regions comprising ~55.6 Mbp of sequence, 476 of which (~38%) have not previously been reported. We validated this sequence-based CNV call set with array comparative genomic hybridization (aCGH), quantitative PCR (qPCR), and fluorescent in situ hybridization (FISH), achieving a validation rate of 82% and a false positive rate of 8%. We further estimated absolute copy numbers for genomic segments and annotated genes in each individual. Surveys of the top 25 most variable genes revealed that the Nelore individual had the lowest copy numbers in 13 cases (~52%, χ² test; P-value <0.05). In contrast, genes related to pathogen- and parasite-resistance, such as CATHL4 and ULBP17, were highly duplicated in the Nelore individual relative to the taurine cattle, while genes involved in lipid transport and metabolism, including APOL3 and FABP2, were highly duplicated in the beef breeds. These CNV regions also harbor genes like BPIFA2A (BSP30A) and WC1, suggesting that some CNVs may be associated with breed-specific differences in adaptation, health, and production traits. By providing the first individualized cattle CNV and segmental duplication maps and genome-wide gene copy number estimates, we enable future CNV studies into highly duplicated regions in the cattle genome.
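The read depth idea mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it assumes copy number scales linearly with read depth relative to a control region of known (diploid) copy number, and the function name, window names, and depth values are all hypothetical.

```python
def estimate_copy_number(window_depth, control_depth, control_copies=2):
    """Estimate the copy number of a window from its read depth.

    Assumes depth is proportional to copy number and that the control
    region is present in `control_copies` copies (2 for a diploid genome).
    """
    if control_depth <= 0:
        raise ValueError("control_depth must be positive")
    return control_copies * window_depth / control_depth


# Hypothetical mean depths for three windows and one diploid control region.
window_depths = {"win_a": 30.0, "win_b": 61.5, "win_c": 14.0}
control = 30.0
estimates = {
    name: round(estimate_copy_number(depth, control))
    for name, depth in window_depths.items()
}
print(estimates)
```

Real callers would also correct for GC content and mappability before comparing depths; those steps are omitted from this sketch.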

3.
Zhao KN, Liu WJ, Frazer IH. Virus Research, 2003, 98(2): 95-104
We analyzed the codon usage bias of eight open reading frames (ORFs) across up to 79 human papillomavirus (HPV) genotypes from three distinct phylogenetic groups. All eight ORFs across HPV genotypes show a strong codon usage bias, amongst degenerately encoded amino acids, toward 18 codons mainly with T at the 3rd position. For all 18 degenerately encoded amino acids, codon preferences amongst human and animal PV ORFs are significantly different from those averaged across mammalian genes. Across the HPV types, the L2 ORFs show the highest codon usage bias (73.2 ± 1.6%) and the E4 ORFs the lowest (51.1 ± 0.5%), reflecting a similar bias in codon 3rd position A+T content (L2: 76.1 ± 4.2%; E4: 58.6 ± 4.5%). The E4 ORF, uniquely amongst the HPV ORFs, is G+C rich, while the other ORFs are A+T rich. Codon usage bias correlates positively with A+T content at the codon 3rd position in the E2, E6, L1 and L2 ORFs, but negatively in the E4 ORFs. A general conservation of preferred codon usage across human and non-human PV genotypes, whether or not they originate from the same supergroup, together with the observed difference between the preferred codon usage for HPV ORFs and for genes of the cells they infect, suggests that specific codon usage bias and A+T content variation may somehow increase the replicational fitness of HPVs in mammalian epithelial cells, and have practical implications for gene therapy of HPV infection.
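The codon 3rd position A+T content referred to in this abstract is a simple per-ORF statistic. The sketch below shows one way it could be computed; the function name and the toy ORF are hypothetical, and the code assumes an in-frame DNA sequence whose length is a multiple of three.

```python
def third_position_at_content(orf):
    """Fraction of codons whose third base is A or T.

    `orf` must be an in-frame DNA sequence (length divisible by 3).
    """
    seq = orf.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    third_bases = seq[2::3]  # every third base, starting at codon position 3
    at_count = sum(1 for base in third_bases if base in "AT")
    return at_count / len(third_bases)


# Toy ORF with codons ATG GCT AAA TTT TGA: third bases are G, T, A, T, A.
print(third_position_at_content("ATGGCTAAATTTTGA"))
```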

4.
To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40–50 kya, followed by a constant population size, and then another expansion around 15–18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (∼400–600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ∼360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.

5.
Comparative genomic studies in primates have yielded important insights into the evolutionary forces that shape genetic diversity and revealed the likely genetic basis for certain species-specific adaptations. To date, however, these studies have focused on only a small number of species. For the majority of nonhuman primates, including some of the most critically endangered, genome-level data are not yet available. In this study, we have taken the first steps toward addressing this gap by sequencing RNA from the livers of multiple individuals from each of 16 mammalian species, including humans and 11 nonhuman primates. Of the nonhuman primate species, five are lemurs and two are lorisoids, for which little or no genomic data were previously available. To analyze these data, we developed a method for de novo assembly and alignment of orthologous gene sequences across species. We assembled an average of 5721 gene sequences per species and characterized diversity and divergence of both gene sequences and gene expression levels. We identified patterns of variation that are consistent with the action of positive or directional selection, including an 18-fold enrichment of peroxisomal genes among genes whose regulation likely evolved under directional selection in the ancestral primate lineage. Importantly, we found no relationship between genetic diversity and endangered status, with the two most endangered species in our study, the black and white ruffed lemur and the Coquerel's sifaka, having the highest genetic diversity among all primates. Our observations imply that many endangered lemur populations still harbor considerable genetic variation. Timely efforts to conserve these species alongside their habitats have, therefore, strong potential to achieve long-term success.

6.
7.
Comparative genomic hybridization (CGH) allows the detection of DNA sequence copy number changes on a genome-wide scale in a single hybridization reaction. The ability of CGH to be applied to formalin-fixed, paraffin-embedded tumor samples has led to its widespread application in the cytogenetic analysis of archival material. When setting up CGH in the laboratory, rigorous control experiments must be carried out to ensure that the losses and gains are scored correctly. Groups interested in breast cancer frequently use the MCF-7 cell line as a positive control in these experiments, comparing the results to previously described genetic alterations. Here we present the results of CGH carried out with three stocks of MCF-7 cells. The cells differ widely in their proliferative response to 17-beta estradiol and show extensive variation in copy number changes affecting specific chromosomal regions. We suggest that care must be taken, therefore, when choosing a cell line as a positive control for CGH experiments.

8.
There is marked diurnal variation in the glycogen content of skeletal muscles of animals, but few studies have addressed such variations in human muscles. 13C MRS can be used to noninvasively measure the glycogen content of human skeletal muscle, but no study has explored the diurnal variations in this parameter. This study aimed to investigate whether a diurnal variation in glycogen content occurs in human muscles and, if so, to what extent it can be identified using 13C MRS. Six male volunteers were instructed to maintain their normal diet and not to perform strenuous exercise for at least 3 days before and during the experiment. Muscle glycogen and blood glucose concentrations were measured six times in 24 h under normal conditions in these subjects. The glycogen content in the thigh muscle was determined noninvasively by natural abundance 13C MRS using a clinical MR system at 3 T. Nutritional analysis revealed that the subjects' mean carbohydrate intake was 463 ± 137 g, being approximately 6.8 ± 2.4 g/kg body weight. The average sleeping time was 5.9 ± 1.0 h. The glycogen content in the thigh muscle at the starting point was 64.8 ± 20.6 mM. Although absolute and relative individual variations in muscle glycogen content were 7.0 ± 2.1 mM and 11.3 ± 4.6%, respectively, no significant difference in glycogen content was observed among the different time points. This study demonstrates that normal food intake (not fat and/or carbohydrate rich), sleep and other daily activities have a negligible influence on thigh muscle glycogen content, and that the diurnal variation of the glycogen content in human muscles is markedly smaller than that in animal muscles. Moreover, the present results also support the reproducibility and availability of 13C MRS for the evaluation of the glycogen content in human muscles. Copyright © 2015 John Wiley & Sons, Ltd.

9.
The contribution to genetic diversity of genomic segmental copy number variations (CNVs) is less well understood than that of single-nucleotide polymorphisms (SNPs). While less frequent than SNPs, CNVs have greater potential to affect phenotype. In this study, we have performed the most comprehensive survey to date of CNVs in mice, analyzing the genomes of 42 Mouse Phenome Consortium priority strains. This microarray comparative genomic hybridization (CGH)-based analysis has identified 2094 putative CNVs, with an average of 10 Mb of DNA in 51 CNVs when individual mouse strains were compared to the reference strain C57BL/6J. This amount of variation results in gene content that can differ by hundreds of genes between strains. These genes include members of large families such as the major histocompatibility and pheromone receptor genes, but there are also many singleton genes including genes with expected phenotypic consequences from their deletion or amplification. Using a whole-genome association analysis, we demonstrate that complex multigenic phenotypes, such as food intake, can be associated with specific copy number changes.

10.
Ductular reaction (DR) represents the activation of hepatic progenitor cells (HPCs) and has been associated with features of advanced chronic liver disease; yet it is not clear whether these cells contribute to disease progression and how the composition of their micro-environment differs depending on the aetiology. This study aimed to identify HPC-associated signalling pathways relevant in different chronic liver diseases using a high-throughput sequencing approach. DR/HPCs were isolated using laser microdissection from patient samples diagnosed with HCV or primary sclerosing cholangitis (PSC), as models for hepatocellular or biliary regeneration. Key signals were validated at the protein level for a cohort of 56 patients (20 early and 36 advanced stage). In total, 330 genes were significantly differentially expressed between the HPCs in HCV and PSC. Recruitment and homing of inflammatory cells were distinctly different depending on the aetiology. HPCs in PSC were characterised by a response to oxidative stress (e.g. JUN, VNN1) and neutrophil-attractant chemokines (CXCL5, CXCL6, IL-8), whereas HPCs in HCV were identified by T- and B-lymphocyte infiltration. Moreover, we found that communication between HPCs and macrophages was aetiology driven. In PSC, a high frequency of CCL28-positive macrophages was observed in the portal infiltrate, already in early disease in the absence of advanced fibrosis, while in HCV, HPCs showed a strong expression of the macrophage scavenger receptor MARCO. Interestingly, DR/HPCs in PSC showed more deposition of ECM (e.g. FN1, LAMC2, collagens) compared to HCV, where an increase of pro-invasive genes (e.g. PDGFRA, IGF2) was observed. Additionally, endothelial cells in the vicinity of DR/HPCs showed differential immunopositivity (e.g. IGF2 and INHBA expression). In conclusion, our data shine light on the role of DR/HPCs in immune signalling, fibrogenesis and angiogenesis in chronic liver disease. 
Copyright © 2018 Pathological Society of Great Britain and Ireland. Published by John Wiley & Sons, Ltd.

11.
Utilizing the full power of next-generation sequencing often requires the ability to perform large-scale multiplex enrichment of many specific genomic loci in multiple samples. Several technologies have been recently developed but await substantial improvements. We report the 10,000-fold improvement of a previously developed padlock-based approach, and apply the assay to identifying genetic variations in hypermutable CpG regions across human chromosome 21. From ∼3 million reads derived from a single Illumina Genome Analyzer lane, ∼94% (∼50,500) target sites can be observed with at least one read. The uniformity of coverage was also greatly improved; up to 93% and 57% of all targets fell within a 100- and 10-fold coverage range, respectively. Alleles at >400,000 target base positions were determined across six subjects and examined for single nucleotide polymorphisms (SNPs), and the concordance with independently obtained genotypes was 98.4%–100%. We detected >500 SNPs not currently in dbSNP, 362 of which were in targeted CpG locations. Transitions in CpG sites were at least 13.7 times more abundant than non-CpG transitions. Fractions of polymorphic CpG sites are lower in CpG-rich regions and show higher correlation with human–chimpanzee divergence within CpG versus non-CpG sites. This is consistent with the hypothesis that methylation rate heterogeneity along chromosomes contributes to mutation rate variation in humans. Our success suggests that targeted CpG resequencing is an efficient way to identify common and rare genetic variations. In addition, the significantly improved padlock capture technology can be readily applied to other projects that require multiplex sample preparation.

For decades, DNA sequencing has been pivotal in understanding biology, yielding over 900 whole-genome sequences, and identifying genetic variations and somatic mutations that underlie human diseases (Frazer et al. 2007; Stenson et al. 2008). Recent sequencing-based studies suggest that a large panel of genes is mutated in various cancers (Sjoblom et al. 2006; Jones et al. 2008; Parsons et al. 2008). Individually rare but cumulatively frequent variations contribute to the inheritance of common multifactorial diseases (Cohen et al. 2004, 2006; Bodmer and Bonilla 2008; Ji et al. 2008). Recently, “deep sequencing” has been enabled by “next-generation” technologies that reduce sequencing costs by several orders of magnitude (Shendure and Ji 2008). However, it is still prohibitively expensive to sequence whole human genomes, particularly when sample sizes are large. Thus, multiplexed targeted amplification of many genomic regions of interest is crucial for rapid and cost-effective sequencing-based research projects.

Parallel targeted amplification of selected genome regions is a challenging task (Garber 2008). Two different categories of methods have been developed to enrich or capture desired genomic regions, such as exons. One category employs hybridization of sheared genomic DNA to probes complementary to targeted regions. The probes can be oligonucleotides on a microarray surface (Albert et al. 2007; Hodges et al. 2007; Okou et al. 2007) or in solution (Gnirke et al. 2009). Although most of the desired regions are captured, the specificity of enriched genomic DNA tends to be limited due to “off target” and “near target” capture. In addition, due to the low efficiency of hybridization on the surface of a microarray, large amounts of genomic DNA are needed. The other category of methods requires hybridization in regions flanking both sides of the target and subsequent circularization of the targets. One way is to use “selector” oligonucleotides to guide the directed circularization of the target sequences digested with restriction enzymes (Dahl et al. 2005, 2007). Another method, which is independent of the presence of flanking restriction enzyme sites and thus is more flexible, applies padlock (molecular inversion) probes that anchor targeted regions and are circularized after polymerization and ligation (Hardenbol et al. 2003; Porreca et al. 2007). In our initial study, we targeted 55,000 exons but observed only ∼10,000 unique sites in over 2 million end-sequencing reads, and most heterozygous loci were called incorrectly (Porreca et al. 2007), indicating the need for substantial improvement in capturing efficiency.

In the human genome, CpG dinucleotides are about fivefold less abundant than expected by chance (Sved and Bird 1990). This is due to the widespread methylation of cytosine in CpG and the deamination of 5-methylcytosine to thymidine (Wang et al. 1982); CpG is thus frequently mutated to TpG (or CpA on the complementary DNA strand). Overall, CpG elevates the mutation rate for transitions by 14- to 15-fold and for transversions by three- to fourfold (Kondrashov 2003; Hwang and Green 2004; Schmidt et al. 2008).

Mutations within the CpG context are a predominant cause of human diseases. At least one-third of mutations implicated in Mendelian diseases originated within CpG contexts (Cooper and Youssoufian 1988; Cooper and Krawczak 1993). Although CpG sites are depleted in the bulk of noncoding human DNA, they are selectively maintained in protein-coding genes and other functional genomic regions despite the elevated mutation rate (Subramanian and Kumar 2003; Kondrashov et al. 2006). Thus, the prevalence of CpG-induced mutations among disease mutations is much greater than among all mutations.

CpG context plays an important role in somatic mutations involved in human cancer. In the TP53 gene, which is mutated in >50% of all human tumors, ∼30% of all mutations occur at CpG dinucleotides, and all five major mutation hotspots are found at CpGs (Olivier et al. 2002). Recently, sequencing of nearly all protein-coding regions in cancer genomes revealed that 17%, 38%, 43%, and 48% of point mutations occur at CpGs in breast, pancreatic, brain, and colorectal cancers, respectively (Sjoblom et al. 2006; Jones et al. 2008; Parsons et al. 2008).

In this work, we describe improvements to our padlock capturing technology that yield an estimated 10,000-fold increase in capture efficiency over previous reports and significant improvements in sensitivity and uniformity. We designed 53,777 padlock probes to cover ∼24% of all CpGs on human chromosome 21, where each probe captured a 40-bp region containing at least one CpG, and applied them to discover genetic variants in the genomic DNA from one HapMap CEPH individual (NA10835) and five volunteers from the Personal Genome Project (PGP; http://www.personalgenomes.org). We report the improved performance and high reproducibility of our optimized methods, and demonstrate the utility of the data for identification of known and novel SNPs and unbiased analysis of CpG variation rates.
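The statement that CpG dinucleotides are less abundant than expected by chance can be made concrete with the commonly used observed/expected CpG ratio. The sketch below is an illustration only and is not taken from the article: it assumes the expected count is (number of C × number of G) / sequence length, and the function name and toy sequences are hypothetical.

```python
def cpg_observed_expected(sequence):
    """Observed/expected CpG ratio of a DNA sequence.

    Expected CpG count is taken as (#C * #G) / length, i.e. what base
    composition alone would predict; a ratio below 1 indicates depletion.
    """
    seq = sequence.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        raise ValueError("sequence must contain both C and G")
    observed = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c_count * g_count / len(seq)
    return observed / expected


print(cpg_observed_expected("CGAACAGTCATG"))  # one CpG among 3 C and 3 G in 12 bp
print(cpg_observed_expected("CATG"))          # no CpG at all
```

Under this definition, the roughly fivefold depletion described above would correspond to a genome-wide ratio of about 0.2.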

12.
13.
Less than 1% of known monilophytes and lycophytes have a genome size estimate, and substantially less is known about the presence and prevalence of endopolyploid nuclei in these groups. Thirty-one monilophyte species (including three horsetails) and six lycophyte species were collected in Ontario, Canada. Using flow cytometry, genome size and degree of endopolyploidy were estimated for 37 species. Across the five orders covered, 1Cx-values averaged 4.2 pg in the Lycopodiales, 18.1 pg for the Equisetales, 5.06 pg for a single representative of the Ophioglossales, 14.3 pg for the Osmundales, and 7.06 pg for the Polypodiales. There was no indication of endoreduplication in any of the leaf, stem, or root tissue analyzed. This information is essential to our understanding of DNA content evolution in land plants.

14.
Immunity, 2023, 56(6): 1376-1392.e8

15.
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.

The development and commercialization of next-generation massively parallel DNA sequencing technologies, including Illumina Genome Analyzer (GA) (Bentley 2006), Applied Biosystems SOLiD System, and Helicos BioSciences HeliScope (Harris et al. 2008), have revolutionized genomic research. Compared to traditional Sanger capillary-based electrophoresis systems, these new technologies provide ultrahigh throughput with two orders of magnitude lower unit data cost. However, they all share a common intrinsic characteristic of providing very short read length, currently 25–75 base pairs (bp), which is substantially shorter than the Sanger sequencing reads (500–1000 bp) (Shendure et al. 2004). This has raised concern about their ability to accurately assemble large genomes. Illumina GA technology has been shown to be feasible for use in human whole-genome resequencing and can be used to identify single nucleotide polymorphisms (SNPs) accurately by mapping the short reads onto the known reference genome (Bentley et al. 2008; Wang et al. 2008). But to thoroughly annotate insertions, deletions, and structural variations, de novo assembly of each individual genome from these raw short reads is required.

Currently, Sanger sequencing technology remains the dominant method for building a reference genome sequence for a species. It is, however, expensive, and this prevents many genome sequencing projects from being put into practice. Over the past 10 yr, only a limited number of plant and animal genomes have been completely sequenced (http://www.ncbi.nlm.nih.gov/genomes/static/gpstat.html), including human (Lander et al. 2001; Venter et al. 2001) and mouse (Mouse Genome Sequencing Consortium 2002), but accurate understanding of evolutionary history and biological processes at a nucleotide level requires substantially more. The development of a de novo short read assembly method would allow the building of reference sequences for these unexplored genomes in a very cost-effective way, opening the door for carrying out numerous substantial new analyses.

Several programs, such as phrap (http://www.phrap.org), Celera assembler (Myers et al. 2000), ARACHNE (Batzoglou et al. 2002), Phusion (Mullikin and Ning 2003), RePS (Wang et al. 2002), PCAP (Huang et al. 2003), and Atlas (Havlak et al. 2004), have been successfully used for de novo assembly of whole-genome shotgun (WGS) sequencing reads in the projects applying the Sanger technology. These are based on an overlap-layout strategy, but for very short reads, this approach is unsuitable because it is hard to distinguish correct assembly from repetitive sequence overlap due to there being only a very short sequence overlap between these short reads. Also, in practice, it is unrealistic to record into a computer memory all the sequence overlap information from deep sequencing that are made up of huge numbers of short reads.

The de Bruijn graph data structure, introduced in the EULER (Pevzner et al. 2001) assembler, is particularly suitable for representing the short read overlap relationship. The advantage of the data structure is that it uses K-mer as vertex, and read path along the K-mers as edges on the graph. Hence, the graph size is determined by the genome size and repeat content of the sequenced sample, and in principle, will not be affected by the high redundancy of deep read coverage. A few short read assemblers, including Velvet (Zerbino and Birney 2008), ALLPATHS (Butler et al. 2008), and EULER-SR (Chaisson and Pevzner 2008), have adopted this algorithm, explicitly or implicitly, and have been implemented and shown very promising performances. Some other short read assemblers have applied the overlap and extension strategy, such as SSAKE (Warren et al. 2007), VCAKE (Jeck et al. 2007) (the follower of SSAKE which can handle sequencing errors), SHARCGS (Dohm et al. 2007), and Edena (Hernandez et al. 2008). However, all these assemblers were designed to handle bacteria- or fungi-sized genomes, and cannot be applied for assembly of large genomes, such as the human, given the limits of the available memory of current supercomputers. Recently, ABySS (Simpson et al. 2009) used a distributed de Bruijn graph algorithm that can split data and parallelize the job on a Linux cluster with message passing interface (MPI) protocol, allowing communication between nodes. Thus, it is able to handle a whole short read data set of a human individual; however, the assembly is very fragmented with an N50 length of ∼1.5 kilobases (kb). This is not long enough for structural variation detection between human individuals, nor is it good enough for gene annotation and further analysis of the genomes of novel species.

Here, we present a novel short read assembly method that can build a de novo draft assembly for the human genome. We previously sequenced the complete genome of an Asian individual using a resequencing method, producing a total of 117.7 gigabytes (Gb) of data, and have now an additional 82.5 Gb of paired-end short reads, achieving a 71× sequencing depth of the NCBI human reference sequence. We used this substantial amount of data to test our de novo assembly method, as well as the data from the African genome sequence (Bentley et al. 2008; Wang et al. 2008; Li et al. 2009a). We compared the de novo assemblies to the NCBI reference genome and demonstrated the capability of this method to accurately identify structural variations, especially small deletions and insertions that are difficult to detect using the resequencing method. This software has been integrated into the short oligonucleotide alignment program (SOAP) (Li et al. 2008, 2009b,c) package and named SOAPdenovo to indicate its functionality.
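Two ideas from this text lend themselves to a small illustration: the de Bruijn graph, whose size does not grow with redundant read coverage, and the N50 statistic used to report contiguity. The sketch below is a toy version written for this listing, not SOAPdenovo or any published assembler; the function names and reads are hypothetical, and error correction, reverse complements, and graph simplification are all ignored.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph with k-mers as vertices.

    An edge joins two k-mers that are adjacent (overlap by k-1 bases)
    somewhere in a read. Edges are stored in sets, so repeated reads
    add nothing new to the graph.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k):
            graph[read[i:i + k]].add(read[i + 1:i + k + 1])
    return graph


def n50(lengths):
    """Length L such that sequences of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must not be empty")


reads = ["ACGTAC", "CGTACG"]
shallow = build_de_bruijn(reads, 3)
deep = build_de_bruijn(reads * 10, 3)  # tenfold redundant coverage
print(len(shallow), len(deep))         # same number of source vertices
print(n50([2, 3, 4, 5, 6]))
```

Because edges are kept in sets, the graph built from tenfold-duplicated reads is identical to the one built from the original reads, which is the property the text attributes to the data structure.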

16.
17.
There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partially owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including from the Human Genome Structural Variant Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare their composition of alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition (in PCBP3), as well as several VNTRs that are significantly different in length between superpopulations (in ART1, PROP1, DYNC2I1, and LOC102723906). We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.

There are tens of thousands of variable number tandem repeats (VNTRs) in the human genome (Näslund et al. 2005), yet as a whole they remain uncharacterized. These VNTRs, that is, repeats with a repeat unit of seven base pairs (bp) or greater, are often too large or variable to be effectively captured using the short-read sequencing technologies typically used for whole-genome sequencing. In addition, they are frequently located in noncoding or intergenic regions, which until recently have garnered less attention than genomic variants in coding regions. VNTRs, however, are highly mutable, suggesting that they play influential roles in evolutionary biology, and along with short tandem repeats (STRs; repeats with a repeat unit of <7 bp), are a major source of human genetic diversity (Jeffreys et al. 1985; Berg et al. 2010; Hannan 2018). The recent advent of long-read sequencing has revealed that many VNTRs are much larger than the reference human genome suggests, and far more polymorphic. The few VNTRs that have been studied recently in more detail have provided insights into evolutionary history, replication mechanism, population structure, and disease. Characterizing more VNTRs at this higher resolution will continue to expand these insights.

Four VNTRs have recently been studied in detail, primarily owing to their involvement in disease. A 25-bp VNTR in the intron of ATP binding cassette subfamily A member 7 (ABCA7) influences alternative splicing and is associated with Alzheimer's disease (De Roeck et al. 2018). A 30-bp VNTR in calcium voltage-gated channel subunit alpha1 C (CACNA1C) shows varying repeat unit arrays correlated with “protective” or “risk” alleles in schizophrenia and bipolar disorder (Song et al. 2018). A 33-bp VNTR in the promoter of tribbles pseudokinase 3 (TRIB3) is associated with TRIB3 expression, and copy number of the repeat is correlated with certain disease-associated single-nucleotide polymorphisms (SNPs) (Örd et al. 2020). We also identified a 69-bp repeat in WD repeat domain 7 (WDR7) associated with ALS (Course et al. 2020). A closer look at long-read sequenced genomes from geographically diverse samples indicated that this particular repeat expands via duplication events and a replication error called template switching. Furthermore, a small number of repeat units were unique to certain superpopulations and were also found in short-read data sets of ancient genomes. While examining this VNTR along with several others, we recognized that performing a similar analysis on a larger number of VNTRs could illuminate how these repeats vary and the mutational processes that have shaped them.

As these examples show, the VNTRs studied in detail have so far been studied one at a time and for a particular reason, like association with disease. Instead of continuing to study these repeats one-by-one, we decided to study a subset of them methodically and as a group. Doing so could answer questions about these VNTRs as a category of genomic variant, like their general characteristics as well as timing and patterns of expansion. Here, we look at a set of 467 VNTRs chosen for the following characteristics: they show human-specific expansion, they are not associated with retrotransposons, and they are unique to one location in the genome. These parameters select the VNTRs most likely to have expanded recently, and that may still be expanding, so we can observe their changes in different populations more readily. Furthermore, expansion of the same genomic segment in multiple places in the genome would be unlikely unless there were a retrotransposon driving it, so these parameters select for VNTRs that have expanded via other mechanisms. We then assess these or a subset of these VNTRs in ancestral human genomes as well as modern human genomes from the 1000 Genomes Project. We observe the similarities and differences of VNTRs in various superpopulations and their timescale of expansion. Finally, by taking a closer look at these genomes in long-read sequenced samples, we define several modalities of internal nucleotide patterning, which provides a useful framework for future VNTR analysis.
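The text defines VNTRs as tandem repeats with a repeat unit of 7 bp or more and STRs as those with a unit shorter than 7 bp. A minimal sketch of that classification, together with a naive count of perfect leading copies of a unit, is given below. The function names and example sequences are hypothetical, and real VNTR arrays contain variant repeat units that this exact-match counter would not handle.

```python
def classify_tandem_repeat(unit):
    """Classify a repeat by unit length: >= 7 bp is a VNTR, < 7 bp is an STR."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    return "VNTR" if len(unit) >= 7 else "STR"


def count_leading_copies(array, unit):
    """Count perfect, consecutive copies of `unit` at the start of `array`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    copies = 0
    while array.startswith(unit, copies * len(unit)):
        copies += 1
    return copies


print(classify_tandem_repeat("CAG"))              # 3-bp unit
print(classify_tandem_repeat("ACGTTGCA"))         # 8-bp unit
print(count_leading_copies("CAGCAGCAGT", "CAG"))  # three perfect copies, then T
```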

18.
19.
20.
Conservation and variation in orf virus genomes (cited 5 times in total: 0 self-citations, 5 citations by others)
The genomes of several orf virus strains were analyzed with the restriction endonucleases EcoRI, HindIII, BamHI, and KpnI, and cleavage site maps were deduced. In general, the right half of the genome showed conservation of restriction sites compared with the left half. Variations in size of up to 0.5 kbp were found within an inverted terminal repetition, and a 1-kbp deletion was detected in some strains in a subterminal fragment at the left end. A region of approximately 20 kbp, some 12 kbp in from the left end, showed the greatest cleavage site variability although there was no evidence of large deletions in this region. A 1.55-kbp cloned DNA fragment from the internal variable region of NZ2 failed to hybridize to the DNA from three other strains. A fragment in the variable region of strain NZ7 was cloned and compared by hybridization and restriction endonuclease analysis with cloned NZ2 fragments from the same region. The region of nonhomology extended for at least 2.75 kbp. It is suggested that this internal variable region may provide sites for the insertion of foreign genes.
