A novel application of pattern recognition for accurate SNP and indel discovery from high-throughput data: targeted resequencing of the glucocorticoid receptor co-chaperone FKBP5 in a Caucasian population |
| |
Authors: | Pelleymounter, Linda L.; Moon, Irene; Johnson, Julie A.; Laederach, Alain; Halvorsen, Matt; Eckloff, Bruce; Abo, Ryan; Rossetti, Sandro |
| |
Affiliation: | a Department of Pharmacology, Department of Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, USA; b Division of Mayo Clinic Research and Education Support Systems, Department of Bioinformatic Systems, Mayo Clinic, Rochester, MN, USA; c Biology Department, University of North Carolina, Chapel Hill, NC, USA; d Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, USA; e Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN, USA |
| |
Abstract: | Precise detection of single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) from high-throughput data remains a significant bioinformatics challenge. Accurate detection is necessary before next-generation sequencing can be used routinely in the clinic. In research, scientific advances are inhibited by gaps in data, exemplified by the underrepresented discovery of rare variants, variants in non-coding regions, and indels. The continued presence of false positives and false negatives prevents full automation and requires additional manual verification steps. Our methodology applies both pattern recognition and sensitivity analysis to eliminate false positives and aid in the detection of SNP/indel loci and genotypes from high-throughput data. We chose FK506-binding protein 51 (FKBP5) (6p21.31) as our clinical target because of its role in modulating pharmacological responses to physiological and synthetic glucocorticoids and because of the complexity of the genomic region. We detected genetic variation across a 160 kb region encompassing FKBP5, discovering 613 SNPs and 57 indels, including a 3.3 kb deletion. We validated our method using three independent data sets and, with Sanger sequencing and Affymetrix and Illumina microarrays, achieved 99% concordance. Furthermore, we were able to detect 267 novel rare variants and assess linkage disequilibrium. Our results showed both a sensitivity and a specificity of 98%, indicating near-perfect classification between true and false variants. The process is scalable and amenable to automation, with the downstream filters taking only 1.5 h to analyze 96 individuals simultaneously. We provide examples of how our level of precision uncovered the interactions of multiple loci, their predicted influences on mRNA stability, perturbations of the hsp90 binding site, and individual variation in FKBP5 expression. Finally, we show how our discovery of rare variants may change current conceptions of evolution at this locus. |
| |
Keywords: | Pattern recognition; Next-generation sequencing analysis; Indels; Rare variants; FKBP5; HLA |
This article is indexed in ScienceDirect, PubMed, and other databases. |
|