Quantifying selection in immune receptor repertoires
Authors: Yuval Elhanati, Anand Murugan, Curtis G. Callan Jr., Thierry Mora, Aleksandra M. Walczak
Affiliation: (a) Laboratoire de Physique Théorique, Unité Mixte de Recherche 8549, and (d) Laboratoire de Physique Statistique, Unité Mixte de Recherche 8550, Centre National de la Recherche Scientifique and École Normale Supérieure, 75005 Paris, France; (b) Department of Applied Physics, Stanford University, Stanford, CA 94305; (c) Joseph Henry Laboratories, Princeton University, Princeton, NJ 08544
Abstract: The efficient recognition of pathogens by the adaptive immune system relies on the diversity of receptors displayed at the surface of immune cells. T-cell receptor diversity results from an initial random DNA editing process, called VDJ recombination, followed by functional selection of cells according to the interaction of their surface receptors with self and foreign antigenic peptides. Using high-throughput sequence data from the β-chain of human T-cell receptors, we infer factors that quantify the overall effect of selection on the elements of receptor sequence composition: the V and J gene choice and the length and amino acid composition of the variable region. We find a significant correlation between biases induced by VDJ recombination and our inferred selection factors, together with a reduction of diversity during selection. Both effects suggest that natural selection acting on the recombination process has anticipated the selection pressures experienced during somatic evolution. The inferred selection factors differ little between donors or between naive and memory repertoires. The number of sequences shared between donors is well predicted by our model, indicating a stochastic origin of such public sequences. Our approach is based on a probabilistic maximum likelihood method, which is necessary to disentangle the effects of selection from biases inherent in the recombination process.

The T-cell response of the adaptive immune system begins when receptor proteins on the surface of these cells recognize a pathogen peptide displayed by an antigen-presenting cell. The immune cell repertoire of a given individual is composed of many clones, each with a distinct surface receptor. This diversity, which is central to the ability of the immune system to defeat pathogens, is initially created by a stochastic process of germline DNA editing (called VDJ recombination) that gives each new immune cell a unique surface receptor gene.
This initial repertoire is subsequently modified by selective forces, including nonpathogen-related thymic selection against excessive (or insufficient) recognition of self proteins, which are also stochastic in nature. Because of this stochasticity and the large T-cell diversity, these repertoires are best described by probability distributions. In this paper, we apply a probabilistic approach to sequence data to obtain quantitative measures of the overall (not necessarily pathogenic) selection pressures that shape T-cell receptor repertoires.

New receptor genes are formed by randomly choosing alleles from a set of genomic templates for the subregions (V, D, and J) of the complete gene. Insertion and deletion of nucleotides in the junctional regions between the V and D genes and between the D and J genes greatly enhance diversity beyond pure VDJ combinatorics (1). The most variable region of the gene is between the last amino acids of the V segment and the beginning of the J segment; it codes for the Complementarity Determining Region 3 (CDR3) loop of the receptor protein, a region known to be functionally important in recognition (2). Previous studies have shown that immune cell receptors are not uniform in terms of VDJ gene segment use (3–6) or probability of generation (1) and that certain receptors are more likely than others to be shared by different individuals (4, 7). The statistical properties of the immune repertoire are thus rather complex, and their accurate determination requires sophisticated methods.

Recent advances in sequencing technology have made it possible to sample the T-cell receptor diversity of individual subjects in great depth (8). The availability of such data has, in turn, led to the development of sequence statistics-based approaches to the study of immune cell diversity (9, 10).
In particular, we recently quantitatively characterized the preselection diversity of the human T-cell repertoire by learning the probabilistic rules of VDJ recombination from out-of-frame DNA sequences that cannot be subject to functional selection and whose statistics therefore reflect only the recombination process (1). After generation, T cells undergo a somatic selection process in the thymus (11) and later in the periphery (12). Cells that pass thymic selection enter the peripheral repertoire as naive T cells, and the subset of naive cells that eventually engage in an immune response will survive as a long-lived memory pool. Although we now understand the statistical properties of the initial repertoire of immune receptors (1) and despite some theoretical studies of thymic selection at the molecular level (13, 14), a quantitative understanding of how selection modifies those statistics to produce the naive and memory repertoires is lacking.

In this paper, we build on our understanding of the preselection distribution of T-cell receptors to derive a statistical method for identifying and quantifying selection pressures in the adaptive immune system. We apply this method to naive and memory DNA sequences of human T-cell β-chains obtained from peripheral blood samples of nine healthy individuals. Our goal is to characterize the likelihood that any given sequence, after it is generated, will survive selection for the ensemble of properties needed to pass into the peripheral repertoire(s). Our analysis reveals strong and reproducible signatures of selection on specific amino acids in the CDR3 sequence and on the usage of V and J genes. Most strikingly, we find significant correlation between the generation probability of a sequence and the probability that it will pass selection.
This correlation suggests that natural selection, which acts on very long timescales to shape the generation mechanism itself, may have tuned it to anticipate somatic selection, which acts on single cells throughout the lifetime of an individual. The quantitative features of selection inferred from our model vary very little between donors, indicating that these features are universal. In addition, our measures of selection pressure on the memory and naive repertoires are statistically indistinguishable, consistent with the hypothesis that the memory pool is a random subsample of the naive pool.
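
The idea of a selection factor that reweights the generation distribution, and the claim that sequence sharing between donors can arise by chance, can be illustrated with a toy calculation. The sketch below is not the authors' inference method: it assumes a tiny made-up set of four sequences, hypothetical generation probabilities `p_gen` and selection factors `q`, and samples drawn independently with replacement. The function names `apply_selection`, `prob_seen`, and `expected_shared` are invented for this illustration.

```python
import math

def apply_selection(p_gen, q):
    """Reweight generation probabilities by selection factors and renormalize.

    Assumes post-selection probability is proportional to q * p_gen.
    """
    weights = [p * f for p, f in zip(p_gen, q)]
    z = sum(weights)  # normalization constant
    return [w / z for w in weights]

def prob_seen(p, n):
    """Probability that a sequence of probability p appears at least once
    in n independent draws: 1 - (1 - p)**n, computed stably."""
    if p >= 1.0:
        return 1.0 if n > 0 else 0.0
    return -math.expm1(n * math.log1p(-p))

def expected_shared(probs, n1, n2):
    """Expected number of distinct sequences present in both of two
    independent samples of sizes n1 and n2 from the same distribution."""
    return sum(prob_seen(p, n1) * prob_seen(p, n2) for p in probs)

# Hypothetical toy values, chosen only for illustration.
p_gen = [0.4, 0.3, 0.2, 0.1]   # generation probabilities
q = [1.5, 1.0, 0.5, 0.5]       # selection factors, larger for likelier sequences
p_post = apply_selection(p_gen, q)

shared_pre = expected_shared(p_gen, 3, 3)
shared_post = expected_shared(p_post, 3, 3)
print(p_post, shared_pre, shared_post)
```

In this toy setting, sequences with high probability are the ones most likely to turn up in both samples, which is the sense in which public sequences can have a purely stochastic origin.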
Keywords: thymic selection; statistical inference; public repertoire; T cell