The enormous genetic diversity and mutability of HIV have prevented effective control of this virus by natural immune responses or vaccination. Evolution of the circulating HIV population has thus occurred in response to diverse, ultimately ineffective, immune selection pressures that randomly change from host to host. We show that the interplay between the diversity of human immune responses and the ways that HIV mutates to evade them results in distinct sets of sequences defined by similar collectively coupled mutations. Scaling laws that relate these sets of sequences resemble those observed in linguistics and other branches of inquiry, and dynamics reminiscent of neural networks are observed. Like neural networks that store memories of past stimulation, the circulating HIV population stores memories of host–pathogen combat won by the virus. We describe an exactly solvable model that captures the main qualitative features of the sets of sequences and a simple mechanistic model for the origin of the observed scaling laws. Our results define collective mutational pathways used by HIV to evade human immune responses, which could guide vaccine design.

Viruses can infect humans to cause infectious diseases, which, on occasion, lead to outbreaks that reach pandemic proportions resulting in millions of deaths. One prominent example of such a virus is HIV. Vaccination, a procedure that aims to protect humans from infectious pathogens, is one of the greatest triumphs of modern medicine. Vaccines induce human immune responses that are specific for a pathogen, which then lie ready and waiting to abort infection. However, no effective vaccine for HIV exists, and there is no known example of HIV being cleared by natural human immune responses. This is because of the extraordinarily high mutability of the virus and its ability to rapidly down-regulate the human immune system (1, 2). The high mutability enables HIV to evade natural or vaccine-induced immune responses (2, 3), while down-regulation of the host immune system hinders the development of potent responses (2). This is in contrast to many other viruses that can often be cleared by effective natural responses and vaccinated against successfully (4). These viruses accumulate mutations in a directed fashion, guided by selective pressure due to successful vaccine-induced or natural immune responses (1, 5). The lack of effective natural immune responses or successful vaccines, together with the enormous diversity of human immune pressures [e.g., T-cell responses (6)], implies that HIV has evolved in the human population in response to myriad, usually ineffective, immune responses. We set out to study the properties of such a virus population.

In past and current work (
7–9), we have tried to define the functional constraints on HIV evolution with the practical goal of identifying its mutational vulnerabilities, and then harnessing this knowledge to inform vaccine design. Toward this end, we analyzed sequences of HIV proteins derived from virus samples extracted from diverse patients. Following a statistical approach pioneered in the study of neuronal networks (10, 11), we inferred a model for the probability of occurrence of mutant strains (a “prevalence landscape”) by maximizing the entropy of this inferred probability distribution subject to the constraints of reproducing the observed frequency of single and double mutations in the sequence data (8, 9). This model also accurately reproduces higher-order statistics characterizing the sequence data, such as the probability of observing sequences with a certain number of mutations, even though these quantities are not directly constrained in the inference procedure (Fig. S1 and SI Text). Theoretical studies suggest that, for HIV strains that are phylogenetically relatively close, the rank order of the prevalence of strains is the same as the rank order of their intrinsic replicative fitness (12). This may seem surprising because the viral sequences used to infer our model are samples obtained from patients during the course of nonequilibrium host–pathogen combat, and so the effective in-host fitness of a viral strain can be different from its intrinsic fitness. Although immune responses drive sequence evolution in each patient, they are a perturbative effect at the population level, making the rank order of prevalence and fitness statistically similar (12). This is because human immune responses are highly diverse and are directed toward different regions of the viral proteome, and because deleterious mutations made to evade the immune response in one host tend to revert upon transmission to another host (13). In vitro and in vivo studies testing our predictions for fitness support this conclusion (8, 9).

The maximum entropy model for the prevalence/fitness is described by the following:
$$
P(\mathbf{z}) = \frac{1}{Q}\, e^{-H(\mathbf{z})}, \qquad H(\mathbf{z}) = -\sum_{i=1}^{N} h_i z_i - \sum_{i<j} J_{ij} z_i z_j, \tag{1}
$$

where $P(\mathbf{z})$ is the probability of observing a sequence of amino acids $\mathbf{z} = \{z_1, z_2, \ldots, z_N\}$, with $N$ the total length of the protein sequence. Amino acids at each site $i$ are identified as either consensus ($z_i = 0$) or mutant ($z_i = 1$). Here, the partition function $Q$ ensures that the probabilities of all sequences sum to 1. The fields, $h_i$, and couplings, $J_{ij}$, in the Hamiltonian are obtained by fitting the observed probabilities of single and double mutations in sequences of HIV proteins (Methods). A positive coupling between a pair of sites implies that sequences with both sites mutated are observed more often than would be expected if mutations at these sites were independent. Thus, positive couplings indicate potentially synergistic or compensatory interactions between mutations. Mutations of both sites in a negatively coupled pair are observed less often than would be expected if the sites were independent, indicating a potential antagonistic or deleterious interaction between mutations at these sites. Similarly, point mutations are observed more often at sites with positive fields than at those with negative fields, when interactions with other sites in the sequence background are neglected. (Although related to epistasis, we emphasize that the overall effect of a particular mutation on fitness must be considered in the context of a particular sequence background: for example, mutation at a site $i$ where the field $h_i$ is positive may nonetheless lead to a decrease in viral fitness if there exist significant negative couplings between site $i$ and other mutated sites in the sequence background.) For clarity and consistency, we use the language of fitness to describe the results presented below.
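
As an illustration of how fields and couplings shape prevalence, the following minimal Python sketch evaluates a pairwise binary model on a toy three-site sequence. It assumes the standard pairwise maximum entropy form, in which the probability of a sequence is proportional to the exponential of the sum of the fields at mutated sites plus the couplings between mutated pairs, and it computes the partition function by brute-force enumeration, which is only feasible for very small N. The parameter values and the function names (`energy`, `sequence_probabilities`, `mutation_frequency`) are hypothetical and chosen for illustration; they are not fitted to HIV sequence data and are not the inference procedure described in Methods.

```python
import itertools
import math


def energy(z, h, J):
    """H(z) = -sum_i h_i z_i - sum_{i<j} J_ij z_i z_j for a binary sequence z."""
    n = len(z)
    field_term = sum(h[i] * z[i] for i in range(n))
    coupling_term = sum(
        J[i][j] * z[i] * z[j] for i in range(n) for j in range(i + 1, n)
    )
    return -field_term - coupling_term


def sequence_probabilities(h, J):
    """Return {z: P(z)} by enumerating all 2^N binary sequences (small N only)."""
    n = len(h)
    weights = {
        z: math.exp(-energy(z, h, J))
        for z in itertools.product((0, 1), repeat=n)
    }
    Q = sum(weights.values())  # partition function: normalizes the weights
    return {z: w / Q for z, w in weights.items()}


def mutation_frequency(probs, sites):
    """Probability that every site listed in `sites` is mutant (z_i = 1)."""
    return sum(p for z, p in probs.items() if all(z[i] == 1 for i in sites))


# Hypothetical toy parameters (assumed for illustration, not fitted to data).
# Only the upper triangle of J is used: J[0][1] > 0 and J[1][2] < 0.
h = [-2.0, -1.5, -1.0]
J = [
    [0.0, 1.2, 0.0],
    [0.0, 0.0, -0.8],
    [0.0, 0.0, 0.0],
]

probs = sequence_probabilities(h, J)
p0 = mutation_frequency(probs, [0])
p1 = mutation_frequency(probs, [1])
p2 = mutation_frequency(probs, [2])
p01 = mutation_frequency(probs, [0, 1])
p12 = mutation_frequency(probs, [1, 2])

# Compare observed double-mutant frequencies with the independent-site product.
print("sites 0,1 (positive coupling):", p01, "vs independent", p0 * p1)
print("sites 1,2 (negative coupling):", p12, "vs independent", p1 * p2)
```

In this toy example, the positively coupled pair is mutated together more often than the product of its single-site frequencies, and the negatively coupled pair less often, which mirrors the interpretation of the couplings given above.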