Abstract: | We report an analysis of more than 240,000 loci genotyped using the Affymetrix SNP microarray in 554 individuals from 27 worldwide populations in Africa, Asia, and Europe. To provide a more extensive and complete sampling of human genetic variation, we have included caste and tribal samples from two states in South India, Daghestanis from eastern Europe, and the Iban from Malaysia. Consistent with observations made by Charles Darwin, our results highlight shared variation among human populations and demonstrate that much genetic variation is geographically continuous. At the same time, principal components analyses reveal discernible genetic differentiation among almost all identified populations in our sample, and in most cases, individuals can be clearly assigned to defined populations on the basis of SNP genotypes. All individuals are accurately classified into continental groups using a model-based clustering algorithm, but between closely related populations, genetic and self-classifications conflict for some individuals. The 250K data permitted high-resolution analysis of genetic variation among Indian caste and tribal populations and between highland and lowland Daghestani populations. In particular, upper-caste individuals from Tamil Nadu and Andhra Pradesh form one defined group, lower-caste individuals from these two states form another, and the tribal Irula samples form a third. Our results emphasize the correlation of genetic and geographic distances and highlight other elements, including social factors, that have contributed to population structure.

Microarray technology has generated unprecedented quantities of data on human genetic variation. These data are useful for fine-scale inferences of human evolutionary history (Jakobsson et al. 2008; Li et al. 2008; Novembre et al. 2008; Tian et al. 2008) and, under some circumstances, the estimation of individual ancestry (Seldin et al. 2006; Bauchet et al. 2007; Price et al. 2008; Tian et al. 2008). In this context, the new data have contributed to a better and more nuanced understanding of the relationship between genetics and “race” (Race, Ethnicity, and Genetics Working Group 2005; Witherspoon et al. 2007). In addition, a more thorough knowledge of between-population genetic variation has also been important in improving the design and interpretation of case-control studies of common diseases (Wellcome Trust Case Control Consortium 2007; Nelson et al. 2008; Price et al. 2008).

For a variety of reasons, most studies have focused primarily on European populations (Seldin et al. 2006; Bauchet et al. 2007; Novembre et al. 2008; Price et al. 2008; Tian et al. 2008), and worldwide coverage of human populations remains incomplete. For example, the Human Genome Diversity Project (HGDP) database, one of the most widely used resources, lacks coverage of the Indian subcontinent. Other major regions, such as eastern Europe and northern Africa, are also underrepresented in databases of human genetic variation.

Among these underrepresented populations, those of the Indian subcontinent, which contains one-sixth of the world's inhabitants, are of particular interest. The origins of and relationships among Indian populations are the subjects of continuing debate (Bamshad et al. 1998, 2001; Basu et al. 2003; Vishwanathan et al. 2004; Watkins et al. 2005; Rosenberg et al. 2006; Chaubey et al. 2007), but most previous genetic studies of these populations have been based on modest data sets. Indian populations are also used increasingly in linkage and case-control studies of genetic disease (Alcais et al. 2007; Chambers et al. 2008; Holliday et al. 2008). A better understanding of the genetic structure in India will facilitate these studies.

Here, along with 21 other populations from around the world, we analyzed six Indian populations, including five caste populations and one tribal population, from two southern Indian states (Andhra Pradesh and Tamil Nadu). 
The inclusion of caste populations from different states and with different languages allowed us to assess the effects of social status, geography, and language on genetic structure in Indian populations. We also included Daghestanis from the Caucasus region and Ibans from Sarawak, Malaysia, to improve coverage of other underrepresented regions. Our analysis offers new insights into the genetic affinities and evolution of populations residing between the commonly studied populations of sub-Saharan Africa, Europe, and East Asia. |