Y-DNA hg C3* in South America and Putative Ancient Transpacific Contacts

PLoS Genet 9(4): e1003460. doi:10.1371/journal.pgen.1003460

Continent-Wide Decoupling of Y-Chromosomal Genetic Variation from Language and Geography in Native South Americans

Lutz Roewer, Michael Nothnagel, Leonor Gusmão, Veronica Gomes, Miguel González, Daniel Corach, Andrea Sala, Evguenia Alechine, Teresinha Palha, Ney Santos, Andrea Ribeiro-dos-Santos, Maria Geppert, Sascha Willuweit, Marion Nagy, Sarah Zweynert, Miriam Baeta, Carolina Núñez, Begoña Martínez-Jarreta, Fabricio González-Andrade, Elizeu Fagundes de Carvalho, Dayse Aparecida da Silva, Juan José Builes, Daniel Turbón, Ana Maria Lopez Parra, Eduardo Arroyo-Pardo, Ulises Toscanini, Lisbeth Borjas, Claudia Barletta, Elizabeth Ewart, Sidney Santos, Michael Krawczak.

Numerous studies of human populations in Europe and Asia have revealed a concordance between their extant genetic structure and the prevailing regional pattern of geography and language. For native South Americans, however, such evidence has been lacking so far. Therefore, we examined the relationship between Y-chromosomal genotype on the one hand, and male geographic origin and linguistic affiliation on the other, in the largest study of South American natives to date in terms of sampled individuals and populations. A total of 1,011 individuals, representing 50 tribal populations from 81 settlements, were genotyped for up to 17 short tandem repeat (STR) markers and 16 single nucleotide polymorphisms (Y-SNPs), the latter resolving phylogenetic lineages Q and C. Virtually no structure became apparent for the extant Y-chromosomal genetic variation of South American males that could sensibly be related to their inter-tribal geographic and linguistic relationships. This continent-wide decoupling is consistent with a rapid peopling of the continent followed by long periods of isolation in small groups. Furthermore, for the first time, we identified a distinct geographical cluster of Y-SNP lineages C-M217 (C3*) in South America. Such haplotypes are virtually absent from North and Central America, but occur at high frequency in Asia. Together with the locally confined Y-STR autocorrelation observed in our study as a whole, the available data therefore suggest a late introduction of C3* into South America no more than 6,000 years ago, perhaps via coastal or trans-Pacific routes. Extensive simulations revealed that the observed lack of haplogroup C3* among extant North and Central American natives is only compatible with low levels of migration between the ancestor populations of C3* carriers and non-carriers. 
In summary, our data highlight the fact that a pronounced correlation between genetic and geographic/cultural structure can only be expected under very specific conditions, most of which are likely not to have been met by the ancestors of native South Americans.


Based on the largest sample of South American males to date, the paper makes several intriguing observations and claims, some of which are controversial. First of all, the authors found that all South American haplotypes fall into two main haplogroups – Q1a3 (and its descendant Q1a3a) and C3* (C-M217). While the former is well-described and pan-American, the latter is rare in South America and virtually absent in North America. It was first reported among the Colombian Wayuu by Zegura et al. 2004 (“High-Resolution SNPs and Microsatellite Haplotypes Point to a Single, Recent Entry of Native American Y chromosomes into the Americas,” Mol Biol Evol 21: 164-175). Subsequently, Geppert et al. 2011 (“Hierarchical Y-SNP assay to study the hidden diversity and phylogenetic relationship of native populations in South America,” Forensic Sci Int Genetics 5 (2): 100-104) confirmed the presence of hg C3* in Ecuador. Roewer et al. (2013) further specify that hg C3* is present in Quechuan-speaking Kichwa (26%) and the Waorani isolate (7.5%). This is a significant finding as it finally resolves the ambiguity around the relationship of South American C3s to North American C3b. While North American C3s are virtually all of the C3b variety, South American C3s constitute a different lineage. Roewer et al. (2013) also mention C3* lineages among the North American Tlingit from Schurr et al. 2012 (“Clan, Language, and Migration History Has Shaped Genetic Diversity in Haida and Tlingit Populations from Southeast Alaska,” Am J Phys Anthrop 148: 422–435), although they mistakenly mention one lineage instead of the two found in Schurr et al.’s Hoonah and Yakutat samples. The overall distribution of hg C3* is shown below in yellow.

[Figure: map of the distribution of hg C3*, from Roewer et al. (2013)]

Southeast Asian C3* lineages were reported from Vietnam, Borneo, Malaysia and Bali by Scheinfeldt et al. 2006 (“Unexpected NRY Chromosome Variation in Northern Island Melanesia,” Mol Biol Evol 23 (8): 1628-1641) and likely represent Neolithic gene flow from China. A full median-joining network demonstrates several ethnogeographic clusters within C3* haplotypes.

[Figure: median-joining network of C3* haplotypes, from Roewer et al. (2013)]

Korean (blue) and Mongolian (yellow) haplotypes form the two largest clusters. Colombian, Ecuadorian and Tlingit (ALA, pink) haplotypes are quite divergent from each other and do not fall into one cluster.

It’s interesting that there seems to be a special proximity between Ecuadorian and some Koryak C3* lineages:

“The most frequent Ecuadorian C3* chromosome H7 (occurring eight times in the Kichwa) shared an identical 8-locus haplotype with two Koryak samples from Kamchatka.”
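To make concrete what an "identical 8-locus haplotype" does and does not establish, here is a minimal sketch of a comparison restricted to the loci typed in both samples. The locus names are real Y-STR markers, but the choice of these particular loci and every allele value are assumptions made for illustration, not data from the paper.

```python
def compare_haplotypes(hap_a, hap_b):
    """Compare two Y-STR haplotypes on the loci typed in both.

    Each haplotype is a dict mapping locus name -> repeat count.
    Returns (shared_loci, mismatching_loci).
    """
    shared = sorted(set(hap_a) & set(hap_b))
    mismatches = [locus for locus in shared if hap_a[locus] != hap_b[locus]]
    return shared, mismatches


# Hypothetical allele values; NOT taken from Roewer et al. (2013).
sample_a = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
            "DYS391": 10, "DYS392": 11, "DYS393": 13, "DYS439": 12,
            "DYS437": 14}  # one extra locus typed only in this sample
sample_b = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
            "DYS391": 10, "DYS392": 11, "DYS393": 13, "DYS439": 12}

shared, mismatches = compare_haplotypes(sample_a, sample_b)
print(len(shared), mismatches)  # prints: 8 []
```

A match of this kind says nothing about loci that were typed in only one of the samples, so identity over a reduced locus set is a weaker statement than identity over the full 17-marker panel.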

In a fashion parallel to C3* chromosome H7, “some carriers of Q1a3a have also been found in Siberia, probably reflecting reverse gene flow from Alaska into Asia.” Roewer et al. refer to Q1a3a in the Evens on the Sea of Okhotsk coast, the western neighbors of the Koryaks, as well as in the Chukchis and the Siberian Eskimos (see Malyarchuk et al. 2011. “Ancient Links between Siberians and Native Americans Revealed by Subtyping the Y chromosome Haplogroup Q1a,” J Hum Genet 56: 583-588).

The Koryak link aside, Roewer et al. (2013) do not address the possibility that some of the C3* lineages in Asia and America are further removed from each other than, say, C3b or C3c lineages. They are lumped together as C3* only because their exact SNP differences from each other and from other C3 lineages are simply unknown. There is no reason to interpret the haplotype network and the geographic distribution of C3* as a signal of migration from Asia to the New World. Considering their low frequencies, mutual divergence and patchy distribution, there seems to be little doubt that Amerindian C3* have been subject to greater genetic drift and extinction than their Asian counterparts.
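The drift argument can be illustrated with a toy haploid Wright-Fisher simulation: a lineage starting at low frequency is tracked across generations in a population of fixed size, and the fraction of replicate runs in which it disappears is recorded. This is only a sketch. The population sizes, starting frequency, number of generations and the function name are all assumptions chosen for illustration, and nothing here reproduces the simulations in Roewer et al. (2013).

```python
import random


def lineage_loss_fraction(pop_size, start_freq, generations,
                          replicates=200, seed=1):
    """Fraction of replicates in which a Y lineage is lost by drift.

    Haploid Wright-Fisher model: each male in the next generation
    inherits the lineage with probability equal to its current
    frequency. No mutation, migration or selection.
    """
    rng = random.Random(seed)
    lost = 0
    for _ in range(replicates):
        count = round(pop_size * start_freq)
        for _ in range(generations):
            if count == 0 or count == pop_size:
                break  # lineage is lost or fixed; nothing more can change
            p = count / pop_size
            count = sum(1 for _ in range(pop_size) if rng.random() < p)
        if count == 0:
            lost += 1
    return lost / replicates


# Hypothetical settings: a small isolated group versus a larger one,
# both starting with the lineage at 10%.
small = lineage_loss_fraction(pop_size=50, start_freq=0.1, generations=100)
large = lineage_loss_fraction(pop_size=1000, start_freq=0.1, generations=100)
print(small, large)
```

Under this model a rare lineage is expected to vanish from a small group far more often than from a large one over the same number of generations, which is the sense in which small, isolated populations would produce a patchy C3* distribution.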

Roewer et al. (2013) make a strong claim regarding the lack of correlation between Y-STR and linguistic variation in South America. Contrary to a slew of case studies from Europe, Africa and Asia documenting strong parallelism between Y-DNA variation and language families, this is claimed to not be the case for South America:

“Virtually no structure became apparent for the extant Y-chromosomal genetic variation of South American males that could sensibly be related to their inter-tribal geographic and linguistic relationships. This continent-wide decoupling is consistent with a rapid peopling of the continent followed by long periods of isolation in small groups.”

But this conclusion is misleading considering what the main body of the paper presents. Let’s begin with the following quote:

“A highly significant association was observed between haplogroup and both language class (Cramer’s V = 0.20, p<10⁻⁸, Table 1) and language group (V = 0.41, p<10⁻⁸, Table S3).”

And then:

“Language class was not found in an AMOVA to explain much of the Y-STR genetic variation (<0.5% for both marker sets). In contrast, differences between the more narrowly defined language groups explained 12% and 8% of the variation for the large and the small marker set, respectively.”
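For readers unfamiliar with the statistic in the first quote, Cramér's V rescales the chi-square statistic of a contingency table to the range 0 to 1: V = sqrt(chi² / (n · (min(rows, columns) − 1))). A minimal sketch follows. The tables are hypothetical counts of haplogroup by language grouping, not the paper's Table 1 or Table S3.

```python
import math


def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows.

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: rows are haplogroups, columns are language groupings.
perfect_association = [[10, 0], [0, 10]]
no_association = [[5, 5], [5, 5]]
print(cramers_v(perfect_association))  # prints: 1.0
print(cramers_v(no_association))       # prints: 0.0
```

Values of 0.20 and 0.41 therefore sit well above zero but far from a one-to-one mapping between haplogroup and language grouping.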

Earlier in the paper Roewer et al. (2013) acknowledged the inadequacy of Joseph Greenberg’s lumping of the majority of American Indian languages into the Amerind macrophylum (which is amazing progress considering how sticky this ill-famed classification has proven to be in population genetic circles). But what they refer to by “language classes” in the quote above are Equatorial-Tucanoan, Ge-Pano-Carib, Andean and Chibchan-Paezan groups, which are little more than Greenberg’s Amerind subgroups carried over to A Guide to the World’s Languages (1991) by Greenberg’s collaborator Merritt Ruhlen, the source that Roewer et al. (2013) used. Importantly, it’s the correlation between Y-STR genetic variation and those Amerind subgroups that’s statistically poor. Once these artificial groupings are broken down into the widely accepted constituent language groups (Araucanian, Arawakan, Aymara, Carib, Chibcha, Mosetenan, Chon, Embera, Ge, Paezan [Guambiano], Jivaroan, Mataco-Guaicuruan, Mbya-Guarani, Panoan, Quechua, Tupi, Wao-Tiriro, Yanoman, Yuracare, Zaparo), the share of explained variation rises from under 0.5% to 8–12%, roughly a twenty-fold increase.
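Why lumping groups into broader classes can dilute the among-group share of variation is easy to see on toy data. The sketch below computes the among-group fraction (Φ_ST) with the standard single-level AMOVA estimator of Excoffier et al. (1992), using the number of differing loci as the distance between haplotypes. The distance measure, the two-locus haplotypes and the group labels are all assumptions for illustration. The paper's actual distance measure and data are not reproduced here.

```python
from itertools import combinations


def pairwise_diff(h1, h2):
    """Number of loci at which two haplotypes differ."""
    return sum(1 for a, b in zip(h1, h2) if a != b)


def phi_st(groups):
    """Among-group fraction of variation for a single-level AMOVA.

    groups: dict mapping group label -> list of haplotype tuples.
    Needs at least two groups and some variation in the data.
    """
    all_haps = [h for haps in groups.values() for h in haps]
    n_total, n_groups = len(all_haps), len(groups)
    # Sums of squared deviations derived from pairwise distances.
    ssd_total = sum(pairwise_diff(a, b)
                    for a, b in combinations(all_haps, 2)) / n_total
    ssd_within = sum(
        sum(pairwise_diff(a, b) for a, b in combinations(haps, 2)) / len(haps)
        for haps in groups.values()
    )
    ssd_among = ssd_total - ssd_within
    ms_among = ssd_among / (n_groups - 1)
    ms_within = ssd_within / (n_total - n_groups)
    # Effective group size, correcting for unequal sample sizes.
    n0 = (n_total - sum(len(h) ** 2 for h in groups.values()) / n_total) \
        / (n_groups - 1)
    var_among = (ms_among - ms_within) / n0
    return var_among / (var_among + ms_within)


# Hypothetical haplotypes: four internally uniform "language groups".
a, b, c, d = (13, 14), (14, 15), (15, 16), (16, 17)
fine = {"g1": [a, a], "g2": [b, b], "g3": [c, c], "g4": [d, d]}
# The same samples lumped into two arbitrary "language classes".
coarse = {"classX": [a, a, b, b], "classY": [c, c, d, d]}

print(phi_st(fine), phi_st(coarse))
```

In this toy case the fine grouping explains all of the variation, while the lumped classes explain about a third of it, because differences between the merged groups are now counted as within-class variation.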

The authors acknowledge this correlation boost by reporting “limited correlation between language group (but not class) and genetic variation.” So, it’s not that there’s “virtually no structure in the genetic variation” that can be attributed to linguistic relationships. The correlation is weak when spurious “language classes” are used and it gets stronger with the increasing granularity of linguistic classification and with larger genetic marker sets. Furthermore, genetics can capture the finer grain of differentiation in South America than linguistics, which is consistent with strong geographic isolation and local endogamy leading to the phenomenal microdifferentiation of South American populations as documented by many students of Amerindian classical markers (see more here). Figs. S4 and S7 (see below) from Roewer et al. (2013) illustrate the situation well – the South American Y-STR landscape is dominated by low-frequency haplotypes (hg Q1a3a and C3* follow the same pattern), and the four “language classes” correspond to haplotypes from radically divergent parts of the median-joining networks.

[Figures: Figs. S4 and S7 from Roewer et al. (2013)]

Notably, Roewer et al. (2013) leave unmentioned the intriguing similarity between the pan-American distribution of hg Q-M242 and Greenberg’s Amerind macrophylum. Whatever their reasons for this omission, the similarity does not work in Greenberg’s favor because this Y-DNA lineage is found not just among Amerind speakers but also among Na-Dene and Eskimo-Aleut speakers. It does seem to be the case that Greenberg erred not only on the side of lumping but also on the side of splitting. While his internal subgrouping of Amerind does not find support in this Y-DNA study of South American populations, correlations between Y-STR variation and linguistic classifications may exist not only on the level of first-order language families and isolates but also on the megataxonomic level that lumps together all New World languages and all the Old World languages whose speakers carry M242.

The most controversial proposal made by Roewer et al. (2013) involves the hypothesis of a recent (~ 6,000 YBP) direct contact between Japan and northwest South America that resulted in the introduction of hg C3* in South America. They bolster their claim by a passing reference to the work of anthropologist Betty J. Meggers:

“[T]here appears to be at least some archaeological evidence for a pre-Columbian contact between East Asia and South America. In particular, the similarity of ceramic artifacts found in both regions led to the hypothesis of a trans-Pacific connection between the middle Jōmon culture of Kyushu (Japan) and the littoral Valdivia culture in Ecuador at 4400–3300 BC. In view of the close proximity of the spotty C3* cluster to the Valdivia site, which was considered at the time to represent the earliest pottery in the New World, it may well be that C3* was introduced into the northwest of South America from East Asia by sea, either along the American west coast or across the Pacific (with some help by major currents).”

Meggers developed a thorough componential analysis of pottery designs, and Valdivia and Jomon pottery are indeed similar (see below, from Meggers, Betty J. “Archaeological Evidence for Transpacific Voyages from Asia Since 6000 BP,” Estudios Atacameños 15: 107-124).

[Figures: Valdivia pottery and Jomon pottery, from Meggers]

But does this similarity represent evidence of a direct contact in the Holocene? Not in the least. While it’s true that “at the Pacific coast, the average C3* frequency is higher in Korea (10%) than in Japan (3%), with the notable exception of 15% for the Ainu from Hokkaido,” we do not know if Ainu C3* is actually the same lineage as Ecuadorian C3* (see above). But most importantly, Ainu show high frequencies of Y-DNA hg D (YAP+) (see below, from Matsukusa et al. 2010. “A Genetic Analysis of the Sakishima Islanders Reveals No Relationship With Taiwan Aborigines but Shared Ancestry With Ainu and Main-Island Japanese,” Am J Phys Anthrop 142: 211-223), which is a very rare Asian marker attested, outside of Japan, only among Andamanese, Tibetans and mostly Daic-speakers in Southeast Asia. Hg D is considered to be a Jomon marker but it has never been reported from the Americas, and Ecuador is no exception.

[Figure: Y-DNA hg D frequencies in Ainu, from Matsukusa et al. (2010)]

In light of the growing body of evidence supporting a strong population and cultural impact of Okhotsk culture on Ainu 900–1600 years ago (see Lee, Sean, and Toshikazu Hasegawa. 2013. “Evolution of the Ainu Language in Space and Time,” PLoS ONE 8(4): e62243), it is worth considering the possibility that hg C3* in Ainu is the product of this migration and hence its emergence in Japan postdates Valdivia. This is supported by high (38%) frequencies of hg C3* (C-M217) among Nivkhs (see Tajima et al. 2004. “Genetic Origins of the Ainu Inferred from Combined DNA Analyses of Maternal and Paternal Lineages,” J Hum Genet 49: 187-193), the historical heirs of Okhotsk culture in the Amur Basin.

In the absence of hg D in Ecuador, it’s hard to fathom a direct connection between Holocene Japan and the northwest coast of South America. The material similarities between Valdivia and Jomon pottery remain examples of convergence (on par with such remarkable trans-oceanic parallelisms as Solutrean and Clovis projectile points) unless a genetic connection between the two regions is established more securely. When Meggers originally discovered these similarities, Valdivia pottery was thought of as the earliest instance of pottery in the New World and there was an ingrained belief in the superiority of Old World craftsmanship over New World craftsmanship:

“The technical and artistic level of Valdivia pottery is too high…for it to represent a local invention of pottery making” (Estrada, Emilio, Betty J. Meggers and Clifford Evans. 1962. “Possible Transpacific Contact on the Coast of Ecuador,” Science 135 (3501): 371-372).

Gisele Horvat made an intriguing suggestion that Y-DNA C3* is the male counterpart of mtDNA hg D4h3a. Not only is hg D4h3a found in the ancient remains from the On Your Knees Cave in close proximity to the Tlingit Indians, but it also reaches its highest frequencies (23%) in Ecuador among the Cayapa. Unlike Y-DNA hg C3*, mtDNA hg D4h3 is pan-American, so the origin of D4h3 from a direct migration from Japan to Ecuador is out of the question. It’s possible that better sampling will expand the geographic footprint of hg C3* in the New World. It’s worth pointing out, however, that hg D4h3 is not known in Japan (modern Ainu show hg D4h1, while hg D4h2 has been obtained from ancient Jomon remains) and the only example of hg D4h3 outside of the New World is in Han from Qingdao, Shandong, China (Kemp et al. “Genetic Analysis of Early Holocene Skeletal Remains From Alaska and its Implications for the Settlement of the Americas,” Am J Phys Anthrop 132: 605-621). While the similarities are suggestive, they likely derive not from direct migrations from Japan to South America (either along the coast or across the ocean) in the Holocene but from more ancient kinship between populations around the Pacific Rim disrupted by the strong forces of genetic drift in the New World. In a word, South American C3* and North American C3b must have been part of the original paternal gene pool in the New World.