The End of Out-of-Africa: A Copernican Reassessment of the Patterns of Genetic Variation in the Old World

Over at Anthrogenica, I’ve been having some heated (as always) but this time also productive discussions regarding the interpretation of currently available genetic evidence. In the following I will sketch out a hypothesis that increasingly makes sense to me.

1. The science of human origins is undergoing a seismic shift away from Out-of-Africa-with-Serial-Bottlenecks thinking toward Archaic Admixture and Back Migration thinking. Whole-genome sequencing and ancient DNA have produced strong evidence for “archaic introgression” in Africa (statistical inference in the absence of ancient DNA), Eurasia (Neandertals; direct evidence) and Melanesia (Denisovans; direct evidence).

2. Uniparental systems (mtDNA and Y-DNA) have so far failed to detect “archaic admixture.” But this situation may have changed with a recent report by Boris Malyarchuk of a two-nucleotide mtDNA haplotype shared between late European Neandertals and modern humans belonging to the L2’3’4’5’6 clade. If interpreted from a hybridization angle (Malyarchuk also considers convergence), this finding means that all non-Africans and most Africans descend not from African L0 and L1 haplotypes but from a sequence that introgressed from European Neandertals. This is consistent with the whole-genome finding that all non-Africans and some East Africans (including Hadza) carry Neandertal alleles.

4. Neandertals being a Eurasian species, the majority of African mitochondria are therefore of Eurasian origin. According to current phylogenies, all of Eurasian mtDNA variation falls under two macrohaplogroups, M and N. Although Africans who descended from European Neandertals have phylogenetically more ancient lineages, namely L5, L2, L6, L4/L3 (according to the most recent PhyloTree build), this only means that these lineages survived in Africa and went extinct in Eurasia. But their antecedent is attested among European Neandertals. In Eurasia, out of the whole Neandertal-derived clade, only M and N survived. This interpretation is consistent with the finding of basal L6 sublineages in West Asia (Yemen, Qatar).

5. With this idea in mind, it’s now easy to see why no African L lineages have ever been found along the putative out-of-Africa routes of migration (the coastal route via India or the inland route via the Levant and Central Asia) or at the hypothetical earliest terminal points of the human journey such as the Sahul. They are not found there because African L2’3’4’5’6 lineages entered Africa from Western Eurasia and subsequently never left Africa. It’s also easy to see why Africa has only a subset of Eurasian lineages such as M1, U6 and X1/2, although, under the out-of-Africa reading of the phylogenies, Eurasian lineages are a subset of a subset of African lineages (namely L3). Finally, it’s easy to see why the basal diversity of mhg M and N is concentrated not in close geographic proximity to Africa – its purported ultimate source – but in East Asia and eastern South Asia. This is so simply because they never originated in Africa or on the road from Africa.

6. Considering that the L2’3’4’5’6 clade may have introgressed into the modern human mtDNA gene pool from European Neandertals, there is nothing that prevents us from hypothesizing that L0 and L1 lineages introgressed into modern Africans from “archaic Africans.” This interpretation provides a perfect explanation of the absence of the oldest human mtDNA lineages outside of Africa (again, anywhere along the putative routes of expansion of modern humans from Africa), an absence that is puzzling from the point of view of the classical out-of-Africa model. It also answers another thorny question, namely why the oldest human mtDNA lineages have remained narrowly localized for 100,000 years and are concentrated mainly in such Sub-Saharan populations as Khoisans and Pygmies. Instead of imagining Khoisans and Pygmies as two primordial offshoots from the Great Human Mitochondrial Tree, these populations can now be seen as relatively recent populations who absorbed “archaic African” sequences as modern Eurasians were colonizing Africa.

7. Instead of thinking about human population history as the mechanical branching of genetic lineages and the exclusive survival of derived lineages in the colonized territories, it looks like the human mtDNA phylogeny is a patchwork of archaic introgressions whose actual historical order of occurrence is inversely related to their apparent phylogenetic order. Modern humans did not originate in Africa and didn’t expand out of Africa to colonize the globe. Instead, as they expanded across Eurasia and then Africa, they first absorbed Neandertal mtDNAs and then some Africans absorbed archaic African mtDNAs. The modern human mtDNA phylogeny was literally cobbled together.

8. In the absence of “archaic Y-DNA,” I can tentatively propose the following to match the mtDNA hypothesis fleshed out above. It’s a well-known fact that Y-DNA tree topology sometimes matches mtDNA tree topology and sometimes diverges from it. One of the important parallels between the mtDNA and Y-DNA trees is the similar position of two “basal” clades – L0 and L1 in mtDNA and A and B in Y-DNA. Both sets are highly frequent in Khoisans and Pygmies and not found outside of Africa (along the putative migration routes out of Africa and at the terminal points of the journey). One of the key differences between the mtDNA and Y-DNA trees is the awkward position of the main African Y-DNA clade, namely E. It’s an African-only subset of the Eurasian clade CT and a “brother” clade to Eurasian-specific hg D. But in the light of my interpretation of the mtDNA phylogeny above, Y-DNA hg E has all the accoutrements of mtDNA hg L2’3’4’5’6 in Africa: it’s the most frequent and most widely spread African lineage. The plausible Y-DNA correlate (provisionally, in the absence of ancient DNA) to the haplotype introgressed from Neandertals is hg DE. This would explain its weird distribution in Eurasia: Europe, Central Asia, Tibetans, Andamanese, Ainu. It encompasses the known range of Neandertals with some subsequent, post-LGM, outreaches into the Andaman Islands, Japan and Tibet. It appears that DE was overrun and marginalized by hgs F and C (CF clade), and those lineages fall outside of the pattern suggested by the hypothesis that mtDNA L2’3’4’5’6 introgressed from Neandertals. This means hgs F and C didn’t introgress from Neandertals but likely represent the genuine Y-DNA signature of Homo sapiens sapiens. And this signature has a clear non-African provenance, being concentrated in Eurasia, America and the Pacific. The CF clade has never been reported in African populations.

9. The likely correlate for the Y-DNA CF clade among mtDNA haplogroups is macrohaplogroup R. Its phylogeographic position (technically the most derived of all macrohaplogroups but the most widely spread globally) is awkward. How can the supposedly oldest clades L0 and L1 be restricted to Sub-Saharan Africa and highly frequent in only two populations (Khoisans and Pygmies), while the supposedly youngest clade R is found and highly frequent in the New World, including South America, the Sahul, South Asia and Western Eurasia? The most ancient mtDNAs currently available all belong to mhg R (Kostenki is U2, Tianyuan is B, Mal’ta is U), although one would expect older and more diversified haplogroups such as M and N to be randomly picked by our imperfect ancient DNA sampling. The Mal’ta boy shows the co-presence of mtDNA U and Y-DNA R, confirming the ancient parallelism of mtDNA R and Y-DNA CF.

10. The above suggests that mtDNA hg R and Y-DNA hg CF represent a haploid signature of pure, unmixed Homo sapiens sapiens. (The Tianyuan sample confirms that the carrier of mtDNA hg B had no traces of Denisovan admixture.) This is consistent with the phylogenetically most derived position of mhg R and the presence of a cascade of derived haplogroups in Y-DNA hg CF, some of which have a remarkably wide geographic spread and stable frequencies. For example, hg P (xQ,R) covers all of the Americas plus all of Western Eurasia. But, again, considering that human haploid phylogenies are a patchwork of archaic introgressions whose actual historical order of occurrence is inversely related to their apparent phylogenetic order, mtDNA mhg R did not descend from mhg N; instead, populations carrying mhg R lineages absorbed archaic hominin sequences that led to hgs M and N as these R-people colonized Eurasia. Significantly, the Y-DNA phylogeny doesn’t have hg CF as the most derived clade but rather as a “brother” clade to DE with a cascade of derived haplogroups inside.

11. The interpretative framework constructed above for mtDNA and Y-DNA phylogenies can be further applied to other genetic systems such as blood groups. Blood group O is the youngest among the main blood groups found among modern humans. It may have emerged multiple times in hominin and primate history. But it’s also the most frequent and widely spread among modern humans. Notably, its highest concentration is in the New World (it’s fixed in most South American populations) and Australia. These are precisely the areas where mtDNA R and Y-DNA CF are strongly represented (Y-DNA hg Q is fixed in many Amerindians). These are also the areas devoid of archaic hominins. The traditional explanation for the blood group pattern in the New World is, of course, a bottleneck that presumably eliminated all the B and A alleles from a proto-Amerindian population. This explanation, however, has no ancient DNA support and overlooks the fact that Amerindians are not an oddity among modern human populations. Blood group O is the most frequent type on all continents. Amerindians are just more typical humans than other humans.

12. There’s no information about blood groups among Denisovans. One important thing is that the Tianyuan specimen is mtDNA hg B and it showed no Denisovan introgression. If, as I suspect, mtDNA mhg R is a signature of genuine, unadmixed Homo sapiens sapiens and blood group O is its correlate on the blood group side, then I’d be very surprised if it turns out that Denisovans are blood group O, too. On the other hand, I wouldn’t be surprised if Denisovans turn out to be blood group B, considering that blood group B peaks in China, North India, Pakistan and parts of Siberia (roughly where Denisovans were found; see John Hawks’s speculations about Denisovans in India) but also, notably, is found in Central Papua New Guinea (23%) and in Queensland Aborigines (pace Kristiina). This is again consistent with the distribution of Denisovan admixture. In this case, we should expect to continue to see almost-zero percentages of Denisovan admixture in the New World and parts of Africa, including San Bushmen, because blood group B is very rare there, too.

This paradigm shift explains many “inconvenient facts” generated by the out-of-Africa model and, in a Copernican fashion, moves us away from Afrocentrism in the interpretation of modern human prehistory. Importantly, from this perspective it becomes very easy to understand why Africa is relatively homogeneous linguistically, while 2/3 of world linguistic diversity, as measured by the number of unique genealogical units called stocks or families, is concentrated in the New World and the Sahul. This is because Africa was peopled by behaviorally modern humans relatively late, the early migrants to Africa carried with them just a subset of extra-African languages, and the continent didn’t have enough time to accrue more diversity. The new paradigm also helps avoid such a painful and improbable interpretation of the current mtDNA phylogeny as the massive language shift to farmers’ languages by a dozen or so Pygmy forager populations in Africa. In reality, Pygmies didn’t adopt Niger-Congo (and Nilo-Saharan) languages. The early foragers speaking Niger-Congo and Nilo-Saharan languages absorbed an “archaic substrate.” Later, most Niger-Congo hunter-gatherer populations shifted to agriculture, while some of them maintained the original lifestyle and developed short stature. That seems to be the only way to explain the presence of mtDNA L1 sublineages in both farmers and foragers without suggesting that foragers and farmers split in the Mid-Pleistocene and that foragers later adopted farmers’ languages without leaving their natural habitat and without changing their economy.