Phonemic Diversity and Out-of-Africa Again: The Myth Is Gaining Momentum

PLoS ONE 7(4): e35289. doi:10.1371/journal.pone.0035289

Dating the Origin of Language Using Phonemic Diversity

Perreault, Charles, and Sarah Mathew.


Language is a key adaptation of our species, yet we do not know when it evolved. Here, we use data on language phonemic diversity to estimate a minimum date for the origin of language. We take advantage of the fact that phonemic diversity evolves slowly and use it as a clock to calculate how long the oldest African languages would have to have been around in order to accumulate the number of phonemes they possess today. We use a natural experiment, the colonization of Southeast Asia and Andaman Islands, to estimate the rate at which phonemic diversity increases through time. Using this rate, we estimate that present-day languages date back to the Middle Stone Age in Africa. Our analysis is consistent with the archaeological evidence suggesting that complex human behavior evolved during the Middle Stone Age in Africa, and does not support the view that language is a recent adaptation that has sparked the dispersal of humans out of Africa. While some of our assumptions require testing and our results rely at present on a single case-study, our analysis constitutes the first estimate of when language evolved that is directly based on linguistic data.

Link (Free Full Text PDF)

Another paper attempts to show that phonemic diversity decreases with distance from Africa. I have already discussed some of the issues with using phonemic data to argue for a dispersal out of Africa and won’t repeat them here. Dienekes has references to earlier criticisms of this argument and adds his own doubts. The recurrence of the phonemic diversity argument in academic publications is, however, noteworthy and speaks to the unceasing attempts to mimic the out-of-Africa theory using linguistic data. Back in 1988, when mtDNA genetics was still in its infancy, Luca Cavalli-Sforza rallied his troops to find similarities between his genetic trees built on allele frequencies and long-range linguistic classifications. His summary graph is shown below.

“Nostratic,” “Indo-Pacific,” “Amerind” and “Eurasiatic” phyla on the right of this graph are rejected by virtually all linguists. Those linguists who believe in megalocomparison are often affiliated with the Santa Fe Institute. Examples include Merritt Ruhlen and the late Sergei Starostin. Charles Perreault is also affiliated with the Santa Fe Institute, but he’s not even a linguist. He’s an archaeologist who suddenly publishes a paper on phonemic evolution in partnership with Sarah Mathew, a social anthropologist armed with quantitative methods.

The argument that phonemic diversity decays along the path out of Africa is advocated without any attempt to build alternative models. It also goes unacknowledged that, while phonemic diversity may be highest in Sub-Saharan Africa (although these counts include clicks but for some reason exclude whistled sounds found in phoneme-poor non-African language communities such as Pirahã), diversity as measured by the number of genealogical units, namely linguistic stocks, is highest in the New World and Australasia. On this point the majority of linguists, who dismiss lumping labels such as “Nostratic,” “Indo-Pacific,” “Amerind” and “Eurasiatic” on Cavalli-Sforza’s graph above, concur.

A very telling statement appears at the very beginning of the Perreault & Mathew article:

“African languages today have some of the largest phonemic inventories in the world, while the smallest inventories are found in South America and Oceania, some of the last regions of the globe to be colonized.”

What they have in mind is that Rotokas, a non-Austronesian language of New Guinea, and Pirahã, spoken in South America, both have 11 phonemes, while Khwe or !Xun, a Khoisan language spoken in Southern Africa, has 141 phonemes. This is precisely the moment when we can observe how untested assumptions work to generate seemingly valid scientific models. Perreault & Mathew assume a) that modern humans originated in Africa and b) that they colonized South America and Oceania later than they did other regions. Having made these assumptions, they begin seeing a process of phonemic diversity loss between Africa and the rest of the world. However, under the out-of-America II model it’s precisely the New World and Australasia that enjoy the greatest antiquity of modern human presence. It’s in these regions that one observes the highest level of linguistic diversity as measured by the number of linguistic families or stocks. Hence, the areas of linguistic antiquity are associated with the smallest phoneme inventories, not the largest. Perreault & Mathew’s argument is, therefore, easily falsified.

What is indisputable is that, regardless of whether we look at human linguistic evolution out of Africa or out of America, phoneme inventory size reflects population size and the level of demographic isolation, as Perreault & Mathew correctly note. Under an out-of-Africa scenario, small population size and a high degree of isolation are recent phenomena; under an out-of-America scenario, they are retentions from Middle Pleistocene times when hominins were organized in small, dispersed demes.

Perreault & Mathew choose to make sweeping generalizations to describe the state of genetic support for out-of-Africa. They write,

“Genetic data indicate that humans dispersed in Asia following a coastal route, from India to Australia, and that both Southeast Asia and Andaman Islands were colonized from a population that occupied the region spanning from southern India to the Malay Peninsula.”

The truth is that genetic data have never shown anything in support of a coastal route out of Africa. The coastal route was an early way to explain how humans may have gotten around Neandertals, who had replaced anatomically modern humans in the Levant, because at that time it was assumed that Neandertals and humans did not admix. So, the coastal migration hypothesis was born out of paleontology and not genetics. Geneticists used it as a way to build a story around their mtDNA trees, but there was nothing in the distribution of mtDNA or Y-DNA haplotypes per se that would suggest a coastal route of dispersal from Africa. Let’s take the example of African mtDNA hg M1. It’s part of non-African macrohaplogroup M, which spawned a forest of basal lineages in India, but M1 is not basal to Indian M lineages. In fact, since Gonzalez (2007), it’s widely believed that M1 back-migrated to Africa. This very well may have happened from India along a coastal route, but it’s not an out-of-Africa migration. Or, to take another example, a recent study of Andamanese mtDNA genetics suggests that humans first colonized the Andaman Islands by using a land bridge that connected the Andaman Archipelago and Myanmar around the Last Glacial Maximum. This is based on a very detailed analysis of M sublineages in northeast India and the Andaman Islands. Perreault & Mathew don’t reference this study but instead assume that “both Southeast Asia and the Andaman Islands were colonized during the Pleistocene dispersal of modern humans out of Africa, a process that started 70-60 kya.” Perreault & Mathew write that the coastal migration “was rapid” without understanding that the “rapid” descriptor found in some genetics publications is an attempt to account for the fact that there is no trail of budding lineages connecting Africa and Southeast Asia via a southern route. Southeast Asia shows a sudden density of basal mtDNA lineages such as M, N and R, with no intermediary steps connecting M, N and R back to L3 along the putative coastal route.

Chasing other myths of human origins research, Perreault & Mathew announce:

“…Our result is consistent with the archaeological evidence suggesting that human behavior became increasingly complex during the Middle Stone Age (MSA) in Africa, sometime between 350-150 kya.”

Again, the reality is more complex than they are willing to allow. Symbolic behavior is notoriously hard to pin down in the archaeological record, and possible signs of symbolic behavior are associated not only with Mid-Pleistocene hominins in Africa but also with Mousterian Neandertals in Europe, as the recent study of the use of the skeletal parts of diurnal raptors in France around 100,000 years ago suggests. Beyond the threshold of 40,000 years, archaeology is a poor indicator of what constitutes modern human behavior and what kind of “symbolic behavior” defines our species vs. other species such as Neandertals or African archaics.

Perreault & Mathew demonstrate superficial knowledge of both linguistics and genetics, and they use untested assumptions as shortcuts into bold theorizing. This is all for the purpose of restating the well-known tenets of out-of-Africa thinking, now with the help of cherry-picked and poorly understood linguistic data.