An Abuse of Interdisciplinarity: Modeling the Indo-European Homeland Again

Science 2012, Vol. 337 no. 6097 pp. 957-960 doi: 10.1126/science.1219669

Mapping the Origins and Expansion of the Indo-European Language Family

Remco Bouckaert, Philippe Lemey, Michael Dunn, Simon J. Greenhill, Alexander V. Alekseyenko, Alexei J. Drummond, Russell D. Gray, Marc A. Suchard, Quentin D. Atkinson

There are two competing hypotheses for the origin of the Indo-European language family. The conventional view places the homeland in the Pontic steppes about 6000 years ago. An alternative hypothesis claims that the languages spread from Anatolia with the expansion of farming 8000 to 9500 years ago. We used Bayesian phylogeographic approaches, together with basic vocabulary data from 103 ancient and contemporary Indo-European languages, to explicitly model the expansion of the family and test these hypotheses. We found decisive support for an Anatolian origin over a steppe origin. Both the inferred timing and root location of the Indo-European language trees fit with an agricultural expansion from Anatolia beginning 8000 to 9500 years ago. These results highlight the critical role that phylogeographic inference can play in resolving debates about human prehistory.


There are two main ways in which the principle of interdisciplinarity can be abused in the study of prehistory. The first is analogous to the Stockholm Syndrome. It applies to cases in which scholars with a background in the social sciences and humanities adopt models and interpretations advanced within disciplines perceived as “hard sciences” (usually population genetics and archaeology) in order to make sense of their own “cultural” data and build models isomorphic with population genetic or archaeological models. I have already criticized at length Yuri Berezkin’s, Victor Grauer’s and Alan Barnard’s attempts to force-feed complex comparative mythological, ethnomusicological and kinship material, respectively, into an out-of-Africa straitjacket. These scholars have no hands-on or theoretical expertise in population genetics or archaeology, and they assume an admiring bystander attitude toward these disciplines, as if population genetic and archaeological data and results were somehow inherently less labile than those of linguistics, folkloristics, kinship studies and ethnomusicology. At the same time, the patterns derived from the study of the “hard data” are presented as very similar to the patterns observed in the “fuzzy data.” On closer inspection, however, the proposed isomorphisms between the patterns of “hard data” and “fuzzy data” turn out to be superficial and ambiguous. The aura of invincibility around genetics and archaeology dissipates once it becomes clear that genetics admits multiple explanations for the same phenomenon, multiple mutation and calibration rates and multiple evolutionary models, and that archaeology and paleobiology are inherently too fragmentary and generic to be used as a robust filter for discarding hypotheses generated on the basis of the study of modern human populations.

The second variety of interdisciplinary abuse encompasses recurring attempts by biologists, evolutionary psychologists, mathematicians and computer scientists to hijack linguistic and cultural data in order to “test” competing hypotheses within these fields using their “superior” methodologies or to bolster models of prehistory derived from genetics. Geneticists used to actively map long-range linguistic classifications onto their trees of genetic frequencies to produce seemingly holistic models of human dispersals (see, e.g., Cavalli-Sforza, Luca L., A. Piazza, P. Menozzi, and J. Mountain. 1988. “Reconstruction of Human Evolution: Bringing Together Genetic, Archaeological and Linguistic Data,” Proceedings of the National Academy of Sciences 85: 6002-6006). These linguistic classifications had one disadvantage: they were dismissed by mainstream linguistics as false. Geneticists apparently could live with this. At one point, the practice became so commonplace that linguists had to issue a cease-and-desist notice published in a flagship genetics journal (see Bolnick, D. A., B. A. S. Shook, L. Campbell, and I. Goddard. 2004. “Problematic Use of Greenberg’s Linguistic Classification of the Americas in Studies of Native American Genetic Variation,” American Journal of Human Genetics 75, 519-522). Another example is the argument for the antiquity of click sounds made by geneticists on the basis of gene trees showing that Khoisan-speakers, who have a well-developed inventory of click phonemes, are also “basal” to other human populations (see Knight, Alec et al. 2003. “African Y chromosome and mtDNA Divergence Provides Insight into the History of Click Languages,” Current Biology 13, 464-473). Linguists responded that there are no linguistic reasons to believe that click sounds are primordial in humans (Güldemann, Tom, and Mark Stoneking. 2008. “A Historical Appraisal of Clicks: A Linguistic and Genetic Population Perspective,” Annual Review of Anthropology 37, 93-109). A more recent case is the application of the serial founder effect model to phonemic inventory sizes, which supposedly decrease with increasing distance from Africa in a manner similar to allelic diversity. Again, linguists re-analyzed the data and exposed multiple flaws in this model.

Many of these pseudo-linguistic studies are published in such high-science venues as PNAS, Science and Nature. But make no mistake: this is not because they are more scientific than other studies. It is because they have no chance of getting published in journals created by and for professional linguists. It is ironic how mainstream science chooses to associate itself with studies that have little grounding in the disciplines they purport to advance (see a wider discussion here). It hopes that the scientific cachet of the publication venue will somehow make these studies more scientific in reality, but the true outcome is that these studies compromise science.

The problem of Indo-European subgrouping and the Indo-European (IE) homeland is a favorite target for the application of statistical methods derived from the biological sciences. Bouckaert et al. (2012) is the latest study of this kind. It tests the Anatolian vs. Pontic Steppe models of the Indo-European homeland and reiterates the earlier conclusion that Bayesian phylogenetics places Anatolian as the most divergent branch in the family, which suggests that Indo-Europeans originated in Anatolia and not in the Pontic Steppe. The divergence dates obtained by this method (10,000-8,000 YBP) are several thousand years older than the Bronze Age dates associated with the Pontic Steppe model. They correlate the dispersal of Indo-European speakers with the spread of agriculture from the Middle East and not with the spread of pastoralism from the European steppe.

The big problem with this approach is the methodological incompatibility between the Anatolian and Pontic Steppe models. The Pontic Steppe model is built on the basis of archaeology, while the Anatolian model is based on linguistic evidence. There are no ancient or extinct IE languages attested from the Pontic Steppe. No linguist has ever claimed that Russian and Ukrainian, the two IE languages currently spoken around the Black Sea, trace back to the original IE (“Kurganic”) languages spoken by Yamnaya and other archaeological cultures. The Anatolian branch, on the other hand, is considered by these researchers to have stayed close to the Indo-European homeland in South Anatolia. At the same time, the dates associated with Yamnaya (not to mention its lesser-known antecedents, Samara and Khvalynsk) are considerably older (3600 BC at the minimum) than the earliest attestations of the Hittite language (2000 BC at the maximum). There is no way to know for sure what Kurganic languages sounded like, but considering that they are older than Hittite, there is a good chance they were more divergent than Hittite. In any case, the existence of Hittite attestations and the failure of Kurganic languages to survive into the present are accidents of history. In the absence of Kurganic attestations, the greater divergence of Hittite remains an artifact of historical preservation and not objective evidence in favor of the Anatolian model over the Pontic Steppe model.

The test would be more objective if Bouckaert et al. (2012) had found a way to test the Anatolian model against Johanna Nichols’s Bactrian model, as both models are derived from linguistic data and linguistic data alone. But Bouckaert et al. (2012) do not even mention the Bactrian model, betraying their lack of in-depth knowledge of the field.

Another problem with studies such as Bouckaert et al. (2012) is that Indo-European linguists using other quantitative methods consider the statistical phylogenetic models derived from the biological sciences naive, and the understanding of language change and lexical replacement held by the non-linguists behind them misguided. Indo-Europeanists using other methods arrive at different resolutions of Indo-European subgrouping, in which Anatolian is not the most divergent branch.

Finally, science blogger Dienekes spread an internet meme regarding the putative presence of a “West Asian” autosomal genetic component in Indo-European speakers but not in non-Indo-European speakers of Europe. Razib continues to mindlessly reference this “finding” as consistent with the Anatolian homeland, despite the fact that this autosomal component is found at high frequencies in the northern Caucasus. (The pro-Anatolian advocate Dienekes mislabeled it as “West Asian” rather than “Caucasus” to create an easy handle to help the myth propagate.) The component can be legitimately interpreted as the product of admixture between proto-Indo-Europeans and proto-North Caucasians, an interpretation supported by a large number of loans from North Caucasian into Indo-European and by a few typological similarities between the two families suggestive of an ancient Sprachbund. This is consistent with the location of the Indo-European homeland in the Pontic Steppe.