The Indo-Uralic Desire and the Comparative Method

Journal of Indo-European Studies 43, nos. 3-4 (2015): 348-56.

“Response to Kassian et al., ‘Proto-Indo-European-Uralic comparison from the probabilistic point of view’.”

Ringe, Don.


Rarely does a response to a paper become a must-read when the paper itself deserves only a passing look. But in the case of Don Ringe’s assessment of the latest effort to prove that Indo-European and Uralic languages descended from a common ancestor, this is indeed the case. Alexei Kassian (the ill-famed freak-hunter), Mikhail Zhivlov (unknown to me) and George Starostin (the son of Sergei Starostin) wrote a parody of a scientific paper in which they marshaled as many as 7 IE-Uralic pairs of words (within the “accepted” 50-item wordlist) that should constitute etymological cognates “within the framework of the Nostratic theory” (p. 319) and that were picked as positive matches by the authors’ “formal algorithm” (a permutation test of matches within 13 consonant classes, or “D-classes” named after one of the founders of the Nostratic movement, Aaron Dolgopolsky). Kassian et al. convinced each other that the probability that those 7 words are accidentally similar between the two proto-languages is so low that they just have no other choice but to pronounce Indo-European and Uralic “sisters.” Despite the fact that all three authors are employed by Russian academic institutions to teach historical linguistics, there’s very little historical linguistics in the paper: those 7 forms don’t form any kind of system, there’s no morphological evidence, and there are no regular sound correspondences beyond banal formal and semantic “similarities” that were already noticed in the 19th century by scholars such as Vilhelm Thomsen. Linguistic material gets persistently stripped down to a bare minimum and a machine of persuasion in the form of an objective statistical procedure is then turned on in order to convince the world that this impoverished, sterilized and disconnected linguistic material is in fact speaking from the grave to tell the truth that the language families in question descend from a common ancestor.
But if only 7/50 forms are offered as “proof” of the Indo-Uralic hypothesis (down from thousands that define the historical reality of the Indo-European and Uralic families), what’s left out there to prove the Nostratic, Eurasiatic and the rest of the Tower of Babel levels of linguistic kinship? But enough of my enlightened mockery of the feeble efforts by a group of young Nostratic enthusiasts to prove their own worth in the eyes of the world’s linguistic community. Let’s hear what Ringe has to say.

1. Kassian et al. confuse similarity with regular sound correspondences: “[T]hey actually privilege words that resemble one another by chance over historically real cognate sets, because the sounds of the former are phonetically similar by definition, whereas regular sound correspondences exist between phonetically similar sounds and between sounds which are no longer similar.” (p. 348).

2. Kassian et al. are attempting to replace comparative methodology with a pseudoscientific construct that groups chance resemblances into “classes” (the 13 consonant groupings based on similar places of articulation) and then pronounces them to be a sign of common descent between the languages in question: “[N]o one has demonstrated that D-classes are realistic enough units to be used as a basis for proof of non-obvious linguistic relationships. For that reason alone the authors’ method does not, in fact, come closest to modelling traditional comparative linguistics” (p. 349). (Another reviewer, Brett Kessler, on p. 363, suspects a bias in Dolgopolsky’s original formulation of the 13 classes geared to prove the reality of the hypothetical macrofamily that he was passionate about.)

3. Unlike a similar, 40-stable-item wordlist devised by Søren Wichmann on the basis of a broad sample of languages (see Holman, Eric W., Søren Wichmann, Cecil H. Brown, Viveka Velupillai, André Müller, Pamela Brown, and Dik Bakker. 2008. Explorations in automated language comparison. Folia Linguistica 42: 331-354), Nostraticists’ 50-item wordlist (different from Wichmann’s list by a whopping 10 items) is based on thin air: “The authors must have used a different sample of languages to construct their list, and it cannot possibly have been a larger sample than Wichmann’s; but without knowing what languages were used we have no way of knowing whether the list could conceivably be biased in favor of IU” (pp. 349-50).

4. Even the poor 7/50 words adduced by Kassian et al. have been doctored to look similar and to fit their pseudolinguistic “classified chance resemblances means common descent” method: “It is difficult to avoid concluding that the data have been both cherry-picked and massaged.” (p. 352).

5. 3 out of 7/50 word-pairs adduced to prove the Indo-Uralic hypothesis are actually monoconsonantal roots that don’t fit Kassian et al.’s own requirement of allowing only bi-consonantal roots in their permutation tests. The remaining 4/50 items actually show high probability of being chance resemblances by Kassian et al.’s own statistics: “Thus those three items do not really meet the authors’ two-consonant identity criterion. If we exclude them, only four remain, and as the authors show in Fig. 4 and Fig. 5, four or more such pairs appear in either 32% or 18% of randomized comparisons, again depending on which version of D-classes one uses. Such a result is comparable to results of earlier studies and actually suggests that the earlier studies were right: lexical resemblances between IE and Uralic are enough greater than average to create reasonable suspicions of genetic relationship, but not great enough for statistical proof” (p. 353).

6. Despite the arrival of a new generation of Nostratic enthusiasts armed with fancier persuasion tools, the evidential basis for the Indo-Uralic hypothesis is still the same as it was almost two centuries ago: “A reasonable assessment of the authors’ research is that, though their method might in principle be better than those of their predecessors (leaving aside the problems with the data, see above), they have effectively obtained comparable results” (p. 353).
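
The permutation logic at stake in points 2 and 5 can be illustrated with a minimal Python sketch. It is not Kassian et al.'s actual implementation: the consonant classes in TOY_CLASSES are invented for illustration and are not Dolgopolsky's 13 D-classes, the function names are hypothetical, and the two wordlists are toy placeholders (only the 'water' and 'name' pairs are taken from the forms discussed further down in this post). The sketch counts meaning slots whose first two consonants fall into the same classes, then shuffles one list to see how often chance alone yields at least as many matches.

```python
import random

# Hypothetical toy consonant classes, NOT Dolgopolsky's actual 13 D-classes.
TOY_CLASSES = {
    "P": "pbfv",
    "T": "td",
    "S": "sz",
    "K": "kgx",
    "M": "m",
    "N": "n",
    "R": "rl",
    "W": "w",
    "J": "j",
}
SOUND_TO_CLASS = {s: c for c, sounds in TOY_CLASSES.items() for s in sounds}


def skeleton(word, n=2):
    """Return the classes of the first n consonants of a word (vowels are skipped)."""
    classes = [SOUND_TO_CLASS[ch] for ch in word if ch in SOUND_TO_CLASS]
    return tuple(classes[:n])


def count_matches(list_a, list_b, n=2):
    """Count meaning slots where both words have n consonants in identical classes."""
    total = 0
    for a, b in zip(list_a, list_b):
        sa, sb = skeleton(a, n), skeleton(b, n)
        # Words with fewer than n consonants never count as a match,
        # which mirrors the two-consonant criterion Ringe holds the authors to.
        if len(sa) == n and sa == sb:
            total += 1
    return total


def permutation_p_value(list_a, list_b, trials=10000, seed=0):
    """Share of random re-pairings with at least as many matches as observed."""
    rng = random.Random(seed)
    observed = count_matches(list_a, list_b)
    shuffled = list(list_b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)  # break the meaning-to-meaning alignment
        if count_matches(list_a, shuffled) >= observed:
            hits += 1
    return observed, hits / trials


# Toy data: the first two pairs echo 'water' and 'name'; the rest are invented.
TOY_IE = ["wed", "nomn", "kar", "pit", "sul", "mek"]
TOY_URALIC = ["weti", "nimi", "tol", "sak", "rup", "jam"]

if __name__ == "__main__":
    observed, p = permutation_p_value(TOY_IE, TOY_URALIC)
    print(f"observed matches: {observed}, share of shuffles at least as good: {p:.3f}")
```

The second value returned by permutation_p_value is the same kind of quantity as the 32% and 18% figures Ringe quotes: the share of randomized comparisons that do at least as well as the real one. Note that the test only ever sees class membership, so it rewards surface similarity and is blind to regular correspondences between sounds that no longer resemble each other, which is exactly Ringe's first objection.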

I admire Ringe’s ability to respond to pseudoscientific nonsense in a poised and respectful manner and to grant another Nostratic comedy of errors a scientific analysis. (At one point he even generously calls Kassian et al. “competent” and hence entitled to their own judgment.) I smile at Ringe’s theatrical disappointment that even the best-of-class statistical procedure has failed to generate proof of the Indo-Uralic relationship and that, considering how advanced our maths is, no new test can possibly be invented to prove the coveted link. And I was rather intrigued by Ringe’s secret belief in the common descent of Indo-European and Uralic languages (“I continue to suspect that there really is a genetic relationship between Indo-European and Uralic,” pp. 355-6).

How can a scholar treat seriously what his own analytical rigor dissects as sheer nonsense? Why does he believe that, if done right, objective tests show that the evidence for the Indo-Uralic linguistic kinship is too sparse to indicate kinship, while at the same time suspecting that Indo-European and Uralic are related? Let’s try to answer these questions.

Ringe and the other reviewers admit that IE *wed– ~ Uralic *weti– ‘water’ and IE *nomn– ~ Uralic *nimi– ‘name’ are the most plausible IU cognates if, as Ringe specifies, there really is a genetic relationship. This means that if the two language families are somehow shown to be related, the similarity between those two sets of forms will automatically become a valid indicator that the two families are related. Ringe either generously “hands” those two forms to the proponents of Indo-Uralic genetic unity (who spent so much statistical effort defending what cannot be shown linguistically), or he employs circular logic, or he is in mild collusion with the authors. He supports Kassian et al.’s choice of the Uralic phylogeny, namely its mainstream version rather than the more debatable versions advocated by such scholars as Tapani Salminen. But Salminen specifically tackles the problem of Uralic *weti– ‘water’, which shows no reflexes in Saami or Khanty, and argues that this form must have been borrowed from Indo-European. If one follows Salminen, the IE and Uralic forms for ‘water’ are indeed related, not through common descent but through diffusion. Neither Kassian et al. nor Ringe wrote up any response to Salminen.

Ringe would like to believe that the comparative method is good for something. Hence, he wants the ‘name’ and ‘water’ forms to be related. But it’s also clear that Ringe and the other reviewers are not looking for mathematical proof; they are looking for traditional linguistic evidence, the kind that entails sound correspondences and reveals hidden connections between dissimilar forms. And this evidence has not been provided, neither by him, nor by Illic-Svitych, Dybo, Dolgopolsky, nor anybody else. But judging by such suggestive forms as IE *wed– and Uralic *weti– ‘water’ this evidence MUST be there. It’s just that the comparative method is too weak to furnish it beyond what time has generously preserved for posterity.

The holy grail, it turns out, is not in improving the math but in improving the comparative method. And the tough question that must be asked is why do we think that the proto-Indo-European form for ‘water’ must be reconstructed as *wed-? The conventional cognate set that supports this reconstruction includes such forms as Hitt watar, Gen. wedenas ‘water’, Skrt uda ‘waters’, Slav *woda ‘water’, Germ. *wato:r ‘water’, Arm get ‘river’. Outside of this group of clear cognates, there are forms that must belong here as well but that show formal and semantic aberrations. Arm get means ‘river’; Lat unda means ‘wave’ and not ‘water’; this form and Lith vanduo ‘water’ show an intrusive –n-; Toch war, OIr usce (presumably from *udskio) and Alb uje (< *udriye) lost the middle consonant, while Gk hydo:r, Gen. hydatos ‘water’ demonstrates initial h– (typically from *s).

It appears therefore that even within a well-proven language family formal similarity is often sufficient to establish cognation and even to reconstruct a protoform that would then be used to propose an even deeper genetic link. An Indo-Europeanist’s solution to the unexpected formal variability of the ‘water’ words outside of Hittite, Sanskrit, Armenian, Slavic and Germanic is a declaration that it represents later developments within individual branches. But the phonomorphological conditioning of all these “later” changes has never been fully established, or, in other words, all those consonant deletions and insertions are currently irregular. Ringe is ready to accept the cognation of IE *wed– and Uralic *weti– despite the fact that the reconstruction *wed– is directly and unambiguously supported by the data from just 5 branches of Indo-European. Those branches do furnish forms that show strict formal agreement (unlike the set of divergent forms each of which is different in its own way). But the multiple deviations from them still beg for an explanation. And since, for the past 200 years, all of those irregular reflexes of PIE *wed– ‘water’ have been considered “related” to the regular ones without much progress in explaining the observable pathways of divergence, it’s unlikely that traditional comparative method will ever lead to the clarification of those aberrations. IE *wed– ‘water’ also lacks an etymology. Connecting it to Uralic *weti– ‘water’ leads to a suspicious conclusion that for thousands of years there was no semantic evolution associated with this form.

As I have been arguing for many years in articles and blog posts, the persistent phonological problems found in the Indo-European comparanda and the paucity of solid, persuasive and typologically verified etymologies have, at their root, the naive approaches taken by generations of historical linguists to constructing the cognate sets from which sound laws and proto-linguistic reconstructions are subsequently deduced. Consequently, the sound laws and the reconstructions come out incomplete, chronologically misattributed or flat-out wrong. The so-called “centum-satem” division is one of the unfortunate outcomes of this pervasive cognate set misanalysis. The so-called satem /s/ (< *k‘) and IE *s were in reality the same PIE phoneme showing up as s or k depending on the phonetic environment in all Indo-European branches. The tripartite split of ancient labiovelar phonemes into velar, dental and labial reflexes (depending on the following vowel), which is fully described for Greek (e.g., kyklos, telos and polos are all derived from *kwel-) and is known to have affected other “centum” languages such as Italic, Celtic and Germanic, was in fact a proto-Indo-European phenomenon because labial and dental reflexes are found next to velar reflexes in the so-called “satem” languages as well.

The reformulation of the Split Labiovelar sound law as proto-Indo-European in age means that many forms reconstructed with /bh/, /b/, /p/ should in fact be reconstructed with /gwh/, /gw/ and /kw/. This applies to the ‘water’ forms in question. Notably, the Celtic root for ‘water’ is *dubro (OIr dobur, Bret dour, Welsh dw(f)r), which forms an isogloss with Slav *duno (< *dubno) ‘bottom’, *debri ‘deep woods’, Lith dugnas ‘bottom’, dubus ‘deep’, dubti ‘plunge, sink’, Latv dubens, dibens, dibins ‘bottom, deep’, dubra ‘puddle’, OPruss padaubis ‘valley’, Goth diups ‘deep’. Lith dugnas ‘bottom’ clearly points to a (labio)velar, which allows one to include the enigmatic Gk hugros ‘wet, moist’ in this set. The morphology (the distinctive heteroclitic –r-/-n– ending marking strong vs. weak cases) is the same in IE *weder-/*weden– and in Celtic *dubro/Balto-Slavic *dugwno-.

One question looms large: what happened in the anlaut and why is there an alternation between *øw– (*wed-/*ud-) and *dw– (*dwegw-/*dugw-)? First of all, Toch war ‘water’ is fully compatible with *dw– as Tocharian regularly loses d– before –w– (Toch A wu, we, Toch B wi– ‘two’ < IE *dwou-/*dwi-) but so do other IE languages in the case of Lat viginti ‘twenty’ (Toch wiki) (< *dwi-). Second, a similar process affected the IE cognate set TONGUE that shows forms such as Slav *jenzyku next to Gothic tuggo and Lat lingua (< Old Lat *dingua), TEAR that shows forms such as Skrt asru next to Gk dakruma and Goth trahan, and LONG that has forms such as Lith ilgas next to forms such as Slav *dolgu and Gk dolikhos. (Slav *velik- ‘great’, Toch walke ‘for a long time’ should be included here as well.)

It’s hard to say for sure whether d– was there from the beginning, or whether it reflects a later phonetic process (akin to the Gk hardening of IE *y– to *dz– or Arm g < IE *w as in get ‘river’) or a morphological one, but the pattern observed in the cognate sets TEAR, TONGUE and LONG looks strong enough to suspect that the same phonetic development affected the cognate set WATER-DEEP. Considering that the d-forms are associated with the more basic meaning of ‘deep’ (from which ‘water’, ‘river’ and ‘wave’ must have evolved), it appears that PIE had a form *dwegw-/*dugw– and not *wed-.

The analysis above will surely raise questions and eyebrows because it’s a one-of-a-kind, innovative and disruptive proposition. And in this particular instance I may end up being wrong. But it’s only this kind of analysis that would allow the comparative method, in the absence of ancient lexical attestations, to penetrate beyond the formal resemblances to the underlying sound correspondences.

Although I’m no expert in Uralic languages, the PIE root *dwegw-/*dugw– looks potentially cognate with Uralic *juka ‘river, stream’. Reflexes of *juka are found in each and every Uralic language (Finn joki, Enets daha, etc), including Saami and Khanty, so it does not have the distributional difficulty of Uralic *weti– (reflexes missing in Saami and Khanty) and does not depend on the kind of phylogeny one assumes for Uralic languages. If Uralic *juka and IE *dwegw-/*dugw– are indeed related, then it’s possible that the d-/0– alternation in IE forms comes from the Indo-Uralic glide –y-.

Under this scenario (pending further research, let’s consider it a “thought experiment” for a moment), if *juka is clearly Uralic in origin, *weti– must be an early borrowing from an Indo-European language, just as Salminen suspects. If Indo-European and Uralic are indeed related, they are related for a different reason than Ringe or Kassian et al. hope they are. Statistical tests correctly show the sparseness of the current corpus of Indo-Uralic comparanda simply because the underlying comparanda are wrong. The comparative method as developed by 19th-century Indo-Europeanists introduces a systematic error into proto-linguistic reconstructions. Ringe is right about the limits of statistics, but he is wrong in assuming that the comparative method has no room to evolve. It’s the assumption that the comparative method has been fully perfected that makes him want Kassian et al. to be right. In reality, a breakthrough in Nostratic, Indo-Uralic and other megaphyla research can come through the improvement of the comparative method as applied to the primary material from first-order language families, not through the improvement of statistical methods applied to misanalyzed cognate sets and misconstrued protoforms. Then the shared descent between first-order families and the exact phylogeny of Eurasian languages will become more transparent.