The Indo-European homeland – ancient DNA (part 1)

Mikkel Nørtoft

In this post, we will look into some new findings from the field of ancient DNA that are relevant to the question of the Indo-European homeland. In this first part, we will mostly focus on the genetic aspect, and in the second part (soon to come), we will look at how these findings can best be correlated with linguistic arguments.

The field of ancient genomics has contributed greatly to the discussion of prehistoric migrations and, with that, to the discussion of the Indo-European homeland.

Current evidence suggests that the population history of western Eurasia involves a small number of basal ancestral groups, attested in the earliest sampled individuals going back to the Palaeolithic (the colours below match those on the Homeland Timeline Map and will be used throughout this post to help readers follow which group I am referring to):

  • Eastern Hunter-Gatherers (EHG), found in most of Russia and Eastern Europe (including the Pontic-Caspian steppes).
  • Western Hunter-Gatherers (WHG), found all over the European peninsula going back to the Palaeolithic.
  • Caucasian Hunter-Gatherers (CHG) and Mesolithic and Neolithic Iranians, who seem to be closely related.
  • Anatolian Farmers, responsible for the spread of agriculture in most of Europe.
  • Levant Farmers, going back to the Natufians, who are known as the earliest farmers in the Levant.
The colour RED will be used here for the Pontic-Caspian “Steppe profile”, which is a mix of CHG and EHG ancestry. Archaeological cultures related to this Steppe profile will also be shown in RED:

Approximate distribution of basal ancestry groups in Western Eurasia about 10,000 years ago and the later Steppe profile added in red (by Mikkel Nørtoft).

Migrations from the homeland
In 2015, two large (and competing) studies[1] appeared with improved methods that analysed genome-wide data instead of relying, as earlier studies had, only on the Y-chromosome (inherited from the father, only in males) or the mitochondrial genome (inherited from the mother). Together, they sampled more than a hundred ancient humans from various periods and regions of western Eurasia.

They reached the same conclusion using two different methods of analysing ancient DNA: individuals of the archaeological culture termed Yamnaya (or “Pit Grave”, c. 3300–2800 BCE) and of the preceding Khvalynsk culture (5th millennium BCE), both living in the Pontic-Caspian steppes, were genetically closely related to the widespread European Corded Ware culture complex (c. 2900–2200 BCE). The studies even suggested a “mass migration” of Yamnaya herders into Europe around 3000–2500 BCE. The Yamnaya genomes were also very close to those of the Afanasievo culture, which appeared in the Altai region of Siberia around 3300–3100 BCE. The Y-chromosome haplogroups R1a and R1b (carried by males) spread together with these steppe populations into Europe and Asia, and they are still very frequent in most of Europe today. It thus seemed that an exodus of herders moving both east and west from the Pontic-Caspian steppes had now been detected in the DNA.

The genetic make-up of Yamnaya
Yamnaya individuals in the Pontic-Caspian steppes derived about half of their ancestry from earlier local Eastern Hunter-Gatherers (EHG) and about half from the Caucasus region (Caucasian Hunter-Gatherers, CHG). In earlier Khvalynsk individuals from the steppes (5th millennium BCE), this Caucasian ancestry amounts to about 25%.[2]

This seems to fit archaeological studies showing movement of material culture from the North Caucasian Maykop and Novosvobodnaya cultures into the steppes during the 4th millennium BCE.[3] Cultural and material evidence of influence from the Caucasus before the 4th millennium BCE is more subtle. It now seems clear that the two ancestry groups do not share deep genetic ancestry. Only from the Yamnaya period (3rd millennium BCE) onwards do we see significant steppe ancestry moving south into the Caucasus region.[4]

However, there are a few so-called “outlier” individuals on both sides of this frontier who belong to the other genetic group.[5] This fits a model of cultural contact between the two groups through small-scale movement. Y-chromosome haplogroup J2 (carried by males) is very common in the Caucasian group. Since this haplogroup only rarely shows up in steppe-related individuals, even after the spread of steppe groups, this could indicate that it was mostly females who “switched sides” in a system of patrilocal intermarriage (where the wife moves to the husband’s family), perhaps through marriage alliances, or at least that Caucasian male lineages were not very successful in the steppes. We see a similar movement of brides in Europe with the arrival of (mostly male) Yamnaya populations forming the Corded Ware phenomenon. The model is further supported by the slow increase of Caucasian ancestry in the steppes: first to about 25% in the Khvalynsk period (5th millennium BCE), and then to about 50% in the Yamnaya period (late 4th millennium BCE).

Additionally, a minor component of ancestry from European Farmers and Western Hunter-Gatherers has recently been found in Yamnaya individuals. This is consistent with archaeologically attested contacts between steppe societies and European farming societies, such as the Cucuteni-Tripolye town dwellers in Ukraine, Moldova and Romania, and/or the East European Globular Amphora culture, in which this type of farmer ancestry has been found mixed with Western Hunter-Gatherer ancestry.[4]

In the next post, we will look into the implications of these genetic findings on the question of the Indo-European homeland.


The Indo-European homeland problem (introduction)

Mikkel Nørtoft

The question of where and when Proto-Indo-European was spoken is often termed “the Indo-European homeland problem”, and it is closely related to the important follow-up questions of how and when the languages spread. Since the members of this one language family have been so successful compared to other language families, and were already very widespread when they first appeared in written form (from Tocharian in western China and Sanskrit in India to Old Irish in Ireland and Old Icelandic in Iceland), we would expect their initial spread to be somehow visible in the archaeological remains of prehistoric cultures. But spoken language, of course, normally vanishes into thin air and leaves no material trace unless it is written down, so the spread of a language could, in theory, be completely invisible, and determining a homeland would be close to impossible.

Nevertheless, many hypotheses about the Indo-European homeland have been put forward, but they have often been “polluted” by nationalism, typically placing the homeland in the author’s own country. The concept of the Indo-Europeans was even hijacked by the Nazi regime, which referred to them as “Aryans”. However, the term Aryan is actually a religious, cultural and linguistic, but not racial(!), self-designation found in early Indic (Vedic) texts. This hijacking by the Nazis has unfortunately tainted Indo-European studies in the eyes of the public ever since. It does not, however, make the original homeland question any less relevant.

The two most persistent hypotheses for an Indo-European homeland are:

(1) The “Anatolian hypothesis”: the early speakers of Indo-European spread into Europe and central and southern Asia with the farming revolution from Anatolia (Turkey) around 6700 BCE.

Unknown source (copied from Anne Wilcox at

(2) The “Pontic-Caspian steppe hypothesis”: the early speakers of Indo-European spread with herders from the grassland steppes north of, and between, the Black (“Pontic”) Sea and the Caspian Sea around 4000–3000 BCE.

The “Anatolian hypothesis”, championed by Colin Renfrew, has been dominant in most of western archaeology, while the “Steppe hypothesis”, championed by Marija Gimbutas and later elaborated and improved by James Mallory and David Anthony, has been dominant in most historical-linguistic circles and in East European archaeology.[1]

Mallory and Adams[2] have defined a set of general principles that serve as argumentative tools, drawing on both linguistics and archaeology, when trying to locate the homeland. I’ll mention two of them:

  • The technological principle: What technological vocabulary can be securely reconstructed for PIE, and how does it match the material culture we are looking for archaeologically? Important examples in PIE include terms for wheels and wagons (including ‘axle’), weaving, wool and dairy products. We must therefore expect the Indo-Europeans to have had these things at the time of PIE, or at least of “Core PIE” (after the Anatolian branch split off). Interestingly, agricultural terminology is difficult to reconstruct securely for PIE, which suggests that agriculture was not very important in their society. Generally, it has been argued that the PIE vocabulary points to a patriarchal herding culture. If we look at the archaeological evidence for these things, we find evidence of a herding culture in the Pontic-Caspian steppes (many domestic animal bones from c. 5200 BCE onwards),[3] evidence for wheels from around 3500 BCE,[4] evidence for the use of milk in pots from the mid-4th millennium BCE,[5] and evidence for weaving (imprints of woven reed mats on ceramics) from Khvalynsk on the Volga, probably around 4000 BCE.[6] However, no evidence of wool has been found in the steppes before the Catacomb period (c. 2500 BCE);[7] but since wool is very rarely preserved anywhere, this is not a strong argument against its presence (more on this in a later blog post). The best indication of wool in or near the steppes is a few woollen textiles from Novosvobodnaya in the North Caucasus (neighbouring the steppes), radiocarbon-dated to 2893–2679 BCE.[8]
  • The relational principle: Loanwords are often exchanged between both related and unrelated languages. When we can identify such loans in prehistory, they are a strong indication of direct contact between the languages involved. Since the relative chronology of sound laws is known, different chronological layers of loanword exchange (and thus of contact) can also be established to some degree. We can even see that several loanwords were exchanged between some North Caucasian languages (still spoken in the Caucasus region) and PIE before it split up. A few early loans from Semitic (the language family that includes Hebrew, Arabic, etc., itself a branch of the larger Afro-Asiatic family), or from an earlier ancestor of it, have also been proposed. One example is a word for ‘bull’, *tauros.[9]

Perhaps most important for the homeland question are the PIE loanwords borrowed into the common ancestor of the Uralic language family, which includes Finnish, Saami, Hungarian and many minority languages in Russia. Many scholars place the Uralic “homeland” (equally debated) in the forest zone west of the Ural Mountains along the Volga and Oka (and perhaps also Kama) rivers, possibly in the Volosovo culture (c. 3650–1900 BCE).[10]

Together with the North Caucasian loans, these loanwords place PIE somewhere between the Uralic homeland (the forests of the Volga–Oka–(Kama) region) and the Caucasus. Furthermore, many scholars argue for an even earlier unity between Uralic and Indo-European, termed Indo-Uralic. With all this in mind, a homeland in Anatolia would make the possible deep relationship between PIE and Proto-Uralic very difficult to explain.

The lack of good reconstructions for agricultural words, especially for pulses (including lentils, beans, chickpeas and peas), speaks for the “Steppe hypothesis”. Pulses are not found archaeologically in the steppe region in the relevant period, but they are found as domesticates among the early farmers of Anatolia from around 7000 BCE, and they have been widespread in Europe ever since they arrived with those farmers. It is therefore unlikely that the Anatolian farmers spoke PIE, which is yet another argument against the Anatolian hypothesis.[11]

In an upcoming post, we will look at the very new field of ancient DNA and its implications for the question of the Indo-European homeland.


Where do the Indo-European languages come from?

Thomas Olander

Around half the world’s population today speaks a language belonging to the Indo-European language family. The Indo-European language family includes most languages spoken in an area covering Europe (important exceptions here being Basque, Finnish and Hungarian), Iran, Afghanistan, Pakistan and northern India (see fig. 1). Some of the most widely spoken Indo-European languages are English, German, Spanish, Portuguese, French, Russian, Hindi, Bengali and Punjabi.

Fig. 1. Present-day distribution of Indo-European languages. Orange: countries with a majority of speakers of IE languages. Yellow: countries with an IE minority language. (Brianski [Public domain], from Wikimedia Commons)
The Indo-European languages seem to be newcomers in most of Europe and Asia. But where do they come from? That is a good question that may be answered in several different ways. As an introduction to this blog, I will present a short answer in this post.

A simple answer to the question of the origin of the Indo-European languages is “Africa”. It is likely that the first anatomically modern humans, who lived in Africa more than 200,000 years ago, used language in the same way as humans do today, and quite possibly the languages we now speak descend from the speech of these first humans. Thus, in this sense, all languages, including the Indo-European languages, originate in Africa. However, apart from assuming that it was probably functionally similar to modern language, we do not know much about the language of the first humans. Too much time has passed since then for the methods of historical linguistics to be able to posit any specific hypotheses about it.

An alternative and, in my opinion, more interesting way to answer the question of the origin of the Indo-European languages is to investigate where, and when, the ancestor of the Indo-European languages was spoken. To illustrate this approach, we may take a modern Indo-European language – English, for instance – and trace its development back through history, first to Middle English and then to Old English. By comparing the oldest documented stages of English with those of the other Germanic languages – such as German, Dutch, the Nordic languages and the extinct Gothic language – we arrive at Proto-Germanic, the ancestor of all the Germanic languages. Proto-Germanic is usually estimated to have been spoken around the beginning of our era.

We don’t have to stop there, though. By comparing Germanic with the other subgroups of the Indo-European language family, historical linguists are able to reconstruct large parts of the sound, grammar and vocabulary of the ancestor of all Indo-European languages: Proto-Indo-European.

So all Indo-European languages descend from a hypothetical proto-language, Proto-Indo-European. But where was Proto-Indo-European spoken?

Put this way, the answer must be found through collaboration between linguistics and archaeology. Ever since it was discovered, two centuries ago, that the Indo-European languages are related and descend from a common ancestor, scholars and lay people have discussed where on earth the “Indo-European homeland” was located. The guesses are numerous and of varying quality. Fig. 2 is a heatmap showing some of the proposed locations of the Indo-European homeland from 1813 to 2018.

Fig. 2. Some of the proposed locations of the Indo-European homeland. (Thomas Olander)

Today most historical linguists and most archaeologists interested in the problem are inclined to think that the most likely location of the Indo-European homeland is in the steppe north of the Black Sea and the Caspian Sea, in present-day Ukraine and south Russia – the Pontic–Caspian steppe.

There are several reasons why the “steppe hypothesis” is the most attractive one. Archaeological cultures from the steppe spread westwards into Europe and eastwards towards India and western China in a period that fits our knowledge of the chronology of the Indo-European languages. Certain words that can be reconstructed for early stages of the Indo-European languages – primarily two words for ‘wheel’ and a word for ‘axle’ – indicate that the spread of the Indo-European languages cannot have taken place much earlier than the invention of the wheel around 4000–3500 BCE.

The structure of the relationship between the subgroups of the Indo-European language family – the Indo-European family tree – fits a spread from the Pontic–Caspian steppe much better than the alternatives, most famously the Anatolian hypothesis, which locates the Indo-European homeland in central Turkey around 6500 BCE.

Until recently, an important argument against the steppe hypothesis was that it was difficult to imagine how a language spoken by people on the Pontic–Caspian steppe could have spread as dramatically as early Indo-European speech must have done. The most likely vector for the spread would have been movement of people, but archaeology did not show unambiguously that such migrations had taken place.

In recent years, however, the steppe hypothesis has received support from a somewhat unexpected quarter: prehistoric genetics. Analyses of ancient DNA from skeletons found in Europe and Asia show that there were large-scale migrations of people – especially male individuals – from the steppe into Europe and towards Asia during the third millennium BCE. With these ancient DNA studies, published by population geneticists from different research groups but pointing in the same direction, the main argument against the steppe hypothesis was dismantled.

The evidence thus seems to support an Indo-European homeland in the Ukrainian and south Russian steppe region north of the Black Sea and the Caspian Sea – but the question has not been definitively settled yet.