Around half the world’s population today speaks a language belonging to the Indo-European language family. The Indo-European language family includes most languages spoken in an area covering Europe (notable exceptions being Basque, Finnish and Hungarian), Iran, Afghanistan, Pakistan and northern India (see fig. 1). Some of the most widely spoken Indo-European languages are English, German, Spanish, Portuguese, French, Russian, Hindi, Bengali and Punjabi.
The Indo-European languages seem to be newcomers in most of Europe and South Asia. But where do they come from? That is a good question, and it may be answered in several different ways. As an introduction to this blog, I will present a short answer in this post.
A simple answer to the question of the origin of the Indo-European languages is “Africa”. It is likely that the first anatomically modern humans, who lived in Africa more than 200,000 years ago, communicated with each other using language much as humans do today, and quite possibly the languages we now speak descend from the speech of those first humans. In this sense, all languages, including the Indo-European languages, originate in Africa. However, apart from assuming that it was probably functionally similar to modern language, we know very little about the language of the first humans. Too much time has passed for the methods of historical linguistics to support any specific hypotheses about it.
An alternative and, in my opinion, more interesting way to answer the question of the origin of the Indo-European languages is to investigate where, and when, the ancestor of the Indo-European languages was spoken. To illustrate this approach, we may take a modern Indo-European language (English, for instance) and trace its development back in time, first to Middle English and then to Old English. By comparing the oldest documented stages of English with those of the other Germanic languages, such as German, Dutch, the Nordic languages and the extinct Gothic language, we arrive at Proto-Germanic, the ancestor of all the Germanic languages. Proto-Germanic is usually estimated to have been spoken around the beginning of our era.
We don’t have to stop there, though. By comparing Germanic with the other subgroups of the Indo-European language family, historical linguists are able to reconstruct large parts of the sound system, grammar and vocabulary of the ancestor of all Indo-European languages: Proto-Indo-European.
So all Indo-European languages descend from a hypothetical proto-language, Proto-Indo-European. But where was Proto-Indo-European spoken?
Put this way, the answer must be found through collaboration between linguistics and archaeology. Ever since it was discovered, two centuries ago, that the Indo-European languages are related and descend from a common ancestor, scholars and lay people have debated where on earth the “Indo-European homeland” was located. The guesses are numerous and of varying quality. Fig. 2 is a heatmap showing some of the proposals for the location of the Indo-European homeland from 1813 to 2018.
Today most historical linguists and most archaeologists interested in the problem are inclined to think that the most likely location of the Indo-European homeland is in the steppe north of the Black Sea and the Caspian Sea, in present-day Ukraine and south Russia – the Pontic–Caspian steppe.
There are several reasons why the “steppe hypothesis” is the most attractive one. Archaeological cultures from the steppe spread westwards into Europe and eastwards towards India and western China in a period that fits our knowledge of the chronology of the Indo-European languages. Certain words that are reconstructible for early stages of the Indo-European languages, primarily two words for ‘wheel’ and a word for ‘axle’, show that their speakers already knew wheeled vehicles before the family split up. This indicates that the spread of the Indo-European languages cannot have taken place much earlier than the invention of the wheel around 4000–3500 BCE.
The structure of the relationship between the subgroups of the Indo-European language family (the Indo-European family tree) fits a spread from the Pontic–Caspian steppe much better than the alternatives, most famously the Anatolian hypothesis, which locates the Indo-European homeland in central Turkey around 6500 BCE.
Until recently, an important argument against the steppe hypothesis was that it was difficult to imagine how a language spoken by people on the Pontic–Caspian steppe could have spread as dramatically as early Indo-European speech evidently did. The most likely vector for the spread would have been the movement of people, but archaeology does not show unambiguously that such migrations took place.
In recent years, however, the steppe hypothesis has received support from a somewhat unexpected quarter: prehistoric genetics. Analyses of ancient DNA from skeletons found in Europe and Asia show that there were large-scale migrations of people, especially men, from the steppe into Europe and towards Asia during the third millennium BCE. These ancient-DNA studies, published by population geneticists from different research groups but pointing in the same direction, dismantled the main argument against the steppe hypothesis.
The evidence thus seems to support an Indo-European homeland in the Ukrainian and south Russian steppe region north of the Black Sea and the Caspian Sea – but the question has not been definitively settled yet.