Monday, June 6, 2011

The Story of Our Origins | OPEN Magazine

DNA tests on a cross-section of Indians including John Abraham and Baichung Bhutia reveal surprising truths about our origins

His religion bears no correlation to his genetics (Photo: SUBI SAMUEL)

Just where did our ancestors come from? Indian diversity has long been reduced by many historians to a simple story of an invasion of Aryans pushing Dravidians further south in the Subcontinent. But an analysis of the genes that Indians bear throws up enough evidence to rubbish that theory, pointing instead to a far more complex set of migrations—and perhaps reverse migrations—many millennia earlier than commonly supposed.

To get a clearer picture of our origins, Open sent DNA samples of a couple of celebrities, John Abraham and Baichung Bhutia, along with those of four magazine staffers to the National Geographic Deep Ancestry Project. Based on the genetic markers thus identified and other research conducted by scientists, we present a plausible map of our origins. Be prepared for some surprises.

Nehru, even in his romanticism, was only stating what every observer of India has always noticed—the tremendous diversity of people in India, not just in terms of customs and culture, but in religion, caste and appearance. The obvious question has always been: where does this diversity come from? Take, for example, caste: did the system evolve in India, or did it originate outside and become part of the country’s social structure? Were our different language groups, such as Dravidian and Indo-European, brought in by different sets of migrants? The questions are endless, and the answer to any one of them lies in the answer to the most basic question of all: where do we Indians come from? How was the Subcontinent settled?     

Attempts have been made to answer these questions with evidence drawn from fields as varied as linguistics and archaeology. Despite the inroads that have been made, the question has not even come close to being answered, and even the partial answers on offer have been a source of contentious debate. For one, the Aryan Invasion theory, which holds that invading Indo-Europeans displaced the original Dravidian inhabitants of north India, found favour at one time and was later rejected and denounced; in any case, it addresses only a small part of the Subcontinent’s diversity.

But results from an entirely different area of human study suggest that there may be a satisfactory answer to the question, and it lies in our genes.

For each of us, our physical characteristics are encoded in the DNA that we carry within each cell of our body. A study of our DNA (see ‘The Science of DNA Testing’) allows us to trace our ancestry. In the case of men (and for women, by testing their brothers or father), we can trace our line of paternal descent, our father’s father’s father’s… father, by studying the Y-chromosome; and in the case of both men and women, we can trace our line of maternal descent, our mother’s mother’s mother’s… mother, by studying mitochondrial DNA.
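For readers who think in code, the two lines of descent can be pictured as two walks up a family tree. What follows is only an illustrative sketch, not how the National Geographic project processes samples: the `Person` record, the two functions and the four-grandparent family are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Hypothetical pedigree record used only for this illustration."""
    name: str
    sex: str  # "M" or "F"
    father: Optional["Person"] = None
    mother: Optional["Person"] = None


def paternal_line(person: Person) -> List[str]:
    """Walk father -> father's father -> ...: the path the Y-chromosome follows."""
    if person.sex != "M":
        # Only men carry a Y-chromosome; a woman would test a brother or her father.
        raise ValueError("Y-chromosome tracing needs a male sample")
    line = []
    current = person.father
    while current is not None:
        line.append(current.name)
        current = current.father
    return line


def maternal_line(person: Person) -> List[str]:
    """Walk mother -> mother's mother -> ...: the path mitochondrial DNA follows."""
    line = []
    current = person.mother
    while current is not None:
        line.append(current.name)
        current = current.mother
    return line


# A made-up family: four grandparents, two parents, a brother and a sister.
pgf = Person("paternal grandfather", "M")
pgm = Person("paternal grandmother", "F")
mgf = Person("maternal grandfather", "M")
mgm = Person("maternal grandmother", "F")
father = Person("father", "M", father=pgf, mother=pgm)
mother = Person("mother", "F", father=mgf, mother=mgm)
brother = Person("brother", "M", father=father, mother=mother)
sister = Person("sister", "F", father=father, mother=mother)

brother_paternal = paternal_line(brother)
sister_maternal = maternal_line(sister)
brother_maternal = maternal_line(brother)
```

In this toy family the sister learns her paternal line only through her brother's sample, while both siblings share the same maternal line. Each test follows a single ancestor per generation, so the other grandparents never appear in either result.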

This field, now over two decades old, has slowly been refined to the point where events in our distant ancestry can now be studied. Not only are the new answers on offer fascinating, there is also the certainty that with each passing year, they will be refined, questioned and challenged to the point where we will be able to make definitive statements about our past. One such project is National Geographic’s Deep Ancestry, which is compiling data from across the world on people who want to determine their distant ancestry.

We sent six samples, four men and two women, of people from various parts of India to the National Geographic Project (NGP), and, based on the results we have obtained (see the case studies listed in the right column), we have attempted to map out a representative history of what can be said today about the peopling of India. To do so, we have not only sought elaboration from Ramasamy Pitchappan, principal investigator, India, of the NGP, we have also spoken to a leading Indian geneticist, RNK Bamezai, director of the National Centre of Applied Human Genetics (NCAHG) at Jawaharlal Nehru University and vice-chancellor of Jammu University.

Of course, having collated all this research material and these inputs, the final responsibility for the interpretations made rests with Open.


Sometime between 60,000 and 90,000 years ago, humans first moved out of Africa by crossing the Red Sea. This, in all likelihood, occurred during a glacial period when the earth was at its coldest, and falling sea levels would have shrunk the distance between Africa and Asia at its narrowest to barely 11 km. Crossing into Asia, surviving on a diet rich in shellfish, these early humans who left Africa stayed close to the coast as they made their way round to South Asia.

The strongest evidence of this is offered by the study of mitochondrial DNA, which indicates the maternal line of descent (see DNA analysis of Sohini Chattopadhyay and Haima Deshpande of Open). All human beings outside Africa are descended from two female lines, termed Haplogroup M and N. It is unclear whether the two female lines evolved while humans were still in Africa or shortly after, but the available evidence suggests both lines were present in that first migration from Africa to South Asia.


M – 60 per cent

N – 25 per cent

U*  – 15 per cent

*A sub branch of N that is found in larger numbers in the northwest of the country

The vast majority of the Indian population carries Sohini and Haima’s Haplogroup M, whose antiquity in India dates back at least 60,000 years, if not more. Since mitochondrial DNA is passed down in direct line of maternal descent, this suggests that the female population of India dates directly back to that first exodus of humans from Africa.

The N Haplogroup and its sub-haplogroup U are also found in India, but show up in high frequencies largely in the Northwest. Even these groups seem to be largely of great antiquity in the Subcontinent. There seems to have been very little migration of women into the Subcontinent after the first settlers arrived here. According to Bamezai, who advises caution in saying anything more than warranted by the data, this is not so surprising: “The mobility of males was much more—raiding parties or for that matter armies on the move even today are largely male.” 


The men who are believed to have migrated to India along with women as part of the first coastal migration from Africa are identified by the Haplogroup C. This marker is found in less than 5 per cent of the Indian population today. These migrants seem to have moved further along the coast, settling in East Asia and Australia. 


H – 30 per cent

R1a1 – 20 per cent

R2a – 15 per cent

L – 10 per cent

O and related markers – 10 per cent

Others – 15 per cent

In rather broad terms, it is possible to make some generalisations. H is found in greater proportion among the Austro-Asiatic tribal population, L among the non-tribal population speaking Dravidian languages (such as Tamil and Telugu), and R1a1 among speakers of the Indo-European languages (such as Hindi, Punjabi and Bengali). But there is no way on this basis to distinguish any individual from another. An individual with R1a1 could as well be a tribal as an Indo-European language speaker. Nor can discrete groupings be identified in any clear-cut way. The L marker could be found in the north of the country, and H could show up among some Brahmins.

What we do know for sure is that the earliest large-scale male settlers in the Subcontinent belong to the line defined by Haplogroup F and its branch Haplogroup H (see the DNA analysis of John Abraham). Both these haplogroups are found in significant percentages in the Indian tribal population, reaching a combined percentage of well over 30. The F Haplogroup dates back at least 45,000 years in the Subcontinent. John’s H haplogroup, which is not found anywhere else in the world in any significant proportion and has hence been termed the ‘Indian marker’, has an antiquity in the Subcontinent of at least 25,000 years. Interestingly, though, it is found among Europe’s gypsies, indicating their Indian origin.

A related line descended from Haplogroup F, termed Haplogroup L (see the DNA analysis of Sharad Raghavan), is also found in significant numbers in South India, especially among the non-tribal population of Tamil Nadu. Again, this is a haplogroup rarely found outside India and has an antiquity of around 25,000 years.

Two other significant haplogroups found in the Indian population are R1a1 (see the DNA analysis of Hartosh Singh Bal) and R2a, both found deep in the line of descent that goes back to Haplogroup F. Their antiquity in India dates back 15,000 to 20,000 years.

Hartosh’s R1a1 is found in higher proportions in the north of India and among upper castes, reaching a proportion of nearly 50 per cent in Punjab and over 70 per cent in such caste groups as West Bengal Brahmins. But it is also found in the South and among the tribal population, reaching a proportion of well over 25 per cent among the Chenchu tribals of Andhra. R2a mirrors the distribution of R1a1, but it is far more evenly spread across the geography of the Subcontinent and the hierarchy of castes; in some ways, it is a pan-Indian marker, a significant marker that has not shown up in the small sample sent by Open to the NGP.

There is also an assortment of other markers, such as the D Haplogroup (see DNA analysis of Baichung Bhutia). This haplogroup is found in large numbers in East Asia and has likely reached Sikkim from Tibet. It is also found among some northeastern tribes that bear Haplogroup O as the other important marker.


The first male settlers of the Indian Subcontinent would have accompanied the women, whose descendants still inhabit the Subcontinent, on the first coastal migration from Africa. They are identified by the Haplogroup C marker, found in less than 5 per cent of the Indian population. According to the NGP, the presence of both John’s and Sharad’s haplogroups (H and L) in India can be explained by two separate migrations, one from the Middle East and the other from Central Asia, both dating back some 25,000 to 30,000 years.

The NGP goes on to describe the first encounter between the men from the original settlement of India with those who arrived later. The genetic trail, the NGP states, ‘provides some tantalizing clues as to what may have happened when members of the Indian Clan and the [earlier settled] Coastal Clan met. The [mitochondrial DNA] of people in this region preserves evidence of the early coastal dwellers in the female lineage, but Y-chromosome frequency for the Coastal Clan is very weak—around 5 per cent in southern India, and even less frequent going farther north. These data suggest that the descendants of the Indian Clan may have mated with the women of the earlier coastal population, but that the coastal men were killed, driven off, or otherwise prevented from reproducing.’

Pitchappan elaborates, “Probably initial colonies consisting of males and females settled and expanded. In the later migrations, either the males were by themselves or they came accompanied by very few females. Local males could have resisted and could have been exterminated, while females may have been amalgamated.” He adds that other possibilities are also conceivable, such as matrilineal societies by which the incoming males could have been amalgamated: “There is some evidence to suggest that settlements in the Dravidian belt were female centric.” He points to the existence of matriarchal societies in the South, such as Kerala’s Nairs, as the survival of an older tradition. 

But stories such as this are speculative at best. In the Indian context, they are reminiscent of the possibilities once cited to describe the entry of Indo-Europeans into India, the so-called Aryan Invasion theory.

The evidence so far, however, seems to suggest that the presence of both John’s and Sharad’s haplogroups in India could be well explained by an earlier arrival of the super-ancestral F haplogroup in India. In fact, it is quite likely that either the F haplogroup arrived as part of the coastal migration along with the C haplogroup, to which it is very closely related, or it evolved here in males who were part of the earlier migration. If so, it would make sense that the antiquity of a great majority of the Indian male population also goes back to the out-of-Africa coastal migration.

In fact, much of the genetic evidence seems to suggest a South Asian origin for the F haplogroup. This haplogroup and its lines of descent account for perhaps 90 per cent of the male population in the world. Contrary to received wisdom, this would imply that much of the globe outside Africa was settled by outward migrations from South Asia dating back over 50,000 years. Certainly, the distant origins of the modern European population seem to lie in South Asia, emphasising the crucial importance of this region in understanding the peopling of the globe.

But beyond such speculation, which will be settled as more and more data is gathered by projects such as the NGP, the one thing that can be said with a degree of certainty is that the antiquity of both the L and H haplogroups in India suggests that a majority of the Indian male population can trace its presence in the Subcontinent back at least 20,000 years if not earlier.


This brings us to perhaps the most contentious of markers, Hartosh’s R1a1. The NGP states: ‘Some linguists believe that the Kurgans, nomadic horsemen roaming the steppes of southern Russia and the Ukraine, were the first to speak and spread a Proto-Indo-European language, some 5,000 to 10,000 years ago. Genetic data and the distribution of Indo-European speakers suggest the Kurgans … may have been descendents of M17 (the genetic marker that identifies the R1a1 haplogroup). Today a large concentration—around 40 per cent—of the men living from the Czech Republic across the steppes to Siberia, and south throughout Central Asia are descendants of this clan. In India, around 35 per cent of the men in Hindi-speaking populations carry the M17 marker, whereas the frequency in neighboring communities of Dravidian speakers is only about ten percent. This distribution adds weight to linguistic and archaeological evidence suggesting that a large migration from the Asian steppes into India occurred within the last 10,000 years.’

This NGP claim goes far beyond what the genetic data warrants. Says Bamezai, after looking through the NGP results published in this article, “For me as a scientist, it is necessary to be very conservative in my claims. Any broad conclusions require much more work and detailed study of not just haplogroups, but sub-haplogroups. I think the migration paths described in these cases are in question. I feel R1a1 originated here and contributed to Central Asia rather than the other way around.”

A key 2009 paper published in the Journal of Human Genetics by Bamezai and his colleagues at JNU argues this point further: ‘Many major rival models of the origin of the Hindu caste system co-exist despite extensive studies, each with associated genetic evidences. One of the major factors that has still kept the origin of the Indian caste system obscure is the unresolved question of the origin of Y-haplogroup R1a1, at times associated with a male-mediated major genetic influx from Central Asia or Eurasia, which has contributed to the higher castes in India. Y-haplogroup R1a1 has a widespread distribution and high frequency across Eurasia, Central Asia and the Indian subcontinent... To resolve these issues, we screened 621 Y-chromosomes (of Brahmins occupying the upper-most caste position and schedule castes/tribals occupying the lower-most positions)... for conclusions. A peculiar observation of the highest frequency (up to 72.22%) of Y-haplogroup R1a1 in Brahmins hinted at its presence as a founder lineage for this caste group. Further, observation of R1a1 in different tribal population groups, existence of Y-haplogroup R1a in ancestors, and extended phylogenetic analyses of the pooled dataset of 530 Indians, 224 Pakistanis and 276 Central Asians and Eurasians bearing the R1a1 haplogroup supported the autochthonous [indigenous] origin of R1a1 lineage in India and a tribal link to Indian Brahmins.’

The conclusions bear restatement. The first thing that the evidence suggests is that the origins of Hartosh’s R1a1 haplogroup lie in India. Thus, a large part of Central Asia, Southern Russia, Ukraine onwards to the Czech Republic may well be populated by a 15,000-year-old migration from India. Given the timeframe of the origins of the R1a1 haplogroup in India, it is important to note that this does not rule out a subsequent re-entry of people from Central Asia bearing this marker into India at a much later date. As further sub-lineages of Hartosh’s R1a1 are studied, it may well be possible to answer even this question.

The second part of their conclusions rests on the fact that the proportion of R1a1 in some Brahmin groups such as those of West Bengal is as high as 72 per cent. This indicates that the origins of Brahmins as a caste may well lie in the R1a1 haplogroup. But since the R1a1 haplogroup is of greater antiquity among tribals such as Central India’s Sahariyas than it is among Brahmins, it is reasonable to believe that Brahmins may not be entrants from outside but may have originated as a caste from the tribal population of this country.

It is a strong claim, one that hints at possible discoveries that may lie ahead as the genetics of the Indian population is studied in greater detail. The one conclusion, though, that is unlikely to change is the one Bamezai emphasises over and over: “Groups we seem to see as distinct have overlapping genetic signatures. In fact, two castes that may have great hostility towards each other may carry the same signatures. Caste, tribe and religion in India do not have any genetic basis.” Trite as it may sound, the conclusion is inescapable: there is unity in this diversity.