Sunday, February 17, 2008

Absurdities of Caste Genetic Studies - Notes

Sengupta et al. 2006 (or 2005; Y-chromosomal variation in India) make the following statement:

Eight HGs display frequencies >5% within India and account for 95.8% of the samples.
They are, in descending frequency order, HGs H and its subclades H1*, H1c, H1a, and H2 (26.4%); R1a1-M17 (15.8%); O2a-M95 (14.6%); R2-M124 (9.3%); J2-M172 (9.1%); O3e-M134 (8.0%); L1-M76 (6.3%); and F*-M89 (5.2%) (p. 5 of 20).


The statement describes haplogroup frequencies in their samples, and the sample composition of course is not representative of the percentage distribution of the various groups in India. Unfortunately, I have seen many people make the mistake of treating these frequencies as the haplogroup distribution within India.

For example, O2a forms about 15% of the Sengupta et al. samples. But this haplogroup is mostly restricted to Munda tribes. Tribes make up 8% of the Indian population, and that figure includes Dravidian, Munda, Indo-European and Sino-Tibetan tribes. Mundas probably form 1 or 2% of the total Indian population. Hence, the total frequency of O2a can be around 1-2% (or less) of the total Indian male population.
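The arithmetic behind that estimate can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the function name `national_frequency` is made up for this note, the haplogroup is assumed to be completely absent outside the Munda group (the text only says "mostly restricted"), and a within-group frequency of 1.0 is a deliberately extreme assumption that gives an upper bound.

```python
def national_frequency(group_share, freq_within_group):
    """Frequency of a haplogroup in the whole population, assuming it
    occurs only inside one group (a simplifying assumption)."""
    return group_share * freq_within_group


# O2a-M95 frequency in the pooled samples, taken from the quoted passage
sample_frequency = 0.146

# Munda share of the population: the rough 1-2% guess from the text.
# Within-group frequency 1.0 is an extreme assumption, so the result
# is an upper bound rather than an estimate.
for munda_share in (0.01, 0.02):
    upper_bound = national_frequency(munda_share, 1.0)
    print(
        f"Munda share {munda_share:.0%}: O2a at most {upper_bound:.1%} "
        f"nationally, versus {sample_frequency:.1%} in the pooled sample"
    )
```

Even with every Munda male carrying O2a, the national figure stays far below the sample figure, which is the whole point.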

7 comments:

Anonymous said...

Very good point, Manju. Taking the sample to represent the Indian population is like saying 50% of the Indian population are tribes and Dravidian and IE castes are 25% apiece.

Anonymous said...

One way to estimate the distribution of haplogroup frequencies in the Indian population is to stratify the subpopulations into IE caste, D caste and tribes, then weight the populations appropriately (0.60, 0.3, 0.1) (?), or just treat the tribes as outliers and ignore them altogether.
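A minimal Python sketch of that stratify-and-weight idea follows. The per-stratum frequencies are made-up placeholders, not numbers from any study, and the helper name `weighted_frequencies` is hypothetical. The population weights are the tentative (0.60, 0.3, 0.1) from this comment, and the sample weights are the 25/25/50 split mentioned in the comment above.

```python
def weighted_frequencies(stratum_freqs, weights):
    """Combine per-stratum haplogroup frequencies using stratum weights."""
    total = sum(weights.values())
    haplogroups = sorted({hg for freqs in stratum_freqs.values() for hg in freqs})
    return {
        hg: sum(w * stratum_freqs[s].get(hg, 0.0) for s, w in weights.items()) / total
        for hg in haplogroups
    }


# Made-up frequencies for illustration only (not taken from any study)
stratum_freqs = {
    "IE caste": {"R1a1": 0.40, "H": 0.25, "O2a": 0.02, "other": 0.33},
    "D caste": {"R1a1": 0.20, "H": 0.35, "O2a": 0.03, "other": 0.42},
    "tribe": {"R1a1": 0.10, "H": 0.30, "O2a": 0.35, "other": 0.25},
}

# Tentative population shares suggested in the comment
population_weights = {"IE caste": 0.60, "D caste": 0.30, "tribe": 0.10}
# Tribe-heavy sample composition, as in the "50% tribes" remark
sample_weights = {"IE caste": 0.25, "D caste": 0.25, "tribe": 0.50}

weighted = weighted_frequencies(stratum_freqs, population_weights)
pooled = weighted_frequencies(stratum_freqs, sample_weights)

for hg in weighted:
    print(f"{hg}: pooled {pooled[hg]:.3f}, weighted {weighted[hg]:.3f}")
```

With these placeholder inputs, a haplogroup concentrated in the tribal stratum looks much more common in the pooled sample than in the weighted estimate.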

Manju Edangam said...

Indeed, Ibra. I have come across couple of instances on the web where people got misled by these numbers.

But I suppose stratification is more complicated than that. We need castewise enumeration in a region and that would help in generalization of haplogroup frequencies in India.

Anonymous said...

"Indeed, Ibra. I have come across couple of instances on the web where people got misled by these numbers."

Haha, I think I know what you mean: on Mr. Dienekes' famous blog, right? :)

Anyway I tried weighting the castes only (0.74,0.26) and...

P-Haplogroups: 51.806
H-Haplogroups: 24.074
L-Haplogroup: 6.188
J2-Haplogroup: 7.44
F*-Haplogroup: 3.454
K2-Haplogroup: 2.728
K*-Haplogroup: 3.34
C-Haplogroup: 0.918

Data from Table 2b of R. Trivedi et al.

It's oversimplistic, I guess, but a lot better than pooling everything!

Anonymous said...

Here is that Excel file:

http://www.sendspace.com/file/or5crt

Manju Edangam said...

I am not very enthusiastic about Trivedi et al.'s extensive study. I rather feel their categorization is highly arbitrary (good enough for one more blog entry :-)).

But still, your calculation can be close to reality, considering the higher weightage given to middle and lower castes in the study.

Manju Edangam said...

"Haha, I think I know what you mean: on Mr. Dienekes' famous blog, right? :)"

:-).

That prompted me to write this entry, but that is not the first instance.