Using 23andMe’s new Ancestry Composition tool creates some challenges, and this post is an attempt to explain the issues raised in some of their discussion threads. It is still the best ethnicity tool out there, and I applaud them for using their own customers to glean data from hard-to-test populations.
As can be seen from the picture above, you have 2 copies of each chromosome. This is my Ancestry Composition chromosome 6. One of this pair of chromosomes came from my mother and one from my father... or did they? Unfortunately, it appears that the post-Finch smoothing process leaves something to be desired. Finch is the phasing process which assigns various SNPs to an ethnicity. In the picture above, it has assigned some segments as British/Irish (shown in blue; a few B/I segments are indicated by blue arrows that I added), some as General Northern European, a small segment as Scandinavian and a tiny segment as South Asian. In theory, the top chromosome is supposed to belong to one parent and the bottom to the other. Although which parent is unknown, we can comfortably call them parent A and parent B. In theory.
Unfortunately, the process has some flaws. Because it looks at the DNA in groups, it frequently mixes up which chromosome, 6a or 6b in our example, an identified segment is assigned to. So let’s look at that Scandinavian segment. What can we actually tell at this point from the assignment? Can we tell if parent A or parent B has Scandinavian? No. Why not? The software makes homologous chromosome assignment errors, which means the segment could just as easily have landed on chromosome 6a as on 6b. What we can say is that I have British/Irish ancestry from both parents. I know that because I have B/I haplotypes on the same section of both chromosomes. We can also tell that the parent who gave me the South Asian (SA) segment is not the parent who gave me the Scandinavian segment, because the Scandinavian segment sits in the same location as the SA segment, right on top of it.
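The overlap reasoning above can be sketched in a few lines of Python. This is only an illustration of the logic, not anything 23andMe runs: the `Segment` type, the `overlapping_label_pairs` function and the positions (in Mb) are all made up for the example and are not read off my actual chromosome 6. The idea is that we ignore which homolog a label was painted on and only ask which labels sit at the same position on opposite homologs.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    homolog: str  # "a" (top) or "b" (bottom); unreliable because of assignment errors
    start: int    # hypothetical start position in Mb
    end: int      # hypothetical end position in Mb
    label: str    # ancestry label painted on this segment


def overlapping_label_pairs(segments):
    """Return the sets of labels found at the same position on opposite homologs."""
    top = [s for s in segments if s.homolog == "a"]
    bottom = [s for s in segments if s.homolog == "b"]
    pairs = set()
    for s in top:
        for t in bottom:
            # Two half-open intervals overlap when each starts before the other ends.
            if s.start < t.end and t.start < s.end:
                pairs.add(frozenset((s.label, t.label)))
    return pairs


# Made-up coordinates, loosely modelled on the picture described above.
segments = [
    Segment("a", 0, 40, "British/Irish"),
    Segment("b", 0, 40, "British/Irish"),
    Segment("a", 40, 50, "Scandinavian"),
    Segment("b", 40, 50, "South Asian"),
    Segment("a", 50, 170, "Northern European"),
    Segment("b", 50, 170, "Northern European"),
]

pairs = overlapping_label_pairs(segments)
for pair in pairs:
    if len(pair) == 1:
        print("Both parents contributed:", next(iter(pair)))
    else:
        print("Different parents contributed:", " / ".join(sorted(pair)))
```

A pair with a single label means both parents carry that ancestry at that spot. A pair with two labels means the two labels came from different parents, even though we still cannot say which parent is which.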
In an attempt to communicate more clearly, Davidski brought up that we were using the same term for more than one error type. Below is the clarification of the types of errors that we’re seeing in the analysis.
Switch error = when alleles (i.e., polymorphic base pairs) are assigned to the wrong haplotype, such that an Irish allele is called Scandinavian, for instance.
Phasing error = any error during the phasing portion of Finch, including but not limited to switch errors.
Homologous Chromosome Assignment Error (HCA Error) = when an individual has segments assigned to the wrong chromosome of a chromosome pair during post-phasing manipulation. This seems to be due to a window classification error by the Finch phasing program. 23andMe calls this a switch error.
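To make the HCA error concrete, here is a toy Python model. It assumes the chromosome pair is cut into windows and that each window is just a pair of labels, one per homolog; that is my simplification for illustration, not a description of how Finch works internally. The names `apply_hca_error` and `unordered` are hypothetical. The sketch shows what an HCA error changes (which homolog a label appears on) and what it leaves alone (which two labels are present in each window).

```python
def apply_hca_error(windows, flipped):
    """Swap the two homolog labels in every window whose index is in `flipped`."""
    return [(b, a) if i in flipped else (a, b)
            for i, (a, b) in enumerate(windows)]


def unordered(windows):
    """Forget which homolog each label was on; keep only the labels per window."""
    return [frozenset(w) for w in windows]


# Hypothetical "true" painting: (label on homolog a, label on homolog b) per window.
true_windows = [
    ("British/Irish", "British/Irish"),
    ("Scandinavian", "South Asian"),
    ("Northern European", "British/Irish"),
]

# Simulate an HCA error in the second and third windows.
observed = apply_hca_error(true_windows, flipped={1, 2})

print(observed != true_windows)                       # the per-homolog picture changed
print(unordered(observed) == unordered(true_windows)) # the labels per window did not
```

This is why, in this simplified picture, conclusions drawn from which labels share a position survive an HCA error, while conclusions about "the top chromosome" versus "the bottom chromosome" do not.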
In my opinion, the Ancestry Composition Tool is of limited use as it stands now. It needs to do a much better job of assigning the ancestry segments to the right homologous chromosome (either 6a or 6b in our example) before it can really be useful. As it stands, someone would need to go through their matches one at a time, segment by segment, and determine which homologous chromosome each segment label belongs to by looking at their DNA cousins’ assignments and trying to sort things out that way. It would be easier for 23andMe to do this than for us, but it would likely take a huge amount of processor time. I think they are making a heroic effort in this area, but the tool just isn’t all the way ‘there’ yet. I also think there are enough errors in the system that it will require some redundancy checking and the elimination of mismatched cousins. We carry a certain burden of cousins who are not really related to us, because the DNA matching algorithm isn’t working with phased data. Now that the data is mostly phased, 23andMe should go back through the list and only give us our real cousins. Give plenty of warning to those who are emotionally attached to their cousins (I know I am!) so they can save all their info, and then clean out the match list.
Credits: Screenshot taken from 23andMe website. No ownership should be implied.