IBD vs. IBS
When a genetic testing service matches you to a potential family member, it does so by finding segments of DNA where the alleles are the same over a minimum distance in both your DNA and your match’s. At least that’s what is supposed to happen. There are several things that can go wrong when matching segments between people, and here we’re going to discuss the unfortunate incident of the IBS cousin. No, IBS doesn’t stand for any kind of medical condition. The term IBS, Identical by State, refers to any segment that matches another person’s DNA no matter how that matching occurred. There are two subclassifications: either the segment is not only IBS but also IBD, Identical by Descent, or, if the segment is ubiquitous within a specific population and you got it by random chance, it is a non-IBD segment. Since IBD segments are easier to understand, let’s tackle an IBD segment first. After we understand IBS, IBD and non-IBD segments, I’ll discuss mIBC segments, those misidentified by the computer.
A segment is IBD when two people inherited the matching alleles from a common ancestor. Now, common ancestor is a very open concept. Is every allele that modern people (Homo sapiens sapiens) got from Lucy’s people (Australopithecus afarensis) IBD? It depends on the context. If you’re talking about the differences between Homo erectus and Homo habilis, two of our ancestors, then yes, you might consider every allele that Homo species inherited from Lucy’s people to be IBD. But we’re interested in a genealogical timeframe of 500 to possibly slightly more than 1000 years. What we want to know is, “Is Bob your uncle?” So for us, the answer would be no: alleles that descended from Lucy’s people to all modern humans do not count as IBD. Instead, we’ll restrict ourselves to ‘in the last 20 generations’.
If you and your sister both got the sequence CGGTATTACCTG from your father, then that segment is IBD because you got it from a common ancestor. If your father got it from his mother, then that segment is IBD with any children or grandchildren of your paternal grandmother who happened to inherit that segment or part of that segment. These would be your aunts, uncles, and first cousins on your father’s side. Let’s say we could establish that your paternal grandmother got that segment from her mother. Now you know that a matching segment with a predicted second cousin, a descendant of your great-grandmother on your father’s side, is IBD. And so on it goes until you are soon tracing fifth and eighth cousins back in time.
A sequence is IBS when the segment of DNA is identical at all tested allele positions. It could be that the segment is IBD and inherited from a common ancestor in a genealogical timeframe, but it could also be IBS without being IBD. How does that happen? Each allele exists within a population at a certain frequency. By definition, an allele at a variable position can’t be at 0 or 100%, but it can be darn close. Let’s look at a simplified example. About ten percent of the population are left-handed. If you are right-handed and meet another person who is right-handed, does that mean you are related to them in a genealogical timeframe? Obviously not. If we pretend for the sake of illustration that a single allele controls right-handedness, that allele is ubiquitous in the population, and sharing it is non-IBD.
In these cases, which normally involve small segments below a threshold of somewhere between 5 and 7 cM, someone could have the same DNA sequence as you, come back as a match on 23andMe, and still not be closely related to you. Even if you knew for certain every ancestor on both sides all the way back to the 6th century, you might not be able to identify a common ancestor. That sequence could be very old: because it provided a very slight advantage to those who had it, it remained intact for thousands of years. But because the advantage was small, those who didn’t have the exact same sequence could also thrive and reproduce, just at a slightly lower rate.
Another way sequences become very common is when a population goes through an evolutionary bottleneck. This is where the population gets very small for whatever reason (disease, famine, or isolation, such as when a small group migrates to a new land) and then expands to fill its niche again. When this happens, the assortment of alleles that are available is sharply reduced. In these cases, many people can have similar alleles that are IBS, but their common ancestor would have to be traced all the way back to the bottleneck. When the bottleneck is far enough back that no (or few) genealogical records persist, those segments are considered IBS but non-IBD.
All IBD segments are also IBS, although some genealogy websites prefer to use IBS to mean only segments that aren’t IBD because it’s simpler. More correctly, they should say that there are IBS segments between people, some of which are IBD and some of which are non-IBD. The failure to use the correct terminology causes issues over and over in genetic genealogy, not just in misunderstanding your matches but also for those creating new tools for the genetic genealogy community. In addition to these two types of inherited segments, there are anomalies caused by comparing unphased DNA data; I call these shadow segments, or mIBC (misidentified by computer) for short. We’ll talk about them next.
If your father and your mother both have the same sequence on one of their chromosome pairs, and you and your sister each inherit one copy of that sequence but from different parents, then that segment is IBS between you and your sister and not IBD. But there is no way to determine that this happened. The more similar or inbred the population your parents come from, the more likely this is to happen. Testing companies strive to test SNPs that help differentiate between individuals as well as those that link two people together, but there is bound to be some overlap if your parents are from the same population. Of course, it’s always possible that your parents are distantly related within a genealogical timeframe and that segment came to them both from their great-great-great-great-grandmother!
One giant roadblock to determining your ancestral matches with certainty is that your DNA data isn’t phased. That means that when each position is tested, you get not one answer per location on most chromosomes (excluding the X and Y in men) but two. You see, you have two of each chromosome, one from your mother and one from your father. The sequencing machine can only tell whether there is an A, G, C, or T at any position, not whether it came from your mother or your father. If you open your results, you’ll find a position indicator on one side and two bases, not one, listed next to the position. If you’re homozygous for that allele position, you’ll have two of the same base. That means you got the same base from mom as you did from dad. But if you got two different bases from your parents, the machine can’t tell which parent each one came from and therefore can’t create a phased set of genetic blueprints to use for comparisons.
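To make the unphased format concrete, here is a minimal Python sketch. The tab-separated layout (identifier, chromosome, position, genotype) is an assumption loosely modeled on typical raw-data downloads, and the sample rows and function names are made up for illustration.

```python
# Hypothetical raw-data rows: identifier, chromosome, position, genotype.
# The layout and the values are illustrative assumptions, not real results.
RAW = """\
rs0000001\t1\t1000\tAA
rs0000002\t1\t2000\tCG
rs0000003\t1\t3000\tTT
rs0000004\t1\t4000\tAC
"""

def parse_genotypes(text):
    """Return a list of (position, unordered pair of bases)."""
    rows = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        _rsid, _chrom, pos, genotype = line.split("\t")
        # Sorting the bases discards their order on purpose: the file says
        # which two bases are present, not which parent each came from.
        rows.append((int(pos), tuple(sorted(genotype))))
    return rows

def is_homozygous(pair):
    """True when both copies carry the same base."""
    return pair[0] == pair[1]

genotypes = parse_genotypes(RAW)
for pos, pair in genotypes:
    kind = "homozygous" if is_homozygous(pair) else "heterozygous (phase unknown)"
    print(pos, "/".join(pair), kind)
```

Only the heterozygous rows are a problem for matching, because those are the positions where the parent of origin is unknown.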
The longer the SNP segment, the less likely its SNPs are to randomly resemble someone else’s in the database. This is where the SNP threshold comes in. With unphased data, this threshold is required to try to eliminate matches where the computer has identified a matching sequence when none in fact exists. Unfortunately, as with all thresholds, you eliminate some real matches while allowing some false matches to slip through. I call these imaginary matching segments “shadow segments”, existing only in the realm of possibility, and refer to them as mIBC matches, or misidentified by computer. Neither of your chromosomes has this sequence.
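A rough calculation shows why longer runs are less likely to match by luck. This Python sketch rests on loudly simplified assumptions that are not from the testing companies: each SNP has two alleles, genotypes follow Hardy-Weinberg proportions, the two people are unrelated, and neighboring SNPs are independent (real SNPs are not, so actual chance matches are more common than this suggests). The function names are hypothetical.

```python
def p_half_identical(p):
    """Chance two unrelated people share at least one allele at one SNP.

    Assumes a two-allele SNP with allele frequency p and Hardy-Weinberg
    genotype proportions.
    """
    q = 1.0 - p
    # The only way to share no allele is for one person to be homozygous
    # for one allele and the other person homozygous for the other allele.
    return 1.0 - 2.0 * (p * p) * (q * q)

def p_chance_run(p, n_snps):
    """Chance that n_snps SNPs in a row all look compatible by luck.

    Assumes the SNPs are independent, which is a simplification.
    """
    return p_half_identical(p) ** n_snps

print(p_half_identical(0.5))  # 0.875 for a SNP with two equally common alleles
for n in (10, 100, 500):
    print(n, p_chance_run(0.5, n))
```

Even under these generous assumptions, a single SNP is compatible most of the time, which is why a match needs many consecutive SNPs before it means anything.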
Here’s an example:
You have two chromosomes and this is the actual sequence on each chromosome.
From Mother: ACTAGAGTTAAC
From Father: AGTCCACTATAG
Your match ‘Bob’ has two chromosomes as well and this is their actual sequence:
From Mother: ACAAGAGTTAAG
From Father: AGTCGACTATTC
The computer imagines that this matching sequence could possibly exist:
Shadow Seg: ACTCGAGTTTAC
Based on this mIBC segment, the computer matches you with Bob. But you and Bob aren’t related: you have no most recent common ancestor (MRCA) in a genealogical timeframe and haven’t descended from the same people.
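The example above can be checked with a short Python sketch. It compares you and Bob the way an unphased comparison would (is at least one base shared at every position?) and then the way a phased comparison would (does any whole chromosome match?). The function names are hypothetical, and this is not any testing company’s actual matching algorithm.

```python
# Actual (phased) sequences from the example: (from mother, from father).
YOU = ("ACTAGAGTTAAC", "AGTCCACTATAG")
BOB = ("ACAAGAGTTAAG", "AGTCGACTATTC")
SHADOW = "ACTCGAGTTTAC"

def unphased(person):
    """What the machine reports: the set of bases seen at each position."""
    return [set(bases) for bases in zip(*person)]

def half_identical(a, b):
    """True if the two people share at least one base at every position."""
    return all(x & y for x, y in zip(unphased(a), unphased(b)))

def real_match(a, b):
    """True if a whole chromosome of one person equals one of the other's."""
    return any(x == y for x in a for y in b)

def compatible(seq, person):
    """True if seq could be assembled from the person's unphased bases."""
    return all(base in bases for base, bases in zip(seq, unphased(person)))

print(half_identical(YOU, BOB))   # True: the unphased data looks like a match
print(real_match(YOU, BOB))       # False: no actual chromosome is shared
print(compatible(SHADOW, YOU), compatible(SHADOW, BOB))  # True True
print(SHADOW in YOU + BOB)        # False: nobody really carries the shadow
```

The shadow segment passes the unphased test for both people even though it appears on none of the four real chromosomes.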
These mIBC segments can be differentiated from IBS segments by either phasing your DNA sequence (an expensive method of DNA sequence determination that separates the chromosomes first) or pseudophasing it (a software method I’ve been proposing for several years that FTDNA and 23andMe develop). So while you can’t tell an IBD segment from a non-IBD segment, you can eliminate an mIBC segment mismatch. I’ll try to make a post soon saying how that can be accomplished by 23andMe or FTDNA today. Some other third-party programmers are also trying to create programs to do this, but to accomplish it you need to have the raw DNA from several other family members. If they succeed it will be a great start, but really it requires 23andMe or FTDNA to do it for everyone’s benefit.
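To show why raw DNA from family members helps, here is a minimal Python sketch of ordinary family-based phasing using both parents’ genotypes. It is shown only as an illustration and is not the pseudophasing method mentioned above; the genotypes and the function name are hypothetical.

```python
def phase_position(child, mother, father):
    """Return (base from mother, base from father) for one position.

    Each argument is an unordered two-letter genotype such as "CG".
    Returns None when the position is ambiguous or inconsistent.
    """
    a, b = child
    # Try both assignments and keep those each parent could have passed on.
    options = {(m, f) for m, f in ((a, b), (b, a)) if m in mother and f in father}
    return options.pop() if len(options) == 1 else None

# Hypothetical unphased genotypes at six positions (made-up data).
CHILD  = ["AA", "CG", "TT", "AC", "CG", "AT"]
MOTHER = ["AA", "CC", "TT", "AC", "GG", "AT"]
FATHER = ["AA", "GG", "TT", "CC", "CG", "AT"]

phased = [phase_position(c, m, f) for c, m, f in zip(CHILD, MOTHER, FATHER)]
for position, result in enumerate(phased, start=1):
    print(position, result if result else "ambiguous")
```

The last position stays ambiguous because the child and both parents are heterozygous for the same two bases, which is one reason data from additional relatives is useful.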
Graphics Credit: K S Rose