There are two main ways that people differ from each other genetically. Either they carry different bases at particular positions along their chromosomes, or they carry a different number of copies of a gene. Here we’ll discuss the first case, in which SNPs vary. A SNP, pronounced “snip,” is shorthand for single nucleotide polymorphism. A nucleotide is one of the four genetic bases, adenine (A), guanine (G), cytosine (C) or thymine (T), attached to a sugar-phosphate unit. The sugars and phosphates of successive nucleotides link together to form the backbone of a DNA molecule.
These nucleotides are the individual units that, linked together, form the polymer that is DNA. Polymorphism literally means having more than one form. Taken together, then, a SNP is a single position in your genome where more than one form, or base, can be found.
Because there are only four bases, there can be at most four variants at any one position in the genome. In practice, most SNP positions have only two of the four possible bases. Scientists have identified a few rare positions in our DNA with three possible forms, but two is the norm. Each form that exists at a position is called an allele, so a position with three possible forms has three alleles. For instance, a particular position might carry not only a C or a G but also, rarely, an A.
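To make the idea of alleles concrete, here is a minimal Python sketch. It assumes we already have several DNA sequences aligned base-for-base to the same positions, which is a simplification: real data come from sequencing reads mapped to a reference genome. The function name `alleles_by_position` and the toy sequences are hypothetical, made up for illustration. For each position it collects the distinct bases observed and keeps only the positions with more than one.

```python
from collections import defaultdict

def alleles_by_position(sequences):
    """Return {position: sorted distinct bases} for every position where
    the aligned sequences do not all agree.

    Hypothetical helper: assumes all sequences are already aligned and
    have the same length. Positions are counted from 0.
    """
    if len({len(s) for s in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    observed = defaultdict(set)
    for seq in sequences:
        for pos, base in enumerate(seq.upper()):
            observed[pos].add(base)
    # A polymorphic position has more than one allele (distinct base).
    return {pos: sorted(bases) for pos, bases in observed.items() if len(bases) > 1}

# Toy aligned sequences, invented for illustration.
samples = [
    "ACGTAC",
    "ACGTCC",
    "ATGTAC",
    "ACGTGC",
]
print(alleles_by_position(samples))  # {1: ['C', 'T'], 4: ['A', 'C', 'G']}
```

Position 1 has the usual two alleles, while position 4 shows the rarer three-allele case.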
There is one more requirement for a position with more than one allele to count as a SNP: the least common form must be found in at least one percent of the population. So if 99.9% of the population has a G at a particular position on one of your chromosomes, that position is not considered a SNP, even if you personally have a C there. The threshold excludes allele changes so rare that we might never find them.
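The one-percent rule can be expressed as a frequency check. This is a minimal sketch, assuming we have a list of the base observed on each sampled chromosome copy at one position; the helper names `allele_frequencies` and `is_snp` are hypothetical. It counts a position as a SNP when at least two alleles each reach the threshold. For a two-allele site that is the same as requiring the rarer allele to reach 1%; how three-allele sites are treated is an interpretation on our part.

```python
from collections import Counter

MIN_MINOR_ALLELE_FREQ = 0.01  # the one-percent threshold from the text

def allele_frequencies(alleles):
    """Map each observed allele to its fraction of all observations."""
    counts = Counter(alleles)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_snp(alleles, threshold=MIN_MINOR_ALLELE_FREQ):
    """Return True if at least two alleles each reach `threshold`.

    Hypothetical helper: `alleles` lists the base seen on each sampled
    chromosome copy at one position. For a two-allele site this is the
    same as requiring the rarer allele to reach the threshold.
    """
    if not alleles:
        return False
    freqs = allele_frequencies(alleles)
    common = [a for a, f in freqs.items() if f >= threshold]
    return len(common) >= 2

rare = ["G"] * 999 + ["C"]         # 99.9% G, 0.1% C
common = ["G"] * 980 + ["C"] * 20  # 98% G, 2% C
print(is_snp(rare))    # False
print(is_snp(common))  # True
```

The first case mirrors the 99.9% G example above: the C allele exists, but too rarely for the position to qualify.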
Sequencing the Genome
There are over three billion positions in your genetic code where an alternate form could potentially occur. That count includes only one strand of the DNA double helix (each DNA molecule has two complementary strands) and only one chromosome from each pair you carry, one from your mother and one from your father. Of those three billion positions, only about 10 million are estimated to be SNPs. Not all SNPs have been mapped yet. Scientists would need to sequence the complete genomes of more than one percent of the world’s six billion people, or more than 60 million genomes, to feel confident they had found them all.
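The round numbers in this paragraph can be checked with a little arithmetic. This sketch uses only the figures quoted above. It also assumes, unrealistically, that SNPs are evenly spaced. That assumption is only meant to give a feel for how dense they are.

```python
GENOME_POSITIONS = 3_000_000_000  # ~3 billion positions, one strand of one chromosome set
ESTIMATED_SNPS = 10_000_000       # ~10 million estimated SNPs
WORLD_POPULATION = 6_000_000_000  # population figure used in the text

# Average spacing between SNPs if they were spread evenly (an assumption).
bases_per_snp = GENOME_POSITIONS / ESTIMATED_SNPS
# Fraction of all positions that are SNPs.
snp_fraction = ESTIMATED_SNPS / GENOME_POSITIONS
# One percent of the population, the sample size cited in the text.
genomes_needed = WORLD_POPULATION // 100

print(f"about one SNP every {bases_per_snp:.0f} bases")  # about one SNP every 300 bases
print(f"SNPs are {snp_fraction:.2%} of positions")       # SNPs are 0.33% of positions
print(f"{genomes_needed:,} genomes")                     # 60,000,000 genomes
```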
As of this writing in late 2011, no single human’s genome has been completely sequenced and only 26 thousand have been nearly completely sequenced. There are technical issues that have prevented complete and proper assembly of all the data into any one human’s complete sequence. [Editor’s note: I’m not sure where it stands as of 2016]
Below is an example of DNA sequencing output. Each peak represents one position in the sequence, and its color indicates which nucleotide is found at that position. This is a Sanger sequencing display; some companies and researchers now use more advanced methods.
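Reading a trace like this can be sketched in a greatly simplified form. Assume the four colored signals have already been measured at each peak. Real base-calling software also has to handle peak spacing, overlapping signals, noise and quality scores. Under that assumption, the base called at each position is simply the color channel with the strongest signal. The function `call_bases`, its `min_signal` parameter and the intensity values are all hypothetical.

```python
BASES = ("A", "C", "G", "T")

def call_bases(traces, min_signal=0.0):
    """Call one base per peak from four-channel intensities.

    Hypothetical helper: `traces` maps each base to a list of
    intensities, one per peak position. Positions where even the
    strongest channel does not exceed `min_signal` are reported as
    'N' (unknown).
    """
    lengths = {len(traces[b]) for b in BASES}
    if len(lengths) != 1:
        raise ValueError("all four channels need the same number of peaks")
    called = []
    for i in range(lengths.pop()):
        # Pick the channel (color) with the highest peak at this position.
        best = max(BASES, key=lambda b: traces[b][i])
        called.append(best if traces[best][i] > min_signal else "N")
    return "".join(called)

# Made-up intensities for five peaks.
traces = {
    "A": [910, 40, 35, 20, 15],
    "C": [30, 880, 25, 60, 10],
    "G": [25, 55, 20, 940, 12],
    "T": [15, 20, 870, 35, 18],
}
print(call_bases(traces, min_signal=100))  # ACTGN
```

The last peak is too weak in every channel, so it is reported as unknown rather than guessed.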
DNA is the codebook for how each life form grows and operates. All differences in DNA from one person to another arise from mutations. If a mutation has occurred at a particular position, then more than one allele (A, C, G or T) is possible there. A SNP is a position in the DNA with more than one possible allele. For a two-allele SNP, both alleles must occur in at least 1% of a population. In other words, researchers won’t find either allele in more than 99% of it. By population, we usually mean a species.
Image credit: DNA Sequencing Equipment