All companies that offer DNA testing for genealogical purposes also provide their customers with their raw DNA data files. Depending on your interests and experience, this data can also be used for other things, such as learning a little more about your genetic predispositions. Besides the various third-party services that analyze raw data for a fee, you can also browse the data yourself.
Your raw data file lists roughly 700,000 SNPs (single-nucleotide polymorphisms). Each of them is a specific location in your DNA where people can differ in which of the four nucleobases (A = adenine, C = cytosine, G = guanine and T = thymine) they carry, which is what makes SNPs useful for distinguishing one person from another. The selection and number of SNPs vary slightly from company to company, but because the overlap is still large, this has little effect when comparing genetic relatives who tested at different companies (with the exception of very distant cousins).
This is what the first page of my 23andMe raw data file looks like:
Each SNP is assigned an ID of the form “rs” followed by a number (the reference SNP ID used in the public dbSNP database), but 23andMe also gives some SNPs an internal ID that starts with an “i”. Next to the ID you will find the chromosome and the position of the SNP. The last column lists the two alleles you inherited from your parents at this SNP. (You can find out which allele came from which parent by comparing your raw data with your parents’ files. Occasionally you can also work it out indirectly through other relatives, as you will see in my example later.) Sometimes you will find a “D” or an “I” instead of an “A”, “T”, “C” or “G”: “D” stands for a deletion at that location and “I” for an insertion, which in the case of rs333, discussed later, is the normal variant. (Printing out the raw data file wouldn’t be a good idea, though. The coronavirus pandemic would probably be long over before your printer spat out the last page.)
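Since the raw data file is plain text, a few lines of code are enough to read it. The following is a minimal Python sketch, assuming the tab-separated layout of a 23andMe download: comment lines starting with “#”, then one line per SNP with ID, chromosome, position and genotype. The “#” comment convention is an assumption here, other companies use different layouts, and the names `SnpCall` and `parse_raw_data` are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    snp_id: str      # "rs..." ID or 23andMe-internal "i..." ID
    chromosome: str  # "1" to "22", "X", "Y" or "MT"
    position: int    # position of the SNP on the chromosome
    genotype: str    # one character per allele, e.g. "AG" or "DI"

    @property
    def alleles(self) -> tuple[str, ...]:
        # Split the genotype string into its individual alleles
        return tuple(self.genotype)


def parse_raw_data(text: str) -> list[SnpCall]:
    """Parse 23andMe-style raw data (assumed layout, see the text above)."""
    calls = []
    for line in text.splitlines():
        # Skip blank lines and "#" comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        snp_id, chromosome, position, genotype = line.split("\t")
        calls.append(SnpCall(snp_id, chromosome, int(position), genotype))
    return calls


# Made-up example lines, not real data
sample = "# rsid\tchromosome\tposition\tgenotype\nrs12345\t1\t100000\tAG\n"
for call in parse_raw_data(sample):
    print(call.snp_id, call.chromosome, call.position, call.alleles)
```

Reading each line into a small record keeps the four columns named, so later lookups (by ID, chromosome or position) don't have to rely on column indices.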
Certain SNPs are associated with certain traits. SNPedia currently has 110,877 entries for different SNPs, which explain them and include references to the relevant studies.
One of the most popular and surely one of the best-researched SNPs is rs333, which can tell you whether you might be resistant to HIV type 1 (HIV-1). The resistance is due to a mutation in the form of a 32-base-pair deletion at position 46414947 on chromosome 3. Most HIV-1 strains need the CCR5 receptor to enter a cell. The CCR5 Delta32 mutation, however, alters this receptor in such a way that the virus can no longer get into the cell, which ultimately leads to resistance to this particular virus[1].
About 1% of the world’s population (most notably in the northern European countries[2]) have inherited the CCR5 Delta32 mutation on both chromosome copies, and about 10% received it from only one parent. The latter can still contract the virus but are said to have partial resistance, which shows in a much lower viral load and a significantly slower course of infection[1]. The mutation is thought to go back to a single mutation event[2] in a person who lived about 700 to 2,000 years ago[3], meaning that everyone who carries CCR5 Delta32 today descends from that one person. Since HIV/AIDS is a fairly recent disease, the mutation is believed to have provided a selective advantage against a different virus earlier on, with smallpox[4] being the most likely candidate. One theory suggests that the Vikings[5] spread the CCR5 Delta32 mutation across Europe and Russia.
I can’t tell you how surprised I was to find such a rare mutation in someone from my family!
Since we know the position of rs333 on chromosome 3, we can search for this SNP in the raw data file and look for a “D”. If you tested at 23andMe, you can go directly to the “Browse raw data” section. There, however, this SNP is listed as “i3003626”, and the deletion is shown as a minus sign rather than a “D”, while the normal variant lists all 32 bases.
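If you work with the downloaded file instead of the browser view, the same search can be done with a few lines of code. This is a minimal Python sketch under several assumptions: that the downloaded 23andMe file is tab-separated (ID, chromosome, position, genotype), that the marker appears under either rs333 or i3003626, and that in the download it is written with “D” and “I” rather than the minus sign and full sequence shown in the browser. The function name `count_delta32_copies` is made up for this example.

```python
from typing import Optional

# IDs to look for; per the text, 23andMe lists rs333 as i3003626
DELTA32_IDS = {"rs333", "i3003626"}


def count_delta32_copies(raw_text: str) -> Optional[int]:
    """Return 0, 1 or 2 copies of the deletion, or None if missing/no-call.

    Assumes tab-separated lines "ID, chromosome, position, genotype"
    with "D" marking the deletion (an assumption about the file format).
    """
    for line in raw_text.splitlines():
        # Skip blank lines and "#" comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0] in DELTA32_IDS:
            genotype = fields[3]
            if "-" in genotype:
                return None  # assumed "--" no-call marker
            return genotype.count("D")
    return None  # marker not present in this file


example = "i3003626\t3\t46414947\tDI"  # made-up line, not real data
print(count_delta32_copies(example))  # 1
```

A result of 0 means two normal copies, 1 means one copy of the deletion (as in my aunt’s case below) and 2 means the deletion on both copies.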
One of my paternal aunts turned out to have this mutation on one of her chromosome copies. In the following video you will see my profile with two normal variants first and then my aunt’s profile with the deletion and one normal variant:
My father and three of his siblings have taken a DNA test, but only aunt G. turned out to have this mutation. Since her tested siblings didn’t inherit it, neither of my paternal grandparents can have carried the deletion on both chromosome copies (otherwise every child would have received one), so my paternal grandfather or my paternal grandmother must have had it on only one copy. One day I hope to test the remaining two brothers as well and see what their results look like for this SNP.
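This reasoning can be checked by brute force: list every combination of genotypes the two grandparents could have had and keep only those that could have produced the children we observed. A minimal Python sketch, assuming ordinary Mendelian inheritance (no new mutation, no genotyping error), with “I” for the normal variant and “D” for the deletion; the function names are made up for illustration.

```python
from itertools import product

# Possible genotypes at rs333, alleles written in sorted order
GENOTYPES = ["DD", "DI", "II"]


def possible_children(parent1: str, parent2: str) -> set[str]:
    # Each child receives one allele from each parent
    return {"".join(sorted(a + b)) for a in parent1 for b in parent2}


def consistent_parents(children: list[str]) -> list[tuple[str, str]]:
    # Keep every parental pair that could have produced all observed children
    return [
        (p1, p2)
        for p1, p2 in product(GENOTYPES, repeat=2)
        if all(child in possible_children(p1, p2) for child in children)
    ]


# Aunt G. carries one copy; my father, aunt L. and uncle P. carry none
siblings = ["DI", "II", "II", "II"]
for grandfather, grandmother in consistent_parents(siblings):
    print(grandfather, grandmother)
```

No combination with a “DD” grandparent survives, and every surviving combination has at least one “DI” grandparent. The genotypes alone can’t say which grandparent it was, and don’t even rule out that both were carriers.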
At first, I wasn’t able to tell which of my paternal grandparents had passed the CCR5 Delta32 mutation down to my aunt. But I got really curious and began to wonder whether comparing DNA with other relatives might provide new clues. Eventually, I was able to work out that it was my grandfather Alexander Strelnikov who must have carried this mutation, not my paternal grandmother.
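Here is the one-to-one comparison of all four siblings in the Excel spreadsheet for visual phasing, a technique that uses comparisons between siblings to work out which of their four grandparents each segment of their DNA came from. I’ve marked the position of the CCR5 Delta32 mutation at approximately 46 Mb (46414947).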
Aunt G. and aunt L. share a half identical DNA segment around 46 Mb. Since aunt L. doesn’t have the deletion, this shared segment can’t be the one carrying it, so the grandparent who provided it can’t have passed the mutation on to aunt G. Let’s mark this grandparent with light brown and assign different colors to the other grandparents:
Aunt G. and uncle P. also share a half identical segment. Since uncle P. shares a fully identical segment with aunt L. at this location, all three of them must share the same light brown segment.
And since aunt G. doesn’t share any DNA with my father at this position, both of his colors will be different from hers.
In the next step, all four Strelnikov siblings are compared with V.K., a paternal first cousin of theirs. (The comparison with A.S., another paternal first cousin from a different line, showed a similar result.)
V.K. shares a long segment between 30 Mb and 107 Mb with everyone except aunt G. So, around 46 Mb, everyone except her has an identical DNA segment that was inherited from one of their paternal grandparents, who passed it down to the three Strelnikov siblings and V.K. This is the segment previously marked in dark green. Because aunt G. doesn’t have the dark green segment, her paternal copy at this position differs from theirs, which means that the half identical light brown segment she shares with aunt L. and uncle P. must be on her maternal copy of the chromosome. And since aunt L. carries that light brown segment without the deletion, the CCR5 Delta32 mutation must be on aunt G.’s paternal copy of the chromosome!
(Remember, the colors represent only the area around 46 Mb and not the entire chromosome 3.)
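The same chain of reasoning can be written as a small brute-force search. This is a simplified, single-position sketch in Python, not real visual phasing (which works along the whole chromosome and tracks crossover points): each sibling gets a paternal and a maternal copy, each labeled 0 or 1 for one of the two grandparents on that side, and only labelings that match the comparisons described above are kept. The pairwise states come from the text (aunt G. and aunt L.: half identical; aunt G. and uncle P.: half identical; aunt L. and uncle P.: fully identical; aunt G. and my father: nothing shared). Pairs not mentioned in the text are left unconstrained, and the function name `deletion_side` is made up.

```python
from itertools import product

SIBLINGS = ["G", "L", "P", "father"]
CARRIERS = {"G"}  # only aunt G. has the deletion

# Copies shared at ~46 Mb: 2 = fully identical, 1 = half identical, 0 = none
OBSERVED = {("G", "L"): 1, ("G", "P"): 1, ("L", "P"): 2, ("G", "father"): 0}


def shared(a: tuple[int, int], b: tuple[int, int]) -> int:
    # Count matching copies: paternal with paternal, maternal with maternal
    return (a[0] == b[0]) + (a[1] == b[1])


def deletion_side(use_cousin: bool) -> set[str]:
    """Return which of aunt G.'s copies could carry the deletion."""
    sides = set()
    # Each sibling gets (paternal grandparent 0/1, maternal grandparent 0/1)
    for combo in product(product((0, 1), repeat=2), repeat=len(SIBLINGS)):
        copies = dict(zip(SIBLINGS, combo))
        if any(shared(copies[a], copies[b]) != n for (a, b), n in OBSERVED.items()):
            continue
        if use_cousin:
            # V.K. (paternal side) matches L, P and father here, but not G
            paternal = {copies[s][0] for s in ("L", "P", "father")}
            if len(paternal) != 1 or copies["G"][0] in paternal:
                continue
        for index, side in ((0, "paternal"), (1, "maternal")):
            # A copy also carried by a non-carrier sibling can't hold the deletion
            if not any(copies[s][index] == copies["G"][index]
                       for s in SIBLINGS if s not in CARRIERS):
                sides.add(side)
    return sides


print(deletion_side(use_cousin=False))  # siblings only
print(deletion_side(use_cousin=True))   # adding the comparison with V.K.
```

With the sibling comparisons alone, both of aunt G.’s copies remain possible, because swapping the paternal and maternal sides gives an equally valid labeling. Adding the constraint from V.K. removes that symmetry and leaves only her paternal copy.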
I can’t tell whether my paternal grandfather received this mutation from his father or his mother, or whether one of my great-grandparents had the mutation on one or both chromosome copies. Nevertheless, I will keep it in mind, because my great-grandmother Domna Pimshina, my paternal grandfather’s mother, was of indigenous ancestry, and some studies suggest that the CCR5 Delta32 mutation is particularly common among the Mordvins[2] and Pomors[6], two indigenous populations living in Russia (although the ethnic status of the latter is still debated). I don’t know which indigenous community my great-grandmother descended from. Could this mutation be the first clue?
Other genetic genealogists have also written about this mutation:
Sources used for the general information on the CCR5 Delta32 mutation: