Visual Phasing of Chromosome 1 – updated version using Steven Fox’s Excel spreadsheet

If you are able to test three full siblings, the next great thing you can do with your DNA results is to try out the Visual Phasing technique developed by Kathy Johnston. It allows you to map the segments of your chromosomes to your four grandparents (without having them or your parents tested) by comparing the recombination points of the three siblings. However, to figure out which of your grandparents a segment belongs to, you will still need additional cousin testing on several of your lines: ideally multiple 2nd cousins on either side of your family, although more distant cousins have proved to be very valuable as well.

I do not have full siblings, but luckily both my mother and father do. I tested my mother and my maternal uncle a few years ago, and when I first read about Visual Phasing on Blaine Bettinger’s fantastic blog, I immediately knew what I was going to do next – get a DNA test for my maternal aunt, too!

In the earlier version of this post I explained how to do Visual Phasing manually, but I thought it was time for an update, because I can no longer imagine doing Visual Phasing without Steven Fox’s marvellous automated Excel spreadsheet. It speeds up the process a lot! After uploading the autosomal raw data files of the three siblings to GEDmatch, you need to join the Visual Phasing working group on Facebook – currently the only place where Steven Fox’s Excel spreadsheet is available for download. However, you will only be able to use the automated spreadsheet if you are already familiar with how the Visual Phasing method works and have a good understanding of genetic genealogy in general. So before we get started, let’s have a look at some basic information about genetic genealogy. (If you are already familiar with the basic concepts, please scroll down to the next part.)

PART I – THE BASICS

1.1. Chromosome inheritance

People normally have 23 chromosome pairs, with one copy of each chromosome received from their mother and one from their father. The first 22 pairs are called autosomal chromosomes and the 23rd pair is referred to as the sex chromosomes. (Girls are denoted XX – they receive one X-chromosome from their mother and one from their father. Boys are denoted XY – they receive their X-chromosome from their mother and the Y-chromosome from their father.) When a parent’s egg or sperm cells are formed, that parent’s two copies of each autosomal chromosome recombine, so the copy passed on to the child is a new mixture of DNA from two of the child’s grandparents. The position where a recombination has taken place is referred to as a crossing-over or simply a recombination point.

Image 1. The inheritance of the autosomal chromosomes. (The grandparents also have two chromosome copies, but are presented with only one for simplicity reasons.)

With regard to the sex chromosomes everything is a little different. Girls receive one recombined copy of the X-chromosome from their mother and one unchanged copy from their father, consisting only of their paternal grandmother’s X-DNA. Boys receive an unchanged Y-chromosome from their father and an X-chromosome recombined from their mother’s two copies, which they then pass on unchanged to their daughters.
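The X-chromosome rules above can be written down as a tiny lookup. This is only an illustrative Python sketch: the function name and the grandparent labels are my own assumptions and are not part of GEDmatch or of the spreadsheet.

```python
def x_grandparent_sources(sex):
    """Return the grandparents who can have contributed X-DNA to a child.

    Hypothetical helper for illustration; `sex` is "female" or "male".
    """
    # The mother's X is a recombined mix of her own two copies,
    # so both maternal grandparents can contribute.
    sources = {"maternal grandmother", "maternal grandfather"}
    if sex == "female":
        # The father passes on his single X unchanged, and he got it from his mother.
        sources.add("paternal grandmother")
    elif sex != "male":
        raise ValueError("sex must be 'female' or 'male'")
    return sources


print(sorted(x_grandparent_sources("female")))
print(sorted(x_grandparent_sources("male")))
```

The paternal grandfather never appears in either result, which is why an X match can rule out whole branches of a family tree.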

Image 2. The inheritance of the sex chromosomes

1.2. GEDmatch One-to-one comparison tool

GEDmatch is the best third party website when it comes to working with DNA results and the place to go to and upload your raw data files once you receive them. It offers various great tools and one of them is the One-to-one compare tool, which we need for Visual Phasing.

Image 3. GEDmatch One-to-one comparison tool

When we upload raw data files to GEDmatch and compare siblings with the One-to-one tool, the chromosome copies are presented differently. You will see two bars: an upper one with red, yellow or green segments and a lower blue one. The lower blue bar shows the size of the matching segments – if you don’t change the default settings, only segments longer than 7 cM (centiMorgans) will be shown. The upper bar is essential for the Visual Phasing technique and indicates whether there is a full match, a half match or no match between the two people compared.

If we compare the siblings 1 and 2 from Image 1, the upper GEDmatch bar will look like this:

Image 4. Upper GEDmatch bar explanation

  • red (no match) means that on both copies of the chromosome the two siblings received DNA from different grandparents
  • yellow (half match) means that on one copy of the chromosome, either maternal or paternal, the two siblings received DNA from the same grandparent
  • green (full match) means that on both copies of the chromosome the two siblings received DNA from the same two grandparents

The position where a change from red to yellow or yellow to green occurs is called a crossing-over or simply a recombination point. Please note that green segments can only change to yellow, never directly to red, and vice versa, because a single recombination changes only one of the two chromosome copies. If you do see a direct change between green and red, you have likely missed a recombination point.
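The colour rules can be expressed in a few lines of Python. This is a minimal sketch of the logic only, not of anything GEDmatch actually runs; the function name and the labels MGM, MGF, PGM and PGF (maternal/paternal grandmother/grandfather) are assumptions made for the example.

```python
def match_colour(sib_a, sib_b):
    """Colour of the upper bar at one position.

    Each sibling is described by a pair:
    (grandparent on the maternal copy, grandparent on the paternal copy).
    """
    # Count on how many of the two copies both siblings have the same grandparent.
    shared = sum(a == b for a, b in zip(sib_a, sib_b))
    return {0: "red", 1: "yellow", 2: "green"}[shared]


print(match_colour(("MGM", "PGF"), ("MGM", "PGF")))  # green
print(match_colour(("MGM", "PGF"), ("MGM", "PGM")))  # yellow
print(match_colour(("MGM", "PGF"), ("MGF", "PGM")))  # red
```

A single recombination changes one entry of one pair, so the number of shared copies can only move by one step at a time. That is the reason why green and red should never sit directly next to each other.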

The comparison of the sex chromosomes works the same way. However, a brother normally won’t get a full match (green) with his sister on the X-chromosome.

1.3. Assigning recombination points to a specific sibling

We know that each child receives recombined sets of chromosome pairs from their parents. Now we need to find out the positions on the chromosomes where those recombinations have taken place and then assign them to one of the three siblings of our Visual Phasing project.

Image 5.1 Haas siblings comparison on chromosome 1 via the One-to-one tool

Now we need to mark all recombination points.

Image 5.2 Identifying recombination points

Our next step is to assign the recombination positions to a specific sibling.

Image 5.3 Aunt’s recombination positions on chromosome 1

By comparing my aunt vs. my uncle and then my aunt vs. my mother we see that a recombination has taken place at the same position near the beginning. Since my aunt is involved in both comparisons, this crossing-over will be assigned to her. In general, a recombination point is assigned to the person who is involved in both comparisons that change at the same position. However, it isn’t always this easy – now and then it can get really tricky, when all three people seem to “own” a recombination point.
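The assignment rule is simple enough to sketch in Python. The function below is a hypothetical helper written for this post, not part of the spreadsheet: it takes the sibling comparisons whose colour changes at one position and returns the sibling they have in common.

```python
def owner_of_crossover(changing_pairs):
    """Return the sibling who owns a recombination point, or None if unclear.

    changing_pairs: the comparisons (pairs of sibling names) whose
    colour changes at this position.
    """
    pairs = [frozenset(pair) for pair in changing_pairs]
    if len(pairs) != 2:
        # One or three changing comparisons cannot be resolved by this rule.
        return None
    common = pairs[0] & pairs[1]
    return next(iter(common)) if len(common) == 1 else None


print(owner_of_crossover([("Aunt", "Uncle"), ("Aunt", "Mother")]))
print(owner_of_crossover([("Aunt", "Uncle"), ("Aunt", "Mother"), ("Mother", "Uncle")]))
```

The first call returns Aunt, as in the example above. The second call returns None, which corresponds to the tricky situation where all three comparisons change at the same spot and the point has to be resolved by other means.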

Image 5.4 Mom’s recombination positions on chromosome 1

 

Image 5.5 Uncle’s recombination positions on chromosome 1

Please remember that we are always talking about two chromosome copies – one maternal and one paternal. So the recombinations shown here have taken place on both copies. For example, out of my uncle’s 4 crossing-overs, 3 could have taken place on the maternal copy and one on the paternal or two crossing-overs on each copy and so on.

Now that we have the basic background information needed for Visual Phasing, we can finally turn to Steven Fox’s automated Excel spreadsheet.

 

PART II – STEVEN FOX’S AUTOMATED EXCEL SPREADSHEET

I have mentioned already that the Visual Phasing working group on Facebook is currently the only place where Steven Fox’s automated Excel spreadsheet is available for download. Besides, the group has many Visual Phasing experts among its members, who you could ask for advice in case you feel stuck at one of the steps of your Visual Phasing project.

2.1 Setting up

Image 6.1 Main tab

This is what the Main tab looks like when you open the spreadsheet after download (don’t forget to click on the Enable content or Enable Macros button).

It consists of three tables:

  • the Grandparent table
  • the Sibling table
  • the Cousin table 

two buttons:

  • Build (starts the process of extracting data from GEDmatch)
  • Delete (deletes all data at once)

and two drop down menus:

  • Tasks (various tasks will be explained during the process)
  • Open (opens the chromosome you want to work with)

We will start by entering the last names of our four grandparents into the Grandparent table. Next, we enter the data for the three siblings of our Visual Phasing project into the Sibling table: their initials, short and full names as well as the corresponding GEDmatch kit numbers and gender. Finally, we enter the data for the cousins we want to use for Visual Phasing – 1st cousins will help tell the maternal and paternal chromosome copies apart and 2nd cousins will help identify the grandparent in question. You could also use other more distant cousins from GEDmatch here – now and then they turn out to be the last piece of the puzzle!

Image 6.2 Filling out the Main tab

After you have filled out the Main worksheet, press Build in the upper left corner. A pop-up tells you that 23 tabs for the chromosomes will now be created. Then another pop-up appears and asks you to enter your GEDmatch login credentials. You are also given the option to alter the SNP and cM thresholds.

Image 7. Entering GEDmatch login credentials

As soon as you have done that, the program will start extracting all sibling and cousin data from GEDmatch (unless you have made a mistake entering GEDmatch kit numbers).

Image 8. Completion message

In less than 20 minutes another pop-up informs me that the processing is complete and that 552 images have been created. Wow! Just imagine how long it would have taken me to copy 552 images manually!

Now I can click on the chromosome 1 tab or select it from the Open drop down menu. The chromosome tab shows additional drop down menus, which will be explained in the process.

Image 9. Chromosome 1 tab

I’ve chosen chromosome 1 for my example, because it turned out to have the cousin matches that I needed to complete the mapping. However, chromosome 1 is one of the two largest chromosomes and is usually not recommended for beginners. If you are trying out Visual Phasing for the first time, chromosomes 20 or 21 would be a better choice.

2.2 Recombination points

In Part I – The Basics I have already marked all recombination points on my mother’s and her siblings’ chromosome 1 copies. All I have to do now is to drag the column boundaries to each previously identified recombination point. Finally, I have to highlight and delete the leftover formatted columns on the right.

Image 10. Identifying recombination points

Now we need to assign all recombination points to the sibling involved and insert their initials into the second line of the spreadsheet.

Image 11. Labelling recombination points

2.3 Megabase numbers

In order to record megabase numbers for the recombination positions, we have to open the Task drop down menu and choose Megabase from the task list. However, due to the image scaling those numbers will differ from those given by GEDmatch or by David Pike’s utility Search for Shared DNA Segments in Two Raw Data Files. At GEDmatch you have to check the full resolution box before running the one-to-one comparison and then scroll to the right through the expanded chromosome to find the starting and ending points of the fully identical segments. With David Pike’s utility you upload two unzipped raw data files and get a neatly arranged list of half and fully identical segments with their starting and ending points. So if you prefer to work with other numbers, just alter them manually.

Image 12. Megabase numbers

And now the Visual Phasing can begin.

2.4 Grandparent codes

We will start working with the generic grandparent codes G1, G2, G3 and G4. Open the Task drop down menu and select Display codes. G1 & G2 are complementary (either maternal or paternal) and so are G3 & G4. Next, go to the Extra View drop down menu and select Segment map. It will mirror the upper GEDmatch image once you start coding, and if it doesn’t, you will immediately know you have made a mistake.

As you can see my mother and aunt share a fully identical segment between 115 and 156. Accordingly, neither one of them shares DNA with my uncle at that position. Therefore, we can already assign all four grandparent codes – at this stage not knowing which color belongs to which grandparent, of course.

Image 13. Visual Phasing/Assigning grandparent codes

2.5 Extending segments

We can now extend the segments to the next crossing-over points of each sibling. (Well, in the case of my aunt we can’t – her segment lies between her two recombination points.) Go to the Task menu and select Extend.

Image 14. The Extend task

In the blink of an eye the segments are extended! For my mother to the left until her crossing-over at 17.5 and to the right until 183. For my uncle to the left until 94 and to his recombination point on the right at 207.5.

Go to the Task menu again and select Global Options, then check the Automatically extend segments box. Whenever possible the segments will be extended automatically (magically!) from now on.

Image 15. Global options – Automatically extend segments task

Look at the area between 17.5 and 55 – my mother and aunt do not share any DNA here, so opposite colors will be assigned to my aunt. Instead of typing G2 and G4, I can just insert an asterisk here and the right colors will appear (plus the segment will be extended to 6). For the tiny region between 30 and 34 I will also use the asterisk to assign opposite colors to my uncle, since he doesn’t share DNA with my mother at this position, but a fully identical segment with my aunt instead.
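What the asterisk does here can be pictured as a simple lookup of complementary codes. The snippet below is only my own sketch of that logic in Python; I do not know how the spreadsheet implements it internally, and the names OPPOSITE and opposite_codes are made up for the example.

```python
# G1 & G2 are complementary, and so are G3 & G4.
OPPOSITE = {"G1": "G2", "G2": "G1", "G3": "G4", "G4": "G3"}


def opposite_codes(codes):
    """Codes for a sibling who shares no DNA (red) with the given sibling."""
    return tuple(OPPOSITE[code] for code in codes)


mother = ("G1", "G3")  # my mother's codes between 17.5 and 55
aunt = opposite_codes(mother)
print(aunt)  # ('G2', 'G4')
```

The same lookup applies to my uncle between 30 and 34, where he shares nothing with my mother.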

Image 16. Visual Phasing

It looks like I’m stuck, but not yet. I’m still able to extend my aunt’s segment to her crossing-over at 222, but I need to change one of her codes – at this point it doesn’t matter which one as the codes aren’t assigned to a specific grandparent yet. Why? Because in the region between 156 and 183 my aunt shares a half identical segment with each of her siblings, but a different one with each, since the two of them do not share any DNA with each other at that position. So I’ll just insert a ? on my aunt’s upper grandparent line and one of the colors will instantly change and her segment will be extended.
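To see why there are exactly two equally valid choices at this decision point, all four possible code pairs can be tried against the observed colours. This is a hedged Python sketch of the reasoning with hypothetical names, not the spreadsheet’s own code.

```python
from itertools import product

SHARED_COPIES = {"red": 0, "yellow": 1, "green": 2}


def candidate_codes(known, colours):
    """List the code pairs consistent with the observed comparison colours.

    known:   codes of the already coded siblings, e.g. {"Mother": ("G1", "G3")}
    colours: colour of the unknown sibling's comparison with each of them
    """
    result = []
    for pair in product(("G1", "G2"), ("G3", "G4")):
        if all(
            sum(a == b for a, b in zip(pair, known[name])) == SHARED_COPIES[colours[name]]
            for name in known
        ):
            result.append(pair)
    return result


# Between 156 and 183 my aunt is a half match (yellow) with both siblings.
known = {"Mother": ("G1", "G3"), "Uncle": ("G2", "G4")}
colours = {"Mother": "yellow", "Uncle": "yellow"}
print(candidate_codes(known, colours))  # [('G1', 'G4'), ('G2', 'G3')]
```

Both remaining options change exactly one of my aunt’s previous codes (G1 & G3), which is why either choice is fine as long as the codes are not yet tied to a specific grandparent.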

Image 17. Decision point

This allows me to fill out the area between 183-207.5 for my mother and then again 207.5-222 for my mother and uncle, because it’s fully identical to my aunt’s. So I type the asterisk in again. Instantly, my mother’s segment is extended to the right until 238 and my uncle’s segment to the end.

Image 18. Visual Phasing

Now I’m stuck.

Ideally you fill the chromosomes out entirely first and then turn to your matches and see whether they can help you separate your four ancestral lines. In my case, however, I need to start bringing my cousins in now.

2.6 Bringing in 1st cousins

Among the people who agreed to take a DNA test for me is E.A., my mother’s and her siblings’ maternal first cousin. E.A. and the Haas siblings share maternal grandparents – maternal grandmother Ottilia Arnhold and maternal grandfather Heinrich Antoni. Let’s see whether E.A. can help to distinguish which of the Haas chromosome copies is maternal and which is paternal.

Since I’ve included E.A. in the Cousin table earlier, her data was also extracted from GEDmatch and now I can just go to the Extra view drop down menu and select her! In a fraction of a second I am able to see where she matches all three Haas siblings! And all of it at a single glance – in addition to the images, the boxes on the left provide me with the starting and ending points of all matching segments. Steven Fox did an excellent job developing this marvellous spreadsheet!

Image 19. Bringing in E.A. – a maternal first cousin

Besides, E.A. turned out to be an excellent match! Not only are we able to distinguish between the maternal and paternal chromosome copies now, thanks to E.A. we are also able to complete all three chromosome pairs of the Haas siblings!

The G1 and G2 grandparent codes are maternal: E.A. matches both my mother and my aunt from 119.5 to 156 and then continues to match only my mother until 165.5, which fits my aunt switching from G1 to G2 at 156.
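The reasoning behind this step is a set difference: where a cousin matches one sibling but not another, the matching segment must sit on a code that the first sibling has and the second one lacks. Here is a minimal Python sketch of that idea. It assumes that the non-matching sibling would have shown a match if they carried the same grandparent segment, and the function name is hypothetical.

```python
def codes_explaining_match(matching_sibling, non_matching_sibling):
    """Codes that can carry a cousin's match with one sibling but not the other."""
    return set(matching_sibling) - set(non_matching_sibling)


# Between 156 and 165.5 E.A. matches my mother but no longer my aunt.
mother = ("G1", "G3")
aunt = ("G2", "G3")
print(codes_explaining_match(mother, aunt))  # {'G1'}
```

Since E.A. is a maternal cousin, G1 and its complement G2 must be the maternal codes. The same check helps later on wherever E.A. matches only some of the siblings.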

Image 20. Distinguishing between the maternal and paternal sides

Now that we know which of the two sides is maternal and which paternal, we will use the Replace function and replace G1 and G2 grandparent codes with M1 and M2. Likewise G3 and G4 will be replaced with P1 and P2.

Image 21. Replacing grandparent codes

E.A. matches all three siblings between 6 and 14.5 and we know already that my aunt has M2 at that position. Therefore, M2 will also be assigned to my mother and uncle from 6 to 17.5 (together with P1 on their paternal copy, because they are only a half match with my aunt there) and then automatically extended to the left end. In addition, my uncle’s segment is extended to the right until his recombination point at 30.

Since my aunt doesn’t match her siblings on the tiny bit from the beginning to 6, she will be assigned M1 & P2.

Image 22. Visual Phasing

E.A. also matches all three Haas siblings between 67 and 94 and we already know that my mother has M1 on her maternal copy of the chromosome in that region. My uncle will be assigned M1 on his maternal copy as well and P2 on the paternal one. In addition, his segment is extended to the crossing-over on the left at 34. His chromosome pair is now completed!

We also see that my aunt and uncle share a fully identical segment between 55 and 94, so M1 and P2 will be assigned to my aunt in that area and her segment extended to her next recombination point at 106.

Image 23. Completion of my uncle’s chromosome pair

E.A. matches my aunt from 238 to 244, but she doesn’t match my uncle, who has M2 at that position. So it must be M1 for my aunt on the maternal copy and P1 on the paternal one (because of the half match with my uncle). Her segment is extended to the left to her recombination point at 222. The opposite codes M2 & P2 can now be assigned to my mother from 238 to the end, because she doesn’t share any DNA with my aunt here. We have completed my mother’s chromosome pair, too!

Image 24. Completion of my Mom’s chromosome pair

Now only two last segments need to be filled out for my aunt. The tiny spot between 106 and 108 will be assigned M2 & P2, because this region is fully identical to my uncle’s. From 108 to 115 E.A. shares an M1 segment with my mother, but no segments with the other two Haas siblings. Hence M2 will be assigned to my aunt on her maternal and P1 on her paternal copy of the chromosome. And voilà – we have completed all three chromosome pairs now!

Image 25. All three chromosome pairs completed

2.7 Bringing in 2nd cousins

Before we continue with bringing in 2nd cousins, let’s go back to Extra view and have a look at the Segment map again to confirm we have coded everything correctly. For a better view choose Merge from the Task menu – it will merge the same cells into one making it look a lot tidier. If you prefer the paternal line to be on top instead of the maternal one choose Flip from the Task menu. So far everything looks perfect!
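The check that the Segment map performs can be imitated in Python: derive the expected colours from two siblings’ coded segments and compare them with the GEDmatch image. This is a simplified sketch under my own assumptions (segments given as start, end, maternal code, paternal code, and both siblings coded over the same range). It is not the spreadsheet’s implementation.

```python
def codes_at(segments, position):
    """Return (maternal, paternal) codes of the segment covering a position."""
    for start, end, maternal, paternal in segments:
        if start <= position < end:
            return (maternal, paternal)
    raise ValueError(f"no coded segment at {position}")


def segment_map(sib_a, sib_b):
    """Expected colours for one comparison, with equal neighbours merged."""
    cuts = sorted({s[0] for s in sib_a + sib_b} | {s[1] for s in sib_a + sib_b})
    result = []
    for lo, hi in zip(cuts, cuts[1:]):
        shared = sum(a == b for a, b in zip(codes_at(sib_a, lo), codes_at(sib_b, lo)))
        colour = ("red", "yellow", "green")[shared]
        if result and result[-1][2] == colour:
            # Same colour as the previous interval: merge, like the Merge task.
            result[-1] = (result[-1][0], hi, colour)
        else:
            result.append((lo, hi, colour))
    return result


# My mother and aunt between 115 and 183, using the codes worked out above.
mother = [(115, 183, "M1", "P1")]
aunt = [(115, 156, "M1", "P1"), (156, 183, "M2", "P1")]
print(segment_map(mother, aunt))  # [(115, 156, 'green'), (156, 183, 'yellow')]
```

If the result differs from the upper GEDmatch bar for the same two siblings, one of the codes or recombination points must be wrong.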

Image 26. Tidying up

Available 2nd cousins are great for Visual Phasing, because they will point directly at one of the grandparents. N.B. happily agreed to take a DNA test for me (she even joked that at least her saliva will travel to the U.S., while she herself never has). Her paternal grandmother Margaretha Arnhold and my mother’s and her siblings’ maternal grandmother Ottilia Arnhold were sisters. I have to go to Extra view again and choose N.B. from the cousin list.

Image 27. Bringing in N.B. – a 2nd cousin on the Arnhold line

It looks like the M1 code belongs to the Arnhold line. N.B. matches my mom and aunt on their fully identical segment from 115 to 156 and continues to match my mother to 165.5, while my aunt switches to M2. Thus, the M2 code will be assigned to Antoni, the maternal grandfather’s line. So now we can go to the Replace drop down menu and replace M1 with Arnhold and M2 with Antoni.

Image 28. Maternal grandparents’ segments identified

2.8 Bringing in distant cousins from GEDmatch

With regard to the Haas siblings’ paternal side, everything is a lot more complicated. My mother’s father grew up in an orphanage after being sent there as a toddler following his parents’ death and didn’t know anything about his biological relatives – not even the name of his mother. All he was told later was that the rest of his family emigrated to Canada and the United States at around the time of his birth. No names, no places. Who would have thought that a century later drops of saliva would be able to provide new information?

One of our family’s most interesting matches is the now deceased W. Schlegel, who agreed to test for his niece a few years ago while she was researching their Volga German ancestry. W. Schlegel’s ancestors emigrated to Canada from Pobochnoye, a Lutheran Volga German village, and I immediately connected him to my grandfather. Pobochnoye was the mother colony of my grandfather’s birthplace (my maternal grandmother was from an entirely different part of the Volga river area and her portion of the family tree is well researched). What made W. Schlegel even more special was that he was a match on my maternal grandfather’s mother’s side. Yes, my unknown great-grandmother. So how can I be so sure? It’s because W. Schlegel matched both my mother and my aunt on the X-Chromosome! The X-Chromosome my grandfather passed down to his daughters was inherited from his mother. (On W. Schlegel’s part, the X-Chromosome doesn’t come from his Schlegel side, but from his maternal Wagner side.)

Therefore, after comparing the segments W. Schlegel and the Haas siblings have in common, I can map DNA segments to their paternal grandmother (and by process of elimination to their paternal grandfather as well).

Since I’ve also put W. Schlegel’s GEDmatch kit number into the Cousin table earlier, I can now retrieve his data from Extra view.

Image 29. Bringing in distant paternal cousin W. Schlegel

W. Schlegel matches my mother and my aunt between 144 and 180 and shares no DNA with my uncle on this chromosome. Therefore, the P1 code will be assigned to their paternal grandmother – my maternal grandfather’s unknown mother. Accordingly, the P2 code will be assigned to the Haas line. As of this moment, I have no name to replace the P1 code with, so it will be PGM for now, while the P2 code will be replaced with Haas.

We are done!

Image 30. Chromosome 1 coding completed

CONCLUSIONS

Sometimes it is not possible to complete an entire chromosome by logic only, because multiple outcomes are possible. However, bringing in other family members still allows you to move forward.

Visual Phasing can help you enormously in your research by providing valuable information about your ancestral composition, especially if you hit a brick wall due to adoption or lack of documentation. If you are adopted, but have three kids of your own, you can still use this methodology to learn which of your matches are maternal and which paternal.

This great technique allows you to prove a theory about a certain ancestral connection or dismiss it. Let’s imagine I had a cousin match with a large segment on chromosome 1 and a Haas ancestor in her tree. Naturally, I would be tempted to connect her immediately to my grandfather’s paternal side. However, after visually phasing my mother’s chromosome and learning that her paternal copy of chromosome 1 comes largely from her father’s mother’s side – thus shrinking my chances to have inherited a Haas segment on this chromosome to a minimum – I would now proceed more cautiously, considering the possibility of a different connection more open-mindedly.

Currently I’m visually phasing my father’s and his siblings’ chromosome copies, meaning that at some point in the future I will theoretically be able to assign the segments of my own chromosomes to all 8 of my great-grandparents! Knowing which segments I inherited from which of my ancestors will help me to arrange my matches more accurately or sort them into new groups, which in turn may one day be the key to solving the mystery of our family. Or yours.

Special thanks to Steven Fox for creating this marvellous Excel spreadsheet tool and saving all of us tons of time during the Visual Phasing process!

https://www.facebook.com/groups/visualphasing