Thursday, June 19, 2014

Is Genetic Distance an Adequate Predictor of Relationships?


Although low-frequency surnames obviously offer a small pool of potential Y-DNA participants, they may have the advantage of well-documented ancestry. That is the case with my surname and its corresponding Y-DNA Project. The original intent of our project was to determine whether three families bearing our surname, all from the original East Riding of Yorkshire, shared a common ancestry or whether the surname was applied to these lineages independently of each other.

The three families are as follows:
SHERBURN: The largest family originated in Sherburn in Hartford Lythe, and its many members descend from Peter Owston, who died in 1568. This group also includes the Ouston descendants of James Ouston (1711-1785), who was born in Brompton by Sawdon and died in Sigglesthorne. You’ll find members of this clan in the UK, Australia, USA, Canada, New Zealand, Netherlands, France, and the UAE.
GANTON: The second-largest group of Owstons originated in the village of Ganton and can be satisfactorily traced to Giles Owston, who died in 1641. While an older connection cannot be firmly established, this family probably descends from John Owston, who was alive circa 1490 in nearby Staxton in Willerby. This supposition rests on several distinctive forenames that appear in both lineages. While very few Ganton Owstons live in the UK, the majority of Owstons in the US come from this family. All surviving Ganton Owstons descend from Thomas Owston (1755-1823). Ganton, like Sherburn, is now located in North Yorkshire.
THORNHOLME: Finally, a third group of Owstons, from 15 miles south of Sherburn and Ganton, can be traced to Richard Owston of the village of Thornholme in the parish of Burton Agnes. Richard Owston died in 1739. Onomastic evidence makes it possible to theorize a connection to an earlier Ganton line fathered by Robert Owston, who was born as recently as 1580. The Thornholme Owstons constitute the largest Owston group in Canada. Others descended from this lineage live in Australia, the UK, Finland, and New Zealand.
In the study’s first year, a positive conclusion was reached: three participants (one from each family) matched each other exactly on GeneTree’s 43-marker Y-DNA test. Others in the study matched at a genetic distance of 2 or 3.

This was exciting news, because no documented relationship between these lineages could be found; the connection between the three families apparently occurred before the introduction of English parish registers in 1538 (moreover, many of the nearby parishes have no surviving registers until much later; Burton Agnes’ earliest register dates from 1700). The first record of the surname in the region (spelled Oustyn) appears in a 1452 will from the parish of Wintringham.

To better understand our relationships and to construct a more conclusive modal haplotype for the Owston families, we needed to branch out beyond our original participants and test as many Owston/Ouston males as possible. We have identified 23 lines from the three families. Some lines can be further subdivided into groups that we call segments; in all, there are 39 lines and segments.

Currently, the Owston/Ouston Y-DNA project has 26 participants: 17 Sherburn family members, 4 Ganton Owstons, and 5 Thornholme participants. Together they represent at least one person from 20 of the 23 lines and 22 of the 39 lines and segments, and some lines/segments have more than one participant. We also intend to have those who matched the Owston modal I1 haplotype at the now-defunct GeneTree retest at 37 markers at FTDNA. Eventually, all matching individuals will be upgraded to 111 markers. Currently, eight former GeneTree customers will need to be retested.

Of the 26 participants, four individuals are awaiting test results. Eight of the remaining individuals failed to match the modal haplotype, apparently as the result of non-paternity events (NPEs). Several of these participants descend from families where known NPEs existed, while others’ results were a complete surprise. Note that everyone who tested had a clear genealogical line to one of the three original families.


Currently, 14 of the participants have a solid match to the modal haplotype: two Thornholme participants, three Ganton participants, and nine Sherburn participants. Early in this study, we noticed that individuals often had closer genetic matches with genealogically distant relatives than with more closely related ones. This curiosity eventually led to the writing of this post.

Over the years, genetic genealogists have tended to rely on genetic distance to help predict a range of possible relationships. In fact, FTDNA characterizes matches by their level of genetic distance.

For example, FTDNA states that a GD=0 at 37 markers indicates that the two individuals are “very tightly related” and, with a confidence level of .05 or less, related within eight generations (seventh cousins). A GD of 1 is considered “tightly related.” A GD of 2 or 3 between men of the same surname is labeled “related.” As GD increases, the likelihood of a relationship diminishes; a GD=6 is considered not related, even when the same surname is present (Canada, 2011).
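To make these thresholds concrete, here is a minimal Python sketch of a lookup from a 37-marker GD to the labels quoted above. The function name `ftdna_label_37` is hypothetical and is not an FTDNA tool. The GD=4 label (“probably related”) is the one cited later in this post, GD=5 is deliberately left undefined because this post does not characterize it, and treating every GD above 6 as “not related” is my assumption.

```python
from typing import Optional

# Labels for genetic distance (GD) at 37 markers between men who share a
# surname, as quoted in this post (after Canada, 2011). GD=5 is deliberately
# absent because the post does not characterize it.
LABELS_37 = {
    0: "very tightly related",
    1: "tightly related",
    2: "related",
    3: "related",
    4: "probably related",
}


def ftdna_label_37(gd: int) -> Optional[str]:
    """Return the quoted label for a 37-marker GD, or None if not covered."""
    if gd < 0:
        raise ValueError("genetic distance cannot be negative")
    if gd >= 6:
        # Assumption: anything at or beyond GD=6 is treated as "not related".
        return "not related"
    return LABELS_37.get(gd)


for gd in range(7):
    print(gd, ftdna_label_37(gd))
```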

In addition, most of us have genetically close matches with individuals who are clearly much further back on the relationship continuum and who share neither our surname nor a variant of it. At 37 markers, I have a number of matches with individuals whose ancestry lies far from my own East Riding ancestors. While we are related, the connection is obviously far beyond the genealogical time frame and may predate the various invasions of Britain, one of which brought my ancestors from mainland Europe.

Using my project as a case study, I hypothesize that, although genetic distance can indicate a familial connection, it is an inadequate predictor of how closely two men are related. Before I discuss my results, I must present some caveats.

First, I cannot affirm an exact connection among the three families in my study; however, I have constructed plausible trees based on shared forenames, typical naming conventions, names found in wills and other local records, and the close geographical proximity of all three current families and two earlier, extinct families. At present, we can confirm relationships only within each family; however, based on the factors above, we are confident that the supposed relationships are close to the unknown actual relationships.

Second, not all 14 matching participants tested at the same resolution. Eight tested at GeneTree with 43 markers, four tested at FTDNA with 37 markers, and two tested at both GeneTree (43) and FTDNA (a minimum of 37).

While GeneTree’s 43-marker test shares 32 markers with FTDNA’s 37-marker test, we decided to compare apples to apples and oranges to oranges. Restricting the comparison to the 32 shared markers would have dropped the five markers that GeneTree did not test, disregarding marker differences for five of the six FTDNA participants; it would also have dropped GeneTree’s 11 additional markers, disregarding marker differences for two of the ten GeneTree participants.
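The effect of restricting a comparison to shared markers can be sketched in Python. This assumes a simplified step-wise count (the sum of absolute repeat differences over markers both men tested); FTDNA’s own calculation handles some multi-copy markers differently, which this sketch ignores. The function name and the marker values are hypothetical, apart from the modal value of 19 at DYS576 mentioned later in this post.

```python
def genetic_distance(a: dict[str, int], b: dict[str, int]) -> int:
    """Simplified step-wise GD: sum of absolute repeat differences.

    Only markers present in both haplotypes are compared, so a difference at
    a marker that one company did not test simply disappears.
    """
    shared = a.keys() & b.keys()
    return sum(abs(a[m] - b[m]) for m in shared)


# Hypothetical haplotypes (values invented except the DYS576 modal of 19).
modal = {"DYS393": 13, "DYS390": 23, "DYS576": 19}
genetree_style = {"DYS393": 13, "DYS390": 24}  # no DYS576 result
ftdna_style = {"DYS393": 13, "DYS390": 23, "DYS576": 20}

print(genetic_distance(modal, ftdna_style))           # 1 (DYS576 differs)
print(genetic_distance(genetree_style, ftdna_style))  # 1 (DYS576 not compared)
```

The second comparison shows the problem described above: whatever the GeneTree-style participant carries at DYS576, it cannot contribute to the distance.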

Therefore, instead of comparing all 14 matching individuals at once, we compared the ten who tested at GeneTree with each other and the six FTDNA participants with each other. While not a perfect scenario, this allows for a comparison of 59 (supposed) relationships. Across the entire study, there are a total of 325 relationships among all participants, including those whose results have not yet been returned and the eight participants with NPEs.
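These pair counts follow from simple combinatorics: n people form n(n-1)/2 pairs. A minimal check in Python, assuming that the single pair formed by the two participants who tested at both companies appears in both groups but is counted only once:

```python
from math import comb

# Figures from this post.
ALL_PARTICIPANTS = 26
GENETREE_GROUP = 10  # 8 GeneTree-only + 2 who tested at both companies
FTDNA_GROUP = 6      # 4 FTDNA-only + 2 who tested at both companies
TESTED_AT_BOTH = 2

all_pairs = comb(ALL_PARTICIPANTS, 2)
# Assumption: the pair of dual-tested men is present in both groups,
# so it is subtracted once to avoid double counting.
compared_pairs = (
    comb(GENETREE_GROUP, 2) + comb(FTDNA_GROUP, 2) - comb(TESTED_AT_BOTH, 2)
)

print(all_pairs)       # 325
print(compared_pairs)  # 59
```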

Third, since we have not recruited many participants with close relationships (fifth cousins or closer) to other participants, the actual (and supposed) relationships skew distant. Currently, only five of the 59 compared relationships are at the fifth-cousin level or closer. The relationships are as follows:

Supposed Relationship | Number of Compared Relationships
2nd Cousins | 1
2nd Cousins, Once Removed | 1
4th Cousins, Once Removed | 1
5th Cousins | 1
5th Cousins, Once Removed | 1
6th Cousins, Once Removed | 1
7th Cousins, Once Removed | 1
8th Cousins | 3
8th Cousins, Once Removed | 1
9th Cousins | 4
9th Cousins, Once Removed | 3
9th Cousins, Thrice Removed | 1
10th Cousins, Twice Removed | 8
11th Cousins, Once Removed | 1
12th Cousins, Once Removed | 4
12th Cousins, Twice Removed | 2
12th Cousins, Thrice Removed | 3
13th Cousins | 2
13th Cousins, Once Removed | 6
13th Cousins, Twice Removed | 9
14th Cousins, Once Removed | 2
15th Cousins | 3

The genetic distances of the 59 relationships range from 0 to 6. As expected, when the number of relationships at each genetic distance (GD) is plotted, the distribution roughly resembles a normal curve. Genetic distances of 2 and 3 each have 14 relationships and dominate the center of the chart. At least in our Y-DNA project, the average genetic distance among participants appears to be about 2 to 3 markers.

However, when we look at the range of relationships within each genetic distance, the results are all over the road. The relationships of two individuals with a GD of 0 are indistinguishable from those of two individuals with a GD of 5. There is no rhyme or reason to the results; randomness abounds.

Additionally, the relationships can be quantified as degrees of relationship. A degree of relationship (DR) is calculated by counting the generational steps from each party up to their common ancestor and adding the two counts together. For example, two second cousins are each three generational steps from their common ancestor; added together, second cousins have a DR=6.

An easy way to calculate a DR is to take the cousin degree (e.g., 6 for sixth cousins), double it, and add two; sixth cousins, for example, have a DR=14. Then add one for each generation removed: sixth cousins, once removed have a DR=15, and sixth cousins, twice removed have a DR=16. A pair of seventh cousins also has a DR=16.
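The shortcut above amounts to DR = 2n + 2 + k for nth cousins, k times removed, because the two cousins sit n + 1 and n + 1 + k generations below their common ancestor. A minimal Python sketch (both function names are hypothetical) that checks the shortcut against the step-counting definition:

```python
def dr_from_steps(steps_a: int, steps_b: int) -> int:
    """DR by definition: generational steps from each person to the common ancestor."""
    return steps_a + steps_b


def dr_from_cousinship(degree: int, removed: int = 0) -> int:
    """Shortcut for nth cousins, k times removed: DR = 2n + 2 + k."""
    if degree < 1 or removed < 0:
        raise ValueError("degree must be >= 1 and removed must be >= 0")
    return 2 * degree + 2 + removed


# nth cousins, k times removed are n+1 and n+1+k steps from the common
# ancestor, so the shortcut agrees with the definition.
for n in range(1, 16):
    for k in range(4):
        assert dr_from_cousinship(n, k) == dr_from_steps(n + 1, n + 1 + k)

print(dr_from_cousinship(2))     # 6  (second cousins)
print(dr_from_cousinship(6))     # 14 (sixth cousins)
print(dr_from_cousinship(6, 1))  # 15 (sixth cousins, once removed)
print(dr_from_cousinship(6, 2))  # 16 (sixth cousins, twice removed)
print(dr_from_cousinship(7))     # 16 (seventh cousins)
```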

Quantifying the degrees of relationship shows no significant difference in relationship between our participants who have a GD=0 and those who have a GD=5. We excluded GD=6 because only one relationship (tenth cousins, twice removed) falls in that category.

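GD | Minimum DR | Maximum DR | Mean DR | Adjusted Mean Relationship
0 | 7 | 30 | 24.54 | 11th Cousins, Once Removed
1 | 6 | 31 | 25.50 | 12th Cousins
2 | 12 | 30 | 23.64 | 11th Cousins
3 | 18 | 31 | 25.57 | 12th Cousins
4 | 11 | 30 | 23.11 | 10th Cousins, Once Removed
5 | 17 | 29 | 24.50 | 11th Cousins, Once Removed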

Notice that the results are similar across the board: the mean DR ranges only from 23.11 to 25.57. A GD=0 and a GD=5 result in the same adjusted mean relationship: 11th cousins, once removed. According to these results, genetic distance is an insufficient predictor of relationship range.
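The summary columns in the table can be reproduced with a short Python sketch. It assumes each compared pair is recorded as a (GD, DR) tuple, and that the “adjusted mean relationship” is the mean DR rounded half up to a whole number and then expressed with at most one removal. That rounding rule is my reading of the table (it matches all six rows), not something stated in the post. The function names are hypothetical, and the sample pairs are invented, except for the GD=6 pair of tenth cousins, twice removed (DR=24).

```python
import math
from collections import defaultdict
from statistics import mean

REMOVALS = {0: "", 1: ", Once Removed"}


def ordinal(n: int) -> str:
    """1 -> '1st', 2 -> '2nd', 11 -> '11th', 12 -> '12th', and so on."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def adjusted_relationship(mean_dr: float) -> str:
    """Round the mean DR half up, then invert DR = 2n + 2 + k with k in {0, 1}.

    Assumption: this is how the table's last column was derived.
    """
    dr = math.floor(mean_dr + 0.5)  # half up; built-in round() rounds half to even
    degree, removed = divmod(dr - 2, 2)
    return f"{ordinal(degree)} Cousins{REMOVALS[removed]}"


def summarize(pairs, exclude=frozenset()):
    """Group (GD, DR) pairs by GD and return (min, max, mean) DR per group."""
    by_gd = defaultdict(list)
    for gd, dr in pairs:
        if gd not in exclude:
            by_gd[gd].append(dr)
    return {gd: (min(drs), max(drs), mean(drs)) for gd, drs in sorted(by_gd.items())}


# Invented sample pairs, except the GD=6 tenth cousins, twice removed (DR=24).
sample = [(0, 7), (0, 21), (0, 30), (5, 17), (5, 29), (6, 24)]
summary = summarize(sample, exclude={6})
for gd, (lo, hi, avg) in summary.items():
    print(gd, lo, hi, round(avg, 2), adjusted_relationship(avg))

# The mean DRs reported in the table map back to its last column.
print(adjusted_relationship(24.54))  # 11th Cousins, Once Removed
print(adjusted_relationship(23.11))  # 10th Cousins, Once Removed
```

Rounding half up matters here: Python’s built-in `round(24.5)` gives 24, which would turn the GD=5 row into 11th cousins rather than 11th cousins, once removed.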

Although we have reached this conclusion, this is just one study, and the results may be specific to this particular surname. I would be curious to know whether others can replicate similar results in their studies.

Second, the greatest limitation of this study is the lack of the five additional markers offered by FTDNA. Four of these (DYS576, DYS570, CDYa, and CDYb) are more likely to show differences. We expect that when these markers are added, genetic distance will increase for several of the GeneTree participants. Two FTDNA participants had differences at CDY, and three FTDNA participants differed by one repeat from the modal value of 19 at DYS576.

Interestingly, two ninth cousins, once removed matched exactly at 37 markers. Their common ancestor was born in 1598. Each carried a mutation at DYS576, with 20 repeats rather than the modal 19. While on the surface these two participants would seem to be “very tightly related,” the mutations were independent of each other and were not shared by closer relatives of either party: a second cousin of one participant and a fourth cousin, once removed of the other.

In addition, the second-cousin pair had a GD=1, and the pair of fourth cousins, once removed had an unusual GD=4. By FTDNA’s guidelines, the ninth cousins, once removed appear to be “very tightly related,” whereas the fourth cousins, once removed would only be “probably related.”
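Putting the three documented pairs from this example side by side makes the mismatch explicit. A minimal Python sketch, using the DR shortcut described earlier (DR = 2n + 2 + k for nth cousins, k times removed) and the FTDNA labels quoted in this post; the names `PAIRS`, `LABELS`, and `degree_of_relationship` are hypothetical.

```python
# The three documented pairs discussed above:
# (description, cousin degree n, times removed k, reported GD).
PAIRS = [
    ("9th cousins, once removed", 9, 1, 0),
    ("2nd cousins", 2, 0, 1),
    ("4th cousins, once removed", 4, 1, 4),
]

# FTDNA labels as quoted in this post.
LABELS = {0: "very tightly related", 1: "tightly related", 4: "probably related"}


def degree_of_relationship(n: int, k: int) -> int:
    """DR for nth cousins, k times removed: 2n + 2 + k."""
    return 2 * n + 2 + k


# Order the pairs from smallest to largest GD.
rows = [
    (name, gd, degree_of_relationship(n, k), LABELS[gd])
    for name, n, k, gd in sorted(PAIRS, key=lambda p: p[3])
]
for name, gd, dr, label in rows:
    print(f"{name}: GD={gd}, DR={dr}, FTDNA label: {label}")

# If GD tracked closeness, DR would rise along with GD; here it does not.
drs = [dr for _, _, dr, _ in rows]
print(drs == sorted(drs))  # False
```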

While we may never know what causes the frequency of mutations at short tandem repeats (STRs), examining studies in which ancestries are documented may help us better understand the role that genetic distance plays in relationship prediction.


Canada, R.A. (2011). If two men share a surname, how should the genetic distance at 37 Y-Chromosome STR markers be interpreted? Family Tree DNA.