Tuesday, December 20, 2016

The Halves & The Halve Nots


The “halves” and “halve nots” – didn’t you mean “haves” and “have nots”? No, I meant what I said, and here’s why. While it is generally accepted that the amount of shared autosomal DNA roughly halves with each generation, does this hold when we are discussing relationships at a variety of levels? In looking at my own family, I wanted to see if there were any discernible patterns in the amount of DNA shared with a relative when compared across two generations of a family, viz. a parent and a child.

METHOD

To do this, I analyzed 630 relationships from my family that included the amount of shared centimorgans of autosomal DNA. This required looking at shared DNA between two parties and the child of one of the parties. Only autosomes were used in the calculations and the X chromosome was ignored. The age span of the participants ranged to nearly 98 years with the oldest participant having been born in 1918, while the youngest was born in 2016. Two of the participants are deceased. There were 20 parent/child pairs:

  • Seven mother/son pairs.
  • Six father/daughter pairs.
  • Five father/son pairs.
  • Two mother/daughter pairs.

The results were compiled from a variety of relationships that included 33 participants in total. The relationships spanned parent/child to fourth cousins, twice removed. Tests were primarily from 23andMe and FTDNA with one at Ancestry. To be consistent, the data for matching shares in centimorgans were only gathered through GEDMatch.com. In addition, relationships that included fully identical segments were omitted (affecting only 8 full sibling relationships).

Several additional relationships where there was no matching DNA to a parent in the study were ignored. A number of close relationships found only on 23andMe and Ancestry were not included, as those relatives did not have GEDMatch accounts.

All 630 relationships in this analysis were confirmed by other evidence and no speculative connections were included. The relationships were grouped according to degrees of DNA sharing. Not all possible relationships were present and only those in the study are listed below:

  • Degree 1: Parent and Child.
  • Degree 2: Half sibling, Grandparent, Grandchild, Aunt/Uncle, and Niece/Nephew.
  • Degree 3: Half Aunt/Uncle, Half Niece/Nephew, First Cousin, Great Grandparent, Great Grandchild, Great Aunt/Uncle, and Great Niece/Nephew.
  • Degree 4: First Cousin, Once Removed and Half Cousin.
  • Degree 5: Half Cousin, Once Removed; Second Cousin; and First Cousin, Twice Removed.
  • Degree 6: Half Cousin, Twice Removed and Second Cousin, Once Removed.
  • Degree 7: Second Cousin, Twice Removed and Third Cousin.
  • Degree 8: Third Cousin, Once Removed.
  • Degree 9: Third Cousin, Twice Removed and Fourth Cousin.
  • Degree 10: Third Cousin, Thrice Removed and Fourth Cousin, Once Removed.
  • Degree 11: Fourth Cousin, Twice Removed.

The goal was to analyze the percentage of DNA passed from parent to child. In addition, the child’s match with the relative was compared with the segments shared with the parent in question. In one situation, a child had matching DNA with a fourth cousin, once removed that was transmitted from his mother and not his father – the parent with the confirmed fourth cousin relationship. The relationship with the mother is unknown. This data was not included.
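The per-pair figure used in the results can be sketched as follows, assuming it is the child's shared centimorgans with the relative divided by the parent's shared centimorgans with that same relative. The function name and the centimorgan values are hypothetical and serve only as an illustration.

```python
def child_share_pct(parent_cm: float, child_cm: float) -> float:
    """Percentage of the parent's shared cM with a relative that the child also shares.

    Assumption: the per-pair figure is simply child cM / parent cM.
    """
    if parent_cm <= 0:
        raise ValueError("the parent must share DNA with the relative")
    return 100.0 * child_cm / parent_cm


# Hypothetical values, not taken from the study's data.
print(round(child_share_pct(212.4, 101.7), 2))  # a child with roughly half the parent's share
print(child_share_pct(7.2, 0.0))                # a child with no match to the relative
```

A child who does not match the relative at all is recorded as 0%, which is how the non-matching children enter the more distant degree groups.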

We also had thirty comparisons where there were two shared recent ancestral connections. The nearest relationship was that of second cousins who were also second cousins, once removed. These results were listed under the closest degree level. The relatives of those having fully identical segments died prior to the advent of autosomal DNA testing, so only half identical segments were present.

RESULTS

The degrees of sharing and their statistical data are included in the following table:

Parent/Child         Pairs     Mean    Median   Std Dev
Degrees 1/2             16   50.90%    51.79%      5.82
Degrees 2/3             65   48.38%    48.59%      6.78
Degrees 3/4             63   49.81%    49.72%      8.97
Degrees 4/5             40   49.65%    46.86%     11.89
Degrees 5/6             36   48.20%    50.45%     11.51
Degrees 6/7             20   50.39%    52.26%     22.37
Sub Total of Above     240   49.28%    48.69%     10.88
Degrees 7/8             12   35.96%    32.57%     28.13
Degrees 8/9             22   51.28%    59.59%     32.46
Degrees 9/10            36   35.77%     0.00%     41.84
Degrees 10/11            5   60.00%   100.00%     54.77
Total of All           315   47.53%    48.48%     21.18
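As a consistency check, the subtotal and total means can be recovered from the individual rows as pair-weighted averages. The sketch below uses only the pair counts and means from the table; the variable and function names are my own. Because the row means are already rounded to two decimals, the results agree with the table only to within rounding.

```python
# (pairs, mean %) for each degree group, copied from the table.
close_groups = [(16, 50.90), (65, 48.38), (63, 49.81), (40, 49.65), (36, 48.20), (20, 50.39)]
distant_groups = [(12, 35.96), (22, 51.28), (36, 35.77), (5, 60.00)]


def pooled_mean(groups):
    """Pair-weighted average of the group means."""
    total_pairs = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_pairs


subtotal_pairs = sum(n for n, _ in close_groups)
all_pairs = subtotal_pairs + sum(n for n, _ in distant_groups)
subtotal_mean = pooled_mean(close_groups)
overall_mean = pooled_mean(close_groups + distant_groups)

print(subtotal_pairs, round(subtotal_mean, 2))
print(all_pairs, round(overall_mean, 2))
```

The medians and standard deviations cannot be pooled this way; they require the underlying per-pair values.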

Initially, I only looked at 480 relationships where all parent and child relationships (Degrees 1/2 to Degrees 6/7) exhibited shared DNA with the relatives in question. This produced 240 data points. For Degrees 1/2 to Degrees 6/7, 77% of the results fell within one standard deviation. A typical bell curve would have 68.2% of the results within ±1 σ.
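The bell-curve benchmark quoted above can be computed directly from the error function; to two decimals it is 68.27%. This short sketch compares it with the 77% observed in the study. The function name is my own.

```python
import math


def normal_fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


expected = normal_fraction_within(1.0)
observed = 0.77  # share of the 240 pairs within one SD, as reported above
print(f"expected {expected:.2%}, observed {observed:.0%}")  # expected 68.27%, observed 77%
```

An observed share above the benchmark means the results cluster more tightly around the mean than a normal distribution with the same standard deviation would.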


After the outliers were removed using the interquartile range, the middle results of the original 240 pairs skewed to the left of the mean, as demonstrated in the chart below.
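One common way to remove outliers with the interquartile range is Tukey's rule, which drops anything more than 1.5 × IQR beyond the first or third quartile. The sketch below assumes that rule; the 1.5 multiplier, the quartile method, and the sample percentages are all assumptions for illustration and are not the study's data.

```python
import statistics


def drop_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences; k=1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


# Hypothetical child/parent percentages, not the study's data.
shares = [12.0, 41.5, 44.0, 46.2, 47.9, 48.8, 50.1, 52.3, 55.0, 58.4, 93.0]
kept = drop_iqr_outliers(shares)
print(kept)
print(statistics.median(kept), round(statistics.mean(kept), 2))
```

Comparing the median of the trimmed values with their mean is one simple way to see on which side of the mean the middle of the distribution sits.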


An additional 150 relationships, representing Degrees 7/8 through Degrees 10/11, were added. The only caveat for inclusion was that the parent had to match the relative in question – but the child did not need to have matching DNA to the parent’s matching relative. Of the 75 parent/child pairs that were included, 28 children failed to match the relative in question at levels of 5cM or higher. These 0.00% shares were included in the overall results.

The children’s non-matching data were so pronounced in Degrees 9/10 that the median score was 0.00%. Only 47.22% of the children at this degree level shared DNA with the said relative. The parents were either third cousins, twice removed or fourth cousins and the children were either third cousins, thrice removed or fourth cousins, once removed.

At the Degree 10/11 level, the children either matched the parent’s share at 100% or not at all – indicating an all or nothing proposition as we moved to more distant relationships. Unfortunately, only five pairs were included – which is too small to make a critical analysis.

As we moved further away from a Degree 2 relationship on the part of the child, the standard deviations increased. In other words, as the relationships grew more distant, the spread of the results grew correspondingly larger and the results became more heterogeneous. In most cases, the SD increased with each generational degree. The only exception was at Degrees 5/6: with an SD of 11.51, it was slightly narrower than Degrees 4/5 at 11.89.

 
With this said, many of the degrees of DNA sharing exhibited means very close to 50%. The only variations were found in Degrees 7/8 at 35.96%, Degrees 9/10 at 35.77%, and Degrees 10/11 at 60% (3 of the 5 were at 100% and 2 were at 0% shared). Both Degrees 9/10 and 10/11 had examples of all or none of the relational DNA passed from parent to child.
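The Degrees 10/11 row can be reproduced from the description alone, since three children were at 100% and two were at 0%. The sketch below uses Python's standard library. The table's 54.77 matches the sample standard deviation (n − 1 denominator), which suggests, though I cannot confirm it, that the sample form was used throughout.

```python
import statistics

# Degrees 10/11 as described above: three children at 100% and two at 0%.
shares = [100.0, 100.0, 100.0, 0.0, 0.0]

mean = statistics.mean(shares)
median = statistics.median(shares)
sample_sd = statistics.stdev(shares)       # n - 1 denominator
population_sd = statistics.pstdev(shares)  # n denominator, shown for comparison

print(mean, median, round(sample_sd, 2))  # 60.0 100.0 54.77
print(round(population_sd, 2))            # 48.99
```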

CONCLUSION

The conclusions do not go beyond what we already knew about the percentage of shared DNA passed from parent to child. Up through Degrees 6/7, the child’s share is generally within one standard deviation of the mean, which is approximately 50% of the parent’s share. As these relationships become more distant, the standard deviation increases.

As we enter the realm of Degrees 7/8 and more distant relationships, we begin to see cases where none of the DNA that the parent shares with a relative appears in the child’s results. With Degrees 9/10, many (but not all) of the results exhibited 0% or 100% shared DNA. At Degrees 10/11, it was an all-or-nothing proposition. It should be noted that at this level, the shared segments were between 5cM and 10cM. Since we have three generations that can be tracked lineally with these specific relationships, these segments are identical by descent (IBD), as they can be traced back to the grandparent’s much larger segment at the same position.

The rule of thumb is as follows: in closer relationships, we are generally “the halves” – at least within one standard deviation of the half share. In more distant relationships, it is likely we will be the “halve nots” – perhaps all or nothing.

LIMITATIONS

While 630 relationships may appear to be a large number, at least 768 relationships (384 pairs) would be needed to reach the minimum sample size for a 95% confidence level with a 5% margin of error. As with all statistical measures, a larger sample yields greater confidence and a smaller margin of error. A sample exceeding 384 parent/child pairs would be greatly desired.
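The figure of 384 is the value usually quoted from the standard sample-size formula for estimating a proportion, n = z² · p(1 − p) / e². The sketch below assumes that this is where the number comes from, with z = 1.96 for 95% confidence, the most conservative p = 0.5, and e = 0.05. The function name is my own.

```python
import math


def sample_size_for_proportion(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> float:
    """Cochran's formula n = z^2 * p * (1 - p) / margin^2 for a large population."""
    return z ** 2 * p * (1 - p) / margin ** 2


n = sample_size_for_proportion()
print(round(n, 2), math.floor(n), math.ceil(n))  # 384.16 384 385
```

Strictly rounding up gives 385, although 384 is the number commonly cited. The formula targets a proportion rather than a mean, so it is best read here as a rough yardstick for the number of pairs.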

A second limitation is that this study is largely (but not totally) represented by the descendants of one ancestral couple. The results include those of the ancestral mother, who had tested prior to her death in 2016, and include three generations of her progeny. Only one of her descendants failed to participate. In all cases, the participants (including relatives not descended from this couple) have ancestries from Northern and Western Europe. A more diverse population might provide different results.

2 comments:

  1. Followed your comment on Blaine Bettenger's post. :-) Your Academic Training is showing! :-)

    Seriously, though, I need to see if I can pull something like this out of my data. These kinds of "statistical games" are fun. (Just please don't ask me to do an ANOVA on them! :-)

    1. Susan: A good stats program, such as SPSS, will do the work for you. You can do a great deal with Excel as well. Thanks for your comments.
