Tuesday, December 20, 2016

The Halves & The Halve Nots

The “halves” and “halve nots” – didn’t you mean “haves” and “have nots”? No, I meant what I said and here’s why. While it is generally accepted that the amount of shared autosomal DNA roughly halves with each generation, does this hold when we are discussing relationships at a variety of levels? In looking at my own family, I wanted to see if there were any discernible patterns in the amount of DNA shared with a relative when compared across two generations of a family, viz. a parent and a child.


To do this, I analyzed 630 relationships from my family that included the amount of shared centimorgans of autosomal DNA. This required looking at shared DNA between two parties and the child of one of the parties. Only autosomes were used in the calculations and the X chromosome was ignored. The age span of the participants ranged to nearly 98 years with the oldest participant having been born in 1918, while the youngest was born in 2016. Two of the participants are deceased. There were 20 parent/child pairs:

  • Seven mother/son pairs.
  • Six father/daughter pairs.
  • Five father/son pairs.
  • Two mother/daughter pairs.

The results were compiled from a variety of relationships that included 33 participants in total. The relationships spanned parent/child to fourth cousins, twice removed. Tests were primarily from 23andMe and FTDNA with one at Ancestry. To be consistent, the data for matching shares in centimorgans were only gathered through GEDMatch.com. In addition, relationships that included fully identical segments were omitted (affecting only 8 full sibling relationships).

Several additional relationships where there was no matching DNA to a parent in the study were ignored. A number of close relationships found only on 23andMe and Ancestry were also left out, as those relatives did not have GEDMatch accounts.

All 630 relationships in this analysis were confirmed by other evidence and no speculative connections were included. The relationships were grouped according to degrees of DNA sharing. Not all possible relationships were present and only those in the study are listed below:

  • Degree 1: Parent and Child.
  • Degree 2: Half sibling, Grandparent, Grandchild, Aunt/Uncle, and Niece/Nephew.
  • Degree 3: Half Aunt/Uncle, Half Niece/Nephew, First Cousin, Great Grandparent, Great Grandchild, Great Aunt/Uncle, and Great Niece/Nephew.
  • Degree 4: First Cousin, Once Removed and Half Cousin.
  • Degree 5: Half Cousin, Once Removed; Second Cousin; and First Cousin, Twice Removed.
  • Degree 6: Half Cousin, Twice Removed and Second Cousin, Once Removed.
  • Degree 7: Second Cousin, Twice Removed and Third Cousin.
  • Degree 8: Third Cousin, Once Removed.
  • Degree 9: Third Cousin, Twice Removed and Fourth Cousin.
  • Degree 10: Third Cousin, Thrice Removed and Fourth Cousin, Once Removed.
  • Degree 11: Fourth Cousin, Twice Removed.

The goal was to analyze the percentage of DNA passed from parent to child. In addition, the child’s match with the relative was compared with the segments shared with the parent in question. In one situation, a child had matching DNA with a fourth cousin, once removed that was transmitted from his mother and not his father – the parent with the confirmed fourth cousin relationship. The relationship with the mother is unknown. This data was not included.
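The per-pair figure described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the worksheet used for the study; the function name `share_passed` and the centimorgan values are hypothetical.

```python
def share_passed(parent_cm: float, child_cm: float) -> float:
    """Percentage of the parent's shared cM with a relative that the child also shares."""
    if parent_cm <= 0:
        raise ValueError("the parent must share DNA with the relative")
    return 100.0 * child_cm / parent_cm

# Hypothetical pair: the parent shares 200.0 cM with a cousin, the child 96.0 cM.
example = share_passed(200.0, 96.0)
print(example)  # 48.0
```

A child who does not match the relative at all would simply be entered with `child_cm = 0.0`, giving a 0% share.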

We also had thirty comparisons where there were two shared recent ancestral connections. The nearest relationship was that of second cousins who were also second cousins, once removed. These results were listed under the closest degree level. The relatives of those having fully identical segments died prior to the advent of autosomal DNA testing – only half identical segments were present.


The degrees of sharing and their statistical data are included in the following table:

Parent/Child          Pairs    Mean      Median     Std Dev
Degrees 1/2           16       50.90%    51.79%     5.82
Degrees 2/3           65       48.38%    48.59%     6.78
Degrees 3/4           63       49.81%    49.72%     8.97
Degrees 4/5           40       49.65%    46.86%     11.89
Degrees 5/6           36       48.20%    50.45%     11.51
Degrees 6/7           20       50.39%    52.26%     22.37
Sub Total of Above    240      49.28%    48.69%     10.88
Degrees 7/8           12       35.96%    32.57%     28.13
Degrees 8/9           22       51.28%    59.59%     32.46
Degrees 9/10          36       35.77%    0.00%      41.84
Degrees 10/11         5        60.00%    100.00%    54.77
Total of All          315      47.53%    48.48%     21.18
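As a cross-check on the table, the subtotal and total means can be recovered as pairs-weighted averages of the group means. The sketch below uses only the Pairs and Mean figures from the table; the variable and function names are my own. Because the group means are already rounded to two decimals, the results should agree with the table only to within about 0.01 of a percentage point.

```python
# (pairs, mean %) taken from the table rows.
rows_1_to_7 = [(16, 50.90), (65, 48.38), (63, 49.81),
               (40, 49.65), (36, 48.20), (20, 50.39)]
rows_7_to_11 = [(12, 35.96), (22, 51.28), (36, 35.77), (5, 60.00)]

def pooled_mean(rows):
    """Pairs-weighted average of the group means."""
    total_pairs = sum(n for n, _ in rows)
    return sum(n * m for n, m in rows) / total_pairs

subtotal = pooled_mean(rows_1_to_7)                # compare with 49.28%
overall = pooled_mean(rows_1_to_7 + rows_7_to_11)  # compare with 47.53%
print(f"subtotal {subtotal:.3f}  overall {overall:.3f}")
```

Medians and standard deviations cannot be pooled this way; they require the underlying pair-level data.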

Initially, I only looked at 480 relationships where all parent and child relationships (Degrees 1/2 to Degrees 6/7) exhibited shared DNA with the relatives in question. This produced 240 data points. For Degrees 1/2 to Degrees 6/7, 77% of the results fell within one standard deviation. A typical bell curve would have 68.2% of the results within ±1 σ.
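For readers who want to run the same comparison on their own matches, here is a minimal sketch of the "within one standard deviation" check. The sample values are hypothetical, and the use of the sample standard deviation (n − 1) is an assumption.

```python
from statistics import NormalDist, mean, stdev

def fraction_within_one_sd(values):
    """Fraction of values within one sample standard deviation of the mean."""
    m, s = mean(values), stdev(values)
    return sum(1 for v in values if abs(v - m) <= s) / len(values)

# Share of a normal distribution lying within +/- 1 sigma (about 68.27%).
normal_share = NormalDist().cdf(1) - NormalDist().cdf(-1)

# Hypothetical parent-to-child percentages, for illustration only.
sample = [42.0, 47.5, 48.6, 49.7, 50.2, 51.8, 52.3, 61.0]
observed = fraction_within_one_sd(sample)
print(round(normal_share, 4), observed)
```

A fraction noticeably above the normal benchmark, such as the 77% reported here, points to results that cluster more tightly around the mean than a bell curve would, with a few wide outliers stretching the standard deviation.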

Removing the outliers with the interquartile range, the mid results of the original 240 pairs skewed to the left of the mean as demonstrated in the chart below.
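Outlier removal by interquartile range can be sketched as follows. The conventional 1.5 × IQR fences are assumed here; the exact multiplier and quartile method used for the chart are not stated, and the sample values are hypothetical.

```python
from statistics import quantiles

def drop_iqr_outliers(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

# Hypothetical percentages with one low and one high outlier.
sample = [12.0, 44.1, 46.9, 48.7, 49.3, 50.5, 52.2, 53.0, 91.4]
kept = drop_iqr_outliers(sample)
print(kept)
```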

An additional 150 relationships, representing Degrees 7/8 through Degrees 10/11, were added. The only caveat for inclusion was that the parent had to match the relative in question – but the child did not need to have matching DNA to the parent’s matching relative. Of the 75 parent/child pairs that were included, 28 children failed to match the relative in question at levels of 5cM or higher. These 0.00% shares were included in the overall results.

The children’s non-matching data were so pronounced in Degrees 9/10 that the median score was 0.00%. Only 47.22% of the children at this degree level shared DNA with the said relative. The parents were either third cousins, twice removed or fourth cousins and the children were either third cousins, thrice removed or fourth cousins, once removed.

At the Degree 10/11 level, the children either matched the parent’s share at 100% or not at all – indicating an all or nothing proposition as we moved to more distant relationships. Unfortunately, only five pairs were included – which is too small to make a critical analysis.

As we moved further away from a Degree 2 relationship on the part of the child, the standard deviations increased. In other words, as the relationships grew more distant, the spread of the results grew correspondingly larger and the results became more heterogeneous. In most cases, the SD increased with each generational degree. The only exception was at Degrees 5/6: with an SD of 11.51, it was slightly narrower than Degrees 4/5 at 11.89.

With this said, many of the degrees of DNA sharing exhibited means very close to 50%. The only variations were found in Degrees 7/8 at 35.96%, Degrees 9/10 at 35.77%, and Degrees 10/11 at 60% (3 of the 5 were at 100% and 2 were at 0% shared). Both Degrees 9/10 and 10/11 had examples of all or none of the relational DNA passed from parent to child.
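The Degrees 10/11 row is small enough to verify by hand. Using the five values described above (three children at 100% and two at 0%), Python's standard library reproduces the mean, median, and standard deviation from the table:

```python
from statistics import mean, median, stdev

# Degrees 10/11: three children at 100% of the parent's share, two at 0%.
shares = [100.0, 100.0, 100.0, 0.0, 0.0]

print(mean(shares), median(shares), round(stdev(shares), 2))  # 60.0 100.0 54.77
```

The match with 54.77 suggests the Std Dev column is the sample standard deviation (dividing by n − 1); the population version would give about 48.99 for these five values.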


The conclusions are not beyond what we’ve already known about the percentage of shared DNA passed from parent to child. Up through Degrees 6/7, the shared DNA is generally within one standard deviation from the means, which are approximately 50% of the share of the parent. As these relationships become further distant, the spread of one standard deviation increases in size.

As we enter the realm of Degrees 7/8 and further distant relationships, we begin to see the phenomenon of none of the parent’s shared DNA with a relative being represented in the child’s results. With Degrees 9/10, many (but not all) of the results exhibited 0% or 100% shared DNA. At Degrees 10/11, it was an all or none proposition. It should be noted that at this level, the shared segments were between 5cM and 10cM. Since we have three generations that can be tracked lineally with these specific relationships, these segments are identical by descent (IBD), as they can be traced back to the grandparent’s much larger segment at the same position.

The rule of thumb is as follows: the closer the relationship, we are generally “the halves” – at least within one standard deviation of the half share. As for more distant relationships, it is likely we will be “halves not” – perhaps, all or nothing.


While 630 relationships may appear to be a large number, at least 768 (384 pairs) would be needed to reach the minimum sample size for a confidence level of 95% with a 5% margin of error. As with all statistical measures, a larger sample allows a higher confidence level and a smaller margin of error. A sample size exceeding 384 parent/child pairs would be greatly desired.
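The 384 figure matches the standard sample-size formula for estimating a proportion, n = z²·p(1 − p)/e². The sketch below assumes the most conservative proportion (p = 0.5) and an effectively unlimited population; the function name is hypothetical.

```python
from statistics import NormalDist

def min_sample_size(confidence=0.95, margin=0.05, p=0.5):
    """Sample size n = z^2 * p * (1 - p) / e^2 for estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z * z * p * (1 - p) / (margin * margin)

n = min_sample_size()
print(round(n, 2))  # 384.15
```

The result is usually quoted as 384, or as 385 when rounded up to the next whole observation.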

A second limitation is that this study is largely (but not totally) represented by the descendants of one ancestral couple. The results include those of the ancestral mother, who had tested prior to her death in 2016, and include three generations of her progeny. Only one of her descendants failed to participate. In all cases, the participants (including relatives not descended from this couple) have ancestries from Northern and Western Europe. A more diverse population might provide different results.

Friday, June 10, 2016

Exogenous Ancestry – Proposing a Replacement for NPE

If I were genetic genealogy king for a day, I would replace the term “Non-Paternity Event (NPE)” with a more comprehensive term – specifically, “Exogenous Ancestry.”

Exogenous ancestry? That’s a mouthful, but what does it mean? Well, it’s a term that I have borrowed from biological studies to explain some of the discontinuity of single source surnames with Y-DNA from outside of the family in question. For some time, I have been contemplating using a different term from the one now commonly used in genetic genealogy – non-paternity event (NPE).

Bryan Sykes and Catherine Irven (2000) first used non-paternity event in the context of genetic genealogy to explain haplotypes that differed from the typical Y-DNA signature of a surname.  It was a borrowed term as well, as it was used in anthropology and sociology where the presumed father was not the father of a child.  Generally, this referred to infidelity on the part of the mother. 

In genetic genealogy circles, the International Society of Genetic Genealogy’s Wiki cites at least 13 different categories which have been considered as non-paternity events. While infidelity is one of these, there are other scenarios where genetic genealogists have used this moniker to describe the discontinuity between surnames and ancestry.

What's the Beef?

The term non-paternity event and its synonyms don’t neatly fit every situation where they are used. The term assumes that the designated father (and even the child) is unaware of the child's ancestry. This is not always the case.

In some cases, there may not be a father in the picture and the surname traveled from mother to child.  The birth father’s name was not associated with the child and there was no “official” father from whom false paternity could be claimed.  It wouldn’t be a surname discontinuity as it continued from the mother; it would be a Y-DNA discontinuity.

In the case of complete adoptions, not only would the paternity be different, but the maternity would be as well.  Using a term such as “Exogenous Ancestry” would better fit full adoption circumstances as not only is the paternal DNA different, so is the maternal DNA.  This term would be applicable to discontinuities found in mitochondrial and autosomal DNA. 

Name changes are often considered NPEs – however, these can be voluntary and NPE doesn’t fit the situation – I am not sure any term other than “name change” would fit this scenario.

Finally, the term appears to pinpoint a given “event”; however, we may not be able to identify a specific generation when this discontinuity occurred. While a person’s recorded ancestry may have confirmation going back several centuries, Y-DNA tells a different story. Yes, there was some sort of misattributed paternity, but where did this “event” occur in the lineage? Can we find it? Sometimes, but not always. We know that somewhere along the ancestral line exogenous DNA entered the picture.

Where did this Term, Exogenous Ancestry, Originate?

It isn’t an original term, although I have been sparingly using “exogenous Y-DNA” since 2012 to soften the blow when reporting NPEs in my study. While recently performing Google searches for terminology relating to DNA from outside the family/clan/tribe, I found it used in the study of wolf and coyote populations of North America. 

Lupine biologists used it to describe DNA found in certain wolf populations that originated from outside the pack – sometimes considered an unusual occurrence. It was also used when wolf DNA was present in populations of coyotes – especially in areas where no known wolf populations existed – hence an ancestral occurrence (vonHoldt, Kays, Pollinger, & Wayne, 2016).

Exogenous ancestry is a broader term than non-paternity event, it is already used in mammalian DNA studies, and it is a better fit to a variety of DNA discontinuities. Will it gain in popularity? I hope so, but sometimes teaching an old dog, wolf, or coyote new tricks isn’t that easy. I would be interested in hearing your spin on this term.


Non-Paternity Event (n.d.). International Society of Genetic Genealogy Wiki. Retrieved June 10, 2016 from http://isogg.org/wiki/Non-paternity_event

Sykes, B., & Irven, C. (2000). Surnames and the Y chromosome.  The American Journal of Human Genetics, 66(4), 1417-1419. doi:10.1086/302850

vonHoldt, B. M., Kays, R., Pollinger, J. P., & Wayne, R. K. (2016). Admixture mapping identifies introgressed genomic regions in North American canids. Molecular Ecology, 25(11), 2443-2453. doi:10.1111/mec.13667

Friday, February 12, 2016

He Inspired a Genealogist – Mr. George T. Ihnat

Today, I received notification that a teacher I had in junior high school and high school had passed away on Wednesday, February 10, 2016.  I hadn’t seen Mr. George T. Ihnat since the day I graduated in June 1973; however, he had a profound effect on me by instilling a love for family history.
George T. Ihnat in 1972
Beginning in 1967, I attended Park Terrace Junior High School in North Versailles, PA – where we moved from teacher to teacher instead of having one teacher all day.  I barely remember any of my instructors from Park Terrace, as there were so many – but one who made a lasting impression was Mr. George T. Ihnat who taught 8th grade English. I would later have him as my 11th grade American literature instructor at East Allegheny High School.
As I had many great teachers during my life, I can’t say I remember the specifics of the vast amounts of knowledge he imparted in either class; however, I do recall an assignment that shaped my primary interest in life. One day in 1968, Mr. Ihnat assigned us a project to create a family tree – a typical project that occurs during many people’s school experiences. I hadn’t thought about my ancestry until then and I haven’t looked back.
The assignment prompted me to ask my mother about her and my dad’s families.  Since my dad had passed away in 1962, I knew very little concerning my paternal lineage.  Mom knew my dad’s mother’s family, but only my grandfather’s name and a few scattered details about his siblings. She went into her secretary and pulled out a piece of folded paper in my father’s handwriting that had the names and dates of my father’s grandparents. He had jotted down these notes after visiting relatives in Ohio during the summer of 1960. She also found an old obituary about my great-great grandmother, Sarah Ann Jones Merriman, who was the oldest woman in McKeesport, PA at the time of her death in 1929.

Later that day, my mom and I went to McKeesport-Versailles Cemetery and found Sarah Merriman's and my second great grandfather’s grave – John Merriman was a Civil War veteran in the 101st Pennsylvania Volunteers. My research also inspired me to query my only living grandparent – my mother’s mother about her lineage. I was given a wealth of information about her and my grandfather’s sides of the family.

I also asked my Aunt Nath, my dad’s oldest half-sister who attended the same church as us, if she could provide some additional information. She gladly wrote down names of family members that she could remember. That was a little over 47 years ago and I still have all of these notes and clippings. It got me interested in family history and this was later rekindled in 1978 with the return of my great-grandparents’ family bible to its bloodline.

Mr. Ihnat’s assignment continues to inspire me even to this day in discovering family – old and new. This interest has expanded from archives, library, and cemetery research to DNA testing of relatives – a keen hobby thanks to an English teacher who went beyond the scope of grammar and composition with an assignment about a family tree.
Mr. Ihnat: I am sorry that I never connected with you during my adult years to tell you how that one assignment changed my life forever. Thanks to you, it did. While I am hard pressed to remember any of my other junior high teachers, you’ll never be forgotten. Rest in Peace.

Sunday, January 10, 2016

Case Study: Blaine Bettinger

How did you enter the field of genetic genealogy? What and who influenced you? Were you an innovator, an early adopter, or are you still a laggard who hasn’t tested? Although I sent in my first DNA kit in 2007, I still feel like a DNA adolescent among some of my peers. If I had to categorize my experiences, I would rank myself in the early majority.

That first kit was inspired by the article “Shaking the Family Tree with Recreational Genetics” in Newsweek. I saw it in November 2007 at my optometrist’s office and I showed it to my wife, who is adopted. Within days, Ancestry had a sale on their Y-DNA and mtDNA tests and both of us took the plunge.

By the end of the year, I found out that my haplogroups were I1a (old designation) and H.  My wife’s mtDNA was also an H.  We were not too impressed by these results, as they told us little; however, my haplogroups confirmed what I already believed concerning these lines:  my patrilineal line was likely Norse when taken to its logical conclusion and my matrilineal line came from central Europe.  Both haplogroups pointed in these directions. To me, this was still a giant genetic leap.

During 2008, Ancestry partnered with two other companies:  Sorenson Molecular Genealogy Foundation (SMGF) and 23andMe.  I signed up for accounts at both and submitted my Y-DNA and mtDNA results to Sorenson. At that time, 23andMe only offered health and trait information for a hefty price tag ($499), so I passed on their product, as I wasn’t interested in spending that kind of money for this info.  I had a login account, but no data of my own – yet.

Fast forward to 2010. Wanting to know more about my genetic ancestry, I subscribed to a wonderful online resource, the now defunct DNA-Forums.org, and began learning about this new service at 23andMe called Relative Finder (now DNA Relatives). DNA-Forums also alerted me in March 2010 that 23andMe was having a month-long sale of their product with $200 off the $499 price – it was called the Oprah sale, as it had been advertised on her show. Curious, I bit, er, spit and had my results in May. I also encouraged my brothers, mother, wife, children, and cousins to test and thus began a process of collecting relatives’ DNA. Needless to say, I was hooked. We now have 50 of our relatives tested.

That same year, GeneTree (part of the SMGF family and also now defunct) had a $79 sale on their Y-DNA-46 test and I began my surname project with six participants.  We were able to confirm that, except for those with non-paternal events in their ancestry, everyone with our surname and its variants came from a single progenitor.  This was something we couldn’t have done with traditional genealogical records as they didn’t go back far enough.

But the more I learned, the more I questioned.  I was curious about the X-chromosome, as my match to my brothers was extremely small.  So with a Google search in May 2010, I found two enlightening posts on the X at Blaine Bettinger’s blog The Genetic Genealogist.  He made it easy to understand and his fan charts were a true blessing to me and others trying to wrap our collective brains around the differences in transmission of the X among males and females. For those posts on the X-Chromosome, see the following links:  “Unlocking the Genealogical Secrets of the X Chromosome” and “More X-Chromosome Charts.”

Since 2010, a number of changes have occurred.  Ancestry no longer offers Y-DNA and mtDNA tests, DNA-Forums vanished out of thin air in the middle of the night in early 2012, and GeneTree and SMGF were absorbed by Ancestry and folded.  Gone, gone, and gone.  Several aspects of Genetic Genealogy, however, have remained constant; one of those is Dr. Blaine T. Bettinger’s blog The Genetic Genealogist. 

Just recently, I enrolled in a graduate Social Media Course at Southern New Hampshire University for professional development. This week we were challenged to write a case study on a “thought leader” who used social media.  Since Blaine’s blog was the first I encountered on the subject, I wanted to analyze his work.  He agreed and supplied some answers to very specific questions that I posed.

Blaine has influenced well over a million individuals and continues to enlighten others on a daily basis. He has given me permission to reproduce this case study here. I hope you learn something about The Genetic Genealogist and gain a greater appreciation of the power of bloggers in our discipline.