Sunday, January 10, 2016

Case Study: Blaine Bettinger

How did you enter the field of genetic genealogy? What and who influenced you?  Were you an innovator, an early adopter, or are you still a laggard who hasn’t tested? Although I sent in my first DNA kit in 2007, I still feel like a DNA adolescent among some of my peers. If I had to categorize my experiences, I would rank myself in the early majority.

That first kit was inspired by the article “Shaking the Family Tree with Recreational Genetics” in Newsweek.  I saw it in November 2007 at my optometrist’s office and showed it to my wife, who is adopted. Within days, Ancestry had a sale on their Y-DNA and mtDNA tests and both of us took the plunge.

By the end of the year, I found out that my haplogroups were I1a (old designation) and H.  My wife’s mtDNA was also an H.  We were not too impressed by these results, as they told us little; however, my haplogroups confirmed what I already believed concerning these lines:  my patrilineal line was likely Norse when taken to its logical conclusion and my matrilineal line came from central Europe.  Both haplogroups pointed in these directions. To me, this was still a giant genetic leap.

During 2008, Ancestry partnered with two other companies:  Sorenson Molecular Genealogy Foundation (SMGF) and 23andMe.  I signed up for accounts at both and submitted my Y-DNA and mtDNA results to Sorenson. At that time, 23andMe only offered health and trait information for a hefty price tag ($499), so I passed on their product, as I wasn’t interested in spending that kind of money for this info.  I had a login account, but no data of my own – yet.

Fast forward to 2010.  Wanting to know more about my genetic ancestry, I subscribed to a wonderful online resource, the now defunct DNA-Forums, and began learning about this new service at 23andMe called Relative Finder (now DNA Relatives).  DNA-Forums also alerted me in March 2010 that 23andMe was having a month-long sale of their product with $200 off the $499 price – it was called the Oprah sale, as it had been advertised on her show.  Curious, I bit, er, spit and had my results in May.  I also encouraged my brothers, mother, wife, children, and cousins to test and thus began a process of collecting relatives’ DNA.  Needless to say, I was hooked. We now have 50 of our relatives tested.

That same year, GeneTree (part of the SMGF family and also now defunct) had a $79 sale on their Y-DNA-46 test and I began my surname project with six participants.  We were able to confirm that, except for those with non-paternal events in their ancestry, everyone with our surname and its variants came from a single progenitor.  This was something we couldn’t have done with traditional genealogical records as they didn’t go back far enough.

But the more I learned, the more I questioned.  I was curious about the X-chromosome, as my match to my brothers was extremely small.  So with a Google search in May 2010, I found two enlightening posts on the X at Blaine Bettinger’s blog The Genetic Genealogist.  He made it easy to understand and his fan charts were a true blessing to me and others trying to wrap our collective brains around the differences in transmission of the X among males and females. For those posts on the X-Chromosome, see the following links:  “Unlocking the Genealogical Secrets of the X Chromosome” and “More X-Chromosome Charts.”

Since 2010, a number of changes have occurred.  Ancestry no longer offers Y-DNA and mtDNA tests, DNA-Forums vanished out of thin air in the middle of the night in early 2012, and GeneTree and SMGF were absorbed by Ancestry and folded.  Gone, gone, and gone.  Several aspects of Genetic Genealogy, however, have remained constant; one of those is Dr. Blaine T. Bettinger’s blog The Genetic Genealogist. 

Just recently, I enrolled in a graduate Social Media Course at Southern New Hampshire University for professional development. This week we were challenged to write a case study on a “thought leader” who used social media.  Since Blaine’s blog was the first I encountered on the subject, I wanted to analyze his work.  He agreed and supplied some answers to very specific questions that I posed.

Blaine has influenced well over a million individuals and continues to enlighten others on a daily basis.  He has given me permission to reproduce this case study here.  I hope you learn something about The Genetic Genealogist and have a great appreciation of the power bloggers in our discipline. 

Friday, December 4, 2015

Additional DNA Resources from the Owston/Ouston One-Name Study

Since I have a second genealogy blog dealing with my Owston/Ouston one-name study, not all of my articles on DNA are found as part of the Lineal Arboretum blog. To aid readers in finding these other posts, I have listed these below with their appropriate links.

An Analysis of Fourth Cousins and Other Near Distant Relationships – this particular post analyzes matching autosomal DNA segments of family members descending from five sons of a common ancestral couple. This study includes 10 third cousins, once removed; 5 third cousins, twice removed; 43 fourth cousins; and 34 fourth cousins, once removed.

Ancestry Composition of Three Sets of Siblings – this post looks at the differences in the predicted ancestries of siblings as reported by 23andMe. The siblings include three brothers, a brother and a sister, and two sisters.

Another F2642 Y-DNA Mutation Reported – this post reports a second, but unrelated, participant whom the National Genographic 2.0 test had reported as being I-F2642. This individual also had an additional downstream SNP not shared by our family.

A New Y-DNA Mutation Found in the Owston/Ouston Family – this post from 2012 reports a mutation that was discovered with the National Genographic 2.0 test. To our knowledge, our participant was the first to acknowledge a new I1 mutation in the Z140 family. The newly discovered SNP was F2642. Since this announcement in 2012, numerous men have also reported sharing the I-F2642 Y-DNA haplotype.


Saturday, November 29, 2014

It's Just Another Brick from the Wall

Ask any genealogist to list his or her frustrations and inevitably the term “brick wall” will surface in the discussions.  Brick walls are points when all clues regarding an individual are seemingly non-existent.  In most cases, these brick walls occur as we go backwards in our lineage and we reach a point where an ancestor’s identity is unknown. 

For Americans, this can happen within a few generations as record keeping was sparse, spotty, or non-existent in some locales during the 19th century.  Municipalities, counties, and states had varying degrees of public record keeping. In other words, there is no official US standard and American genealogy can be difficult at best.

When we begin our genealogical quest, our mission of discovering each ancestor is actually a series of brick walls that will be either knocked down with extensive research or will remain solid.  Sometimes this happens with one piece of direct evidence, or it only occurs with constant chiseling with indirect documentary or DNA evidence (see example).  Not only will we encounter brick walls while seeking our direct ancestors, but we will also run headlong into the same barriers when we trace the descendants of those ancestors as well – our collateral lines.

One of My Walls

For 37 years, I’ve been searching for my great-grandfather’s sister with very little luck.  Over time, I've been able to ascertain that Frances Jenett Owston was born in Allegheny City (now Pittsburgh’s North Side), Pennsylvania in 1852, but I never could find her as an adult.  I first became aware of her existence in late 1977 when my great-grandparents’ family bible surfaced after being out of the bloodline for nearly 50 years.  Between the pages of this large presentation bible was a piece of heavy card stock with 11 locks of hair; each one was identified and dated. 

Some individuals had two samples from different periods of their lives including my great-grandfather who had one dated from 1858 when he was four years old and one from twenty-two years later.  While most of the names were easily identifiable as being members of my great-grandparents’ household, two were not.  One was a Grandma Ritchey, 79 years of age  – who was eventually determined to be my third great grandmother.  The other was for Fannie Owston; her sample was dated 1859. 

The other Fannie Owston - Frances W. Owston
For 27 years, I had assumed that this was my great-grandfather’s first cousin, Frances W. Owston, who also lived in Pittsburgh.  Since my great-grandfather’s family was musical and this Fannie Owston was a music teacher, it seemed plausible.  Confirming the identification was problematic, as I couldn’t initially find my second great-grandparents in the 1860 census.   Despite repeated searching of the census records, I was unable to find their listing until 2007. The problem was that the family was listed under an incorrect but similar surname and the head of the house’s (my second great-grandfather) initials were transposed.  See my post on this. 

Although finding the census in 2007 provided additional evidence of her existence as my great-grandfather’s sister, I was able to determine the identity of Fannie Owston three years earlier.  While browsing through the genealogical books in the Carnegie Public Library in Pittsburgh, I found Ken McFarland’s book Births, Marriages, and Deaths of Allegheny County, Pennsylvania 1852-1854.  McFarland’s a diligent researcher who has transcribed and indexed numerous records from the Greater Pittsburgh area.

Although I had looked at the microfilm of Allegheny County’s records from this period in 1980, I found no one in our family listed and never revisited these documents.  This time, however, I was interested in McFarland’s book, not for my own family, but for siblings of 1200 Pennsylvania Civil War soldiers that I am tracing from the womb to the tomb.  I was hoping to find maiden names of the mothers of some of these soldiers.

As I opened the book, there it was on page one – and it was even registration number one: the birth record of Frances Owston daughter of John G. Owston and Martha N. French.  She was born at 7 PM on July 13, 1852 in the fourth ward of Allegheny City, PA.  Since this time, I’ve searched diligently for Frances Owston, but outside of the additional listing in the 1860 census, I’ve had no luck. 

The family had moved from Pittsburgh to Canada in about 1857 and was in Detroit in 1860 where my second great-grandmother died that same year.  In 1995, I had traveled to Detroit to research my second great grandparents.  While I found some information on the family, nothing on Frances surfaced.  No one else was buried in the plot where my second great grandmother was buried, so it seemed plausible that Fannie survived the family’s eight year stay in Michigan.

By 2009, I became aware of my family’s 1863 move from Detroit to East Saginaw along with my second great-grandfather’s marriage to and subsequent divorce from Permelia Condon.  This heretofore unknown tidbit was a serendipitous discovery through searching my surname in Google Books.  A published biography and photo of my second great-grandfather with information about his work in Saginaw led to the discovery of his second wife.  After the couple separated, the family moved back to Allegheny City in 1868. 

Unfortunately, Detroit, East Saginaw, and Wayne and Saginaw counties were not registering vital information during the 1860s and 1870s.  If Fannie had moved back to Allegheny County with her father and brother, chances of finding her if she married or died were marginal.  Allegheny County did not begin registrations of marriages until September 1885 and Allegheny City did not register births or deaths until July 1882. I had already checked all of these records in the past for anyone with my surname.  If death or marriage occurred before the 1880s, I might never find her.  But, I never stopped searching.

Background on the Records

While Pennsylvania is currently ranked sixth in population, it was the second most populous state for much of its history.  You would think that a vital “keystone” of a state might have policies in place to register births, marriages, and deaths – but alas, it did not for many years.  In the mid 19th century, Pennsylvania attempted to institute registrations of births, marriages, and deaths.  This 1852 registration was unsuccessful, and the state dropped the experiment after two years.

One problem was that registration was not compulsory and many individuals failed to comply.  Frances Owston was the first to be registered in Pennsylvania’s second largest county and third largest city, but her birth occurred seven months after the law was effected and was not registered by the physician until three months later. Her brother’s birth two years later in the same town was never registered with the state.

Eventually, individual municipalities began to register births and deaths over the next 50 years.  Pittsburgh, the second largest city in the Commonwealth, began in 1870.  As previously stated, Pittsburgh’s neighbor to the north, Allegheny City, waited 12 more years to register birth and death records.  Other towns followed suit but only when it was convenient to do so. 

Additionally, none of these registrations through 1905 were mandatory.  A case in point is my father’s siblings. All five were born before 1906 and two died in early childhood during the same period; none of these events were registered even though the municipalities were actively registering births and deaths.   

The practice with marriages and divorces in Pennsylvania was different.  As of September 30, 1885, Pennsylvania required that all counties register marriages and these be on file in the local county courthouse.  Marriage registration was mandatory and the same process exists today.  Divorce records were registered with the county’s Prothonotary beginning in 1804.  Statewide mandatory vital registration, however, did not begin until 1906, which is late considering the population of Pennsylvania and that it prides itself on being the second state to ratify the Constitution.  

Fast forward

After November 2014’s election, Governor Tom Corbett may not think he has a friend in Pennsylvania, but he certainly has a friend in me, as he signed Act 110 (Pennsylvania Vital Records Bill SB-361) into law on December 15, 2011. I was one of the many people to sign several petitions over the last 10 years to hasten the Commonwealth to begin this process.  The bill went into effect on February 13, 2012 and the Division of Vital Records transferred all death certificates 50 years old and older and birth certificates 105 years old and older to the Pennsylvania State Archives in Harrisburg.  

Declaring these documents as old records made them easily available to the public and the old paper indexes for both became listed on the Pennsylvania Department of Health’s website.  The indices, however, are PDF scans of the typewritten copies and are laborious to use – but, at least, they are there. Copies of the original records are now available to anyone through the State Archives for $5.00.  This is providing you have the registration number from the indices.  From my personal experience, the turnaround of the processing takes less than a week.

How important is this move?  For a native Pennsylvanian and an avid researcher of Pennsylvania records like me, this was a dream come true.  In the past, Vital Records’ processing was slow (up to a month); they could reject you if you were not a blood descendant or legal representative of the person on the birth certificate (a caveat on their forms); you were not allowed to copy, photograph, or publish an image of the record; and the service was expensive to use, especially if you simultaneously wanted numerous certificates. Prior to the transfer, a death record would cost $10, unless you didn’t know the date and then an additional search fee of $10 was charged for a search of ten years.

While looking for my great-grandfather’s first cousin’s death record, I got stung for $50. Not knowing the date of his passing, I ordered a copy of the certificate with a 10 year window (1920-1929) search – that was $20. No document was found.  I ordered another search at $20 for the years of 1910-1919. This was also inconclusive.  Several years later while perusing a church’s records on site, I found his 1923 burial date – the cemetery provided an exact date of death.  I ordered the certificate again ($10) with the exact date and received it.  Unfortunately, Vital Records did a sloppy job on the first search and I was out $30. 

Further Movement

In August 2012, the Pennsylvania State Archives and Ancestry signed an agreement for the company to digitize and upload the records.  These would be freely available to Pennsylvania residents if they register at  All Ancestry customers would also have access as part of their individual memberships. 

On April 18, 2014, Ancestry announced that it had uploaded the images and database information on death records from 1906-1924.  Like many individuals, I began searching for family and others.  Both the database and the certificates had some issues that I will address in future posts.  The second group, 1925-1944, went live on June 24, 2014.  The records through 1963 completed the death certificate process on October 24, 2014.

Birth certificates for 1906 will be completed in March 2015.  No timeline has been communicated regarding the records for 1907-1909.  Until the end of this year, only the indices for 1906-1908 are currently available. 

Brick Wall Smashed

In July 2014, I decided to see if Ancestry had completed any further uploading of death certificates.  They had, and I did my customary search of my surname.  To my surprise, I found a Frances Beecher Smith who was the daughter of John Owston and Martha French.  This was my great-grandfather’s missing sister.

After doing additional research in Pittsburgh, I began to piece together her story:  two failed marriages, a bitter rivalry with another woman, the birth of two children, the loss of a grandchild, and the finding of another.  While she never owned her own home, what she did have was far more precious than gold.  She lived a long life and had the support of a family that dearly loved her and to whom she reciprocated that love.

My biggest surprise about Fannie was that she lived less than 15 miles from my childhood home.  Outside of the records I previously mentioned, I had never encountered her in any other until now.  In fact, I had walked across her grave (which is unmarked) on at least four occasions looking for others in the same cemetery.   No stone is present, and even if there were one, I wouldn’t have recognized the name Frances Smith.

My brick wall - Frances J. Owston Beecher Smith
In addition, I have found Fannie’s only surviving great-grandchild who lived in the same home with her for two decades.   We have corresponded and talked on the phone concerning the differences and similarities in our respective families.  In addition, this third cousin turns out to be a double third cousin as I am related to both her maternal grandparents.  We still have a lot of catching up to do yet.

Thanks to the 48 members of the Pennsylvania Senate and 194 members of Pennsylvania House of Representatives who voted to pass this act, to Governor Tom Corbett who signed the bill into law, and to the forward thinking folks at the Pennsylvania State Archives and Ancestry for collaborating on this important project.

I have already viewed several hundred of these certificates and in our next installment I will deal with death records, primarily those from Pennsylvania, and their importance as genealogical evidence and the inherent problems regarding these records as sources of information.


References

(2014).  Pennsylvania, Births, 1852-1854. Database available at

(2014). Pennsylvania, Death Certificates, 1906-1963. Database available at

Gruber, T. (2014).  People for Better Pennsylvania Historical Records Access (PaHR-Access): Frequently Asked Questions.

Gruber, T. (2014).  People for Better Pennsylvania Historical Records Access (PaHR-Access): Genealogists, Researchers, Family Historians.

McFarland, K.T.H. (1999). Births, Marriages, and Deaths of Allegheny County, Pennsylvania 1852-1854. Apollo, PA: Closson Press.

Pennsylvania Department of Health. (2014). Act 110 – Public Records (formerly known as Senate Bill 361).

Pennsylvania General Assembly. (2012). Senate Bill 361; Session 2011-2012.

Pennsylvania Historical & Museum Commission. (2014). Vital Statistics Records at the Pennsylvania State Archives.

Thursday, June 19, 2014

Is Genetic Distance an Adequate Predictor of Relationships?


While obviously having a small pool of potential Y-DNA participants, low frequency surnames may have the advantage of having good documentation of ancestry. That is the case with my surname and its corresponding Y-DNA Project. The original intent of our project was to see if three families bearing our surname from the original East Riding of Yorkshire had a common ancestry or if the surname was applied to these lineages independently from each other.

The three families are as follows:
SHERBURN: The largest family hails originally from Sherburn in Hartford Lythe and its many members descend from Peter Owston who died in 1568. This group also includes the Ouston descendants of James Ouston (1711-1785) who was born in Brompton by Sawdon and who died in Sigglesthorne. You’ll find members of this clan in the UK, Australia, USA, Canada, New Zealand, Netherlands, France, and the UAE.
GANTON: The second largest group of Owstons originated in the village of Ganton and can be satisfactorily traced to Giles Owston who died in 1641. While an older connection cannot be firmly established, it is probably descended from John Owston who was alive circa 1490 in nearby Staxton in Willerby. This supposition occurs because several unique first names exist in both lineages. While very few Ganton Owstons live in the UK, the majority of Owstons in the US are from this family. All surviving Ganton Owstons descend from Thomas Owston (1755-1823). Ganton, like Sherburn, is now located in North Yorkshire.
THORNHOLME: Finally, a third group of Owstons from 15 miles south of Sherburn and Ganton can be traced to Richard Owston of the village of Thornholme in the parish of Burton Agnes. Richard Owston died in 1739. By using onomastic evidence, it is possible to theorize a connection to an earlier Ganton line fathered by Robert Owston who was born as recently as 1580. The Thornholme Owstons constitute the largest group in Canada. Others descended from this lineage live in Australia, the UK, Finland, and New Zealand.
In the study’s first year, a positive conclusion was reached, as three participants (one from each family) matched each other at 100% using 43 Y-DNA marker tests from GeneTree. Others in the study matched at a genetic distance of 2 and 3.

This was exciting news, as it had been impossible to determine a relationship between these lineages: the connection between the three families apparently occurred before the introduction of English parish registers in 1538 (however, many of the nearby parishes do not have extant registers until much later; Burton Agnes' earliest register is 1700). The first record of the surname in the region (spelled as Oustyn) appeared in a 1452 will from the parish of Wintringham.

In order to better understand our relationships and to construct a more conclusive modal haplotype of the Owston families, it was necessary to branch out beyond our original participants and attempt to test as many Owston/Ouston males as possible. We have identified 23 lines from the three families. Some lines can further be subdivided into groups that we call segments. There are 39 lines and segments.

Currently, the Owston/Ouston Y-DNA project has 26 participants – 17 Sherburn family members, 4 Ganton Owstons, and 5 Thornholme participants. The participants represent at least one person from 20 of the 23 lines and 22 of the 39 lines and segments. Additionally, some lines/segments have more than one participant. We also intend to move those who matched the Owston modal I1 haplotype at the now defunct GeneTree to retest at 37 markers at FTDNA. Eventually, all matching individuals will be moved to 111 markers. Currently, eight former GeneTree customers will need to be retested.

Of the 26 participants, four individuals are awaiting test results. Eight of the remaining individuals failed to match the modal haplotype and are apparently the results of non-paternal events (NPEs). Several of these participants descended from families where known NPEs existed, while others’ results were a complete surprise. It is to be noted that everyone who tested had a clear genealogical line to one of the three original families.


Currently, 14 of the participants have a solid match to the modal haplotype. These represent two Thornholme participants, three Ganton participants, and nine Sherburn participants. Early in this study, we noticed that individuals were more inclined to have closer genetic matches with individuals who were genealogically more distant than those who were more closely related. This was a curiosity that led to the eventual writing of this post.

Over the years, genetic genealogists have tended to rely upon genetic distance to help predict a range of possible relationships. In fact, FTDNA qualifies matches at various levels of genetic distance.

For example, FTDNA states that a GD=0 at 37 markers indicates that the two individuals are “very tightly related”; and with a confidence level of .05 or less, these individuals are related within eight generations (seventh cousins). A mismatch of one GD is considered “tightly related.” Genetic distances of 2 or 3 marker differences between men of the same surname are identified as “related.” As GD increases, the likelihood of a relationship diminishes, with a GD=6 considered as not related, even when the same surname is present (Canada, 2011).

In addition, most of us have genetically close matches with individuals who obviously were further back on the relationship continuum and do not share a common surname or a variant surname. At 37 markers, I have a number of matches with individuals whose ancestry derives from distances far removed from my own East Riding ancestors. While we are related, it is obviously far beyond the genealogical time frame and may be prior to the various invasions of Britain – one of which brought my ancestors from mainland Europe.

Using my project as a case study, I have hypothesized that, although a predictor of a familial connection, genetic distance is an inadequate predictor of relationships. Before I discuss my results, I must present some caveats.

First of all, I cannot affirm an exact connection between the three families in my study; however, I have constructed plausible trees based on shared forenames, typical naming conventions, names found in wills and other local records, and the close geographical distances among all three current families and two earlier extinct families. Currently, we can only affirm the relationship intra-family; however, based on the aforementioned factors, we are confident that the supposed relationships are close to the unknown actual relationships.

Secondly, not all 14 matching participants tested at the same level of resolution. Eight tested at GeneTree with 43 markers. Four individuals tested at FTDNA at 37 markers. Finally, two individuals tested at both GeneTree (43) and FTDNA (minimum of 37).

While GeneTree’s 43 marker test shares 32 markers with FTDNA’s 37 marker test, it was decided to compare apples to apples and oranges to oranges. Eliminating the five markers that GeneTree did not test would disregard marker differences from five of the six FTDNA participants. Eliminating the 11 additional markers tested at GeneTree would eliminate marker differences in two of the ten GeneTree participants.

Therefore, instead of comparing all 14 matching individuals, the ten who tested at GeneTree were compared to each other and the six FTDNA participants were compared to each other. While this is not a perfect scenario, it does allow for a comparison of 59 (supposed) relationships. Within the entire study, there are a total of 325 relationships among all participants. These include those whose results have not yet been returned and the eight additional participants with NPEs.
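The counts above can be checked with simple pair counting. This is a minimal sketch, not part of the original analysis; the variable names are my own, and it assumes that the one pairing between the two men who tested at both companies is counted only once, which is how the total of 59 (rather than 60) would arise.

```python
from math import comb

# Participant counts as given in the post.
genetree_only = 8   # tested only at GeneTree (43 markers)
ftdna_only = 4      # tested only at FTDNA (37 markers)
both = 2            # tested at both companies
all_participants = 26

genetree_group = genetree_only + both   # 10 people compared on GeneTree markers
ftdna_group = ftdna_only + both         # 6 people compared on FTDNA markers

genetree_pairs = comb(genetree_group, 2)  # pairings within the GeneTree group
ftdna_pairs = comb(ftdna_group, 2)        # pairings within the FTDNA group

# Assumption: the pairing of the two dual-tested men appears in both groups,
# so it is subtracted once to avoid double counting.
duplicate_pairs = comb(both, 2)

compared_relationships = genetree_pairs + ftdna_pairs - duplicate_pairs
total_relationships = comb(all_participants, 2)

print(compared_relationships, total_relationships)
```

Under these assumptions the sketch gives 45 GeneTree pairings and 15 FTDNA pairings, with one pairing shared between the two groups.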

Third, since we have not secured many participants with close relationships (fifth cousins and closer) to other participants, the actual (and supposed) relationships skew more distant. Currently, we have only five of the 59 compared relationships at the fifth cousin level or closer. The relationships are as follows:

Relationship                      Number
2nd Cousins                       1
2nd Cousins, Once Removed         1
4th Cousins, Once Removed         1
5th Cousins                       1
5th Cousins, Once Removed         1
6th Cousins, Once Removed         1
7th Cousins, Once Removed         1
8th Cousins                       3
8th Cousins, Once Removed         1
9th Cousins                       4
9th Cousins, Once Removed         3
9th Cousins, Thrice Removed       1
10th Cousins, Twice Removed       8
11th Cousins, Once Removed        1
12th Cousins, Once Removed        4
12th Cousins, Twice Removed       2
12th Cousins, Thrice Removed      3
13th Cousins                      2
13th Cousins, Once Removed        6
13th Cousins, Twice Removed       9
14th Cousins, Once Removed        2
15th Cousins                      3

The genetic distances of the 59 relationships range from 0 to 6. As expected, when the number of relationships at each genetic distance (GD) is compared, the plot almost resembles a normal curve. Genetic distances of 2 and 3 have 14 relationships each and dominate the center of the chart. At least in our Y-DNA project, the average genetic distance among participants appears to be in the neighborhood of 2 to 3 markers.

However, when we look at the relationship ranges with each genetic distance, the results are all over the road. The results for two individuals having a GD of 0 and two individuals having a GD 5 are indistinguishable. There is no rhyme or reason for the results. Randomness abounds.

Additionally, the relationships can be quantified with degrees of relationship. Degrees of relationship (DR) are calculated by totaling the number of generational steps to a common ancestor by both parties and adding the two numbers together. For example, two second cousins each have three generational steps to their common ancestor; added together, two second cousins have a DR=6.

An easy way to calculate DRs would be to take the cousin number (second, sixth, and so on), double it, and add two. For example, sixth cousins have a DR=14. For each step removed from the common generation, add one. Sixth cousins, once removed have a DR=15 and sixth cousins, twice removed have a DR=16. A pair of seventh cousins also has a DR=16.
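The shortcut works because nth cousins are each n + 1 generations below their common ancestor, so the two paths together contribute 2(n + 1) = 2n + 2 steps, and each removal adds one more step on one side. A minimal sketch of that rule follows; the function name and parameter names are my own, chosen for illustration.

```python
def degrees_of_relationship(cousin_number: int, times_removed: int = 0) -> int:
    """Return the degrees of relationship (DR) for a cousin pair.

    cousin_number: 2 for second cousins, 6 for sixth cousins, and so on.
    times_removed: 0 for the same generation, 1 for once removed, etc.
    """
    # Double the cousin number, add two, then add one per removal.
    return 2 * cousin_number + 2 + times_removed


# The examples given in the text.
print(degrees_of_relationship(2))      # second cousins
print(degrees_of_relationship(6))      # sixth cousins
print(degrees_of_relationship(6, 1))   # sixth cousins, once removed
print(degrees_of_relationship(6, 2))   # sixth cousins, twice removed
print(degrees_of_relationship(7))      # seventh cousins
```

Note that the last two calls return the same value, which is why a DR alone cannot always distinguish between two differently named relationships.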

By quantifying the degrees of relationship, the results indicate that there is no significant difference in relationship between our participants who have a GD=0 and a GD=5. We’ve eliminated the GD=6, as only one relationship (tenth cousins, twice removed) is represented.

GD    Min DR    Max DR    Mean DR    Adjusted Mean Relationship
0     7         30        24.54      11th Cousins, Once Removed
1     6         31        25.50      12th Cousins
2     12        30        23.64      11th Cousins
3     18        31        25.57      12th Cousins
4     11        30        23.11      10th Cousins, Once Removed
5     17        29        24.50      11th Cousins, Once Removed

Notice that the results across the board are not significantly different. A GD=0 and a GD=5 result in the same adjusted mean relationship: 11th cousins, once removed. According to these results, genetic distance is an insufficient predictor of relationship range.
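For readers who want to turn a mean DR back into a named relationship, here is one possible sketch. It is an assumption on my part about how such a conversion could be done, not a description of the method used for the table: it rounds the mean to the nearest whole DR (halves rounded up) and then reverses the "double it and add two" rule, preferring the reading with the fewest removals. The function name is hypothetical.

```python
import math


def nearest_relationship(mean_dr: float) -> tuple:
    """Convert a mean DR to (cousin_number, times_removed).

    Assumption: round to the nearest whole DR, with halves rounded up,
    then invert DR = 2 * cousin_number + 2 + times_removed using the
    smallest possible number of removals (0 or 1).
    """
    dr = math.floor(mean_dr + 0.5)
    cousin_number, times_removed = divmod(dr - 2, 2)
    return cousin_number, times_removed


# Mean DR values for GD = 0 through GD = 5, as reported in the table.
for gd, mean_dr in enumerate([24.54, 25.50, 23.64, 25.57, 23.11, 24.50]):
    print(gd, nearest_relationship(mean_dr))
```

With these assumptions, the mean DRs for GD=0 and GD=5 both map to 11th cousins, once removed, in agreement with the table.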

Although we have reached this conclusion, this is just one study and the results may only be indicative of this particular surname. I would be curious to know if others can replicate similar results in their studies.

Secondly, the greatest limitation on this study is the lack of the five additional markers offered by FTDNA. Four of these, DYS576, DYS570, CDY a, and CDY b are more likely to show differences. It is expected that when these markers are added, genetic distance will increase for several of the GeneTree participants. Two FTDNA participants had differences with CDY. Three of the FTNDA participants registered one marker differences from the modal result of 19 on DYS576.

What is interesting is that two ninth cousins, once removed had an exact match at 37 markers. The common ancestor for these two individuals was born in 1598. Each carried a mutation on DYS576 with 20 repeats. While on the surface it would seem that these two participants were “tightly related,” the mutations, however, were independent of each other and were not shared by closer relatives to either party: a second cousin of one of the participants and the fourth cousin, once removed of the other.

In addition, the second cousins had a GD=1 and the fourth cousins, once removed had an unusual GD=4. By FTDNA’s explanation, the ninth cousins, once removed appeared to be “very tightly related”; however, the fourth cousins, once removed would only be “probably related.”

While we may never know what causes the frequency of mutations on Short Tandem Repeats, the examination of studies where ancestries are documented may help us to better understand the role that genetic distance plays in relationship prediction.


Canada, R.A. (2011). If two men share a surname, how should the genetic distance at 37 Y-Chromosome STR markers be interpreted? Family Tree DNA.

Sunday, May 26, 2013

Non Genetic Relatives In A DNA Database

Finding a relative that doesn’t match my DNA in a DNA database? That’s preposterous, isn’t it? One of the reasons many of us have chosen to test with 23andMe and other companies is to find unknown relatives that match our DNA. True, but we also have relatives with whom we have no matching DNA, and some of these should appear in 23andMe's database. If you have tested other relatives, you have an opportunity of finding some of these non-matching individuals. Indulge me while I explain.

While I was allowing my mind to wander concerning autosomal testing the other day, I came to the realization that I have relatives with whom I have no significant genetic connection, and that these individuals may be found in 23andMe’s database. I’ve really known this for a while, as I have tested a number of close family members and they have matches that I don’t have. No doubt I am related to some of these individuals. Because of recombination, the amount of DNA we receive from a given ancestor is diminished in each generation. Therefore, not everyone who is legitimately related to us shares DNA with us.

This is best understood with close relatives. I’ve tested my mother and two brothers, and approximately half of their matches do not share any segments of DNA at 5cM or higher with me and therefore do not show in my “DNA Relatives” (formerly “Relative Finder”) on 23andMe. In other words, for every 50 matches I share with my mother, theoretically she has 50 matches that I do not share. The same goes for each brother.

In addition, I’ve known since I began testing fourth cousins, that some would match me and some would not. I am related to these individuals, but we simply do not share any DNA. For example, I have 6 fourth cousins and 1 fourth cousin, once removed who have tested. I share no significant amounts of DNA with the fourth cousin, once removed and none with 4 of the 6 fourth cousins. Just because I do not match some of these cousins, it doesn’t mean we are not related. There is only a 45% chance that you will match a known fourth cousin.

I then realized that, although I was not matching these relatives, there were others they matched who were possibly related to me. So the first step in this process was to determine the percentage of possible relatives a particular individual might share with me and then calculate the mean number of that person’s matches with which I might share DNA.

While the percentage of total ancestors I would share with a person would be absolute, the percentage of a person’s matches I would share with an individual would be an approximation. Since there is randomness in recombination, I might share more or less than an average amount with a person.

For example, a sibling will share an average of 50% of his or her DNA with other siblings. I share 49.6% of my genome with my oldest brother, which is fairly close to the average of 50%; however, my other brother and I share only 41% of our DNA – which is considerably less than average. Therefore, I could expect to have fewer DNA matches in common with this brother than I do with our oldest brother.

The second factor is due to the nature of 23andMe’s database. There will be fewer potential matches from ancestral populations that are not well represented. For example, I have about 30% German ancestry from my mother’s family; however, I have a smaller number of German matches than I do through my father’s New England Colonial family. Therefore, predicting the number of actual non-matching relatives we have in the database is entirely dependent upon 23andMe’s customer base.

Therefore, no calculation can be exact, given the random nature of recombination and the limitations of the database.

The following chart explains the percentage of a relative’s ancestors we have in common and the average amount of DNA we should expect to share.

Relationship                  Shared Ancestry   Average Shared DNA
Identical Twin                100.00%           100.00%
Double Cousin                 100.00%           25.00%
Great Grandparent             100.00%           12.50%
Great Aunt/Uncle              100.00%           12.50%
Half Sibling                  50.00%            25.00%
Half Aunt/Uncle               50.00%            12.50%
1st Cousin                    50.00%            12.50%
Half Cousin                   25.00%            6.25%
1st Cousin, Once Removed      25.00%            6.25%
2nd Cousin                    25.00%            3.13%
Half Cousin, Once Removed     12.50%            3.13%
2nd Cousin, Once Removed      12.50%            1.56%
3rd Cousin                    12.50%            0.78%
4th Cousin                    6.25%             0.20%

In the above table, the numbers in the second column represent the absolute percentage of that person’s ancestors we would actually share. For example, a first cousin, once removed seems high at 25%; however, if we consider that we share 100% of the ancestors of our parents’ siblings, and we do, then an aunt/uncle’s child (a first cousin) is half that amount at 50%. Therefore, our first cousin’s child shares 25% of his or her ancestors with us. We would be related to 25% of our first cousin, once removed’s ancestors and approximately 25% of that person’s potential relatives.

The exact number of relatives that we share will vary due to the number of actual relatives that person has. For example, if we have 20 first cousins on our father’s side and only two on our mother’s side, we will have more paternal relatives than maternal, so the numbers only work in theory.

The third column represents the average amount of DNA we share with our relatives at each level. Using our first cousin, once removed again as an example, we only share 6.25% of our DNA with this relative – therefore, we should match about 6.25% of that cousin’s 23andMe matches. If all things were equal (and they are not), we might expect to be related to a further 18.75% of that person’s matches without sharing any DNA with these individuals. Again, these percentages are not absolutes, but approximations.
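
The subtraction behind that 18.75% figure can be written out as a small Python sketch. It assumes, as the paragraph above does, that everything averages out perfectly; the function name is hypothetical and used only for illustration.

```python
def split_match_list(shared_ancestry, shared_dna):
    """Split a relative's match list into two theoretical shares.

    Both arguments are fractions (0.25 means 25%). Returns the share of
    that relative's matches we would expect to match by DNA, and the
    share we would be related to without sharing any DNA.
    """
    if not 0 <= shared_dna <= shared_ancestry <= 1:
        raise ValueError("expected 0 <= shared_dna <= shared_ancestry <= 1")
    return shared_dna, shared_ancestry - shared_dna


# First cousin, once removed: 25% shared ancestry, 6.25% average shared DNA.
dna_share, non_dna_share = split_match_list(0.25, 0.0625)
print(f"DNA matches: {dna_share:.2%}, related without DNA: {non_dna_share:.2%}")
```

For the first cousin, once removed this gives 6.25% and 18.75%, the same theoretical figures quoted above, and they are subject to the same caveats about recombination and database coverage.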

With those with whom we share 100% of their ancestors, determining who is related to us in their DNA Relative list is easy – it is everyone; however, it is possible to determine who may be related to us in the lists of relatives that share a fraction of their ancestors with us.

To do this, we must compare two relatives that are mutually related to us and each other and find out who they match and determine who matches us and who does not. The relationship of the two individuals must not be any closer than our relationship with either one. For example, I have two fourth cousins who have tested and who are siblings. Their relationship as siblings will produce matches from ancestries for which I have no relationship. Likewise, two other fourth cousins who are second cousins to each other cannot be compared either without producing matches to individuals with whom I could not possibly be related.


The procedure for doing this is to download DNA Relative files and compare these in Excel. Find two relatives that have at least an equal or greater distance from each other as your closest relative of the two does to you. Create a column in one of the files that you mark each row in the same manner; I used an “x” and give it a name like “ID.” Add a column to the second file and leave the column blank. Copy the second file and paste the rows below the first.

Once done, highlight the header row (Row 1). Click the “Data” tab and select “Filter” (or use Ctrl+Shift+L).  Highlight the "Name" column, select the drop down arrow and select “Sort A to Z.”  The names across both data sets will be in alphabetical order.

Next, highlight the “Name” column. Click the “Home” tab. Click “Conditional Formatting” and then “Highlight Cells Rules.” Select “Duplicate Values.” Some of your cells in this column should become highlighted in the default color – generally pink.

Open a new Excel file, copy the header row into it, and keep the file ready. Go back to the original file’s “Name” column and click the down arrow on the header. Select “Sort by Color” and select the color of the duplicates. This arranges all of the duplicates in alphabetical order. Copy all of the duplicates (highlighted in pink) and paste these into the new file. Now delete all of the rows that have something in the name field.

Repeat the process, with the “Family Surnames” column and then the “Family Locations” column. You can work through the rest of the copied databases, but unless you are very familiar with Excel, I would suggest not doing this as you will have the possibility of introducing non-matches in your final spreadsheet by accident. In the new spreadsheet, go to the “ID” column, select the down arrow and “Sort A to Z.” Delete all of the rows with no “x” in this column and that eliminates all of the duplicates in the spreadsheet.

What remains is a portion of those whom the two relatives share and who may be related to you. You can paste your data onto this spreadsheet, and repeat the duplicate values, as you did with “Name,” for “Family Surnames” and “Family Locations.” If there are any duplicates, these are the individuals that you and your two cousins share. If it is a short list, the quickest method is to open your DNA Relatives page on 23andMe and use the search feature there. Good luck.
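
For anyone who would rather not do this by hand in Excel, the same comparison can be sketched in Python. This is a rough equivalent and not the procedure I actually used: it assumes each DNA Relatives download is a CSV file with “Name,” “Family Surnames,” and “Family Locations” headers, and the function names and sample rows are made up for illustration.

```python
import csv

# Column headers named in the steps above; the CSV layout is an assumption.
KEY_COLUMNS = ("Name", "Family Surnames", "Family Locations")


def load_rows(path):
    """Read one downloaded DNA Relatives file into a list of dict rows."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def _normalize(row, column):
    # Blank or missing cells become "" so they never count as duplicates.
    return (row.get(column) or "").strip().lower()


def shared_matches(rows_a, rows_b):
    """Rows of rows_a that duplicate a non-blank key column value in rows_b."""
    lookup = {
        column: {_normalize(row, column) for row in rows_b} - {""}
        for column in KEY_COLUMNS
    }
    return [
        row for row in rows_a
        if any(_normalize(row, column) in lookup[column] for column in KEY_COLUMNS)
    ]


def non_dna_candidates(rows_a, rows_b, my_rows):
    """People both relatives match who do not appear in my own list."""
    in_both = shared_matches(rows_a, rows_b)
    also_mine = shared_matches(in_both, my_rows)
    return [row for row in in_both if row not in also_mine]


# Hypothetical rows standing in for three downloaded match lists.
cousin_a = [{"Name": "Person One"}, {"Name": "Person Two"}, {"Name": "Person Three"}]
cousin_b = [{"Name": "person one"}, {"Name": "Person Two"}, {"Name": "Person Four"}]
me = [{"Name": "Person Two"}]

print([row["Name"] for row in non_dna_candidates(cousin_a, cousin_b, me)])
```

With real files, `load_rows` would supply the three lists. As with the spreadsheet method, a duplicate on any one of the three columns is treated as the same person, so the output still needs to be checked by eye.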


There is no guarantee, however, that these individuals are actually related to you or related to you in the same lineage as to those with whom you are comparing. A case in point, my half cousin (through my father’s mother) is related to my fourth cousin (through my father’s father) with 14cM shared; however, they are related in an unknown manner and from different lines than I am related to either one of them.

My sister-in-law also shares 9cM of DNA with this same fourth cousin, albeit via a completely different lineage. Both of these relatives share more with my documented 4th cousin than I do (5cM). Therefore, just because your two relatives have a common DNA cousin, this individual may not be related to you or, if they are related to you, they may be related along another lineage.

There will be qualifying individuals for whom you cannot determine whether they match both subjects. I have determined these matches based on three criteria: name, family surnames, and family locations - these are the only three columns that can establish that an individual on both lists is the same person. Just because two individuals have the same haplogroups does not mean they are one and the same. Therefore, we will miss some of our matching relatives by virtue of a lack of information.

Finally, we may match the same individuals; however, we may match on different segments or even different chromosomes than the other two individuals. The closer the relationships, the more likely we will have matches on identical segments or portions of identical segments.


I decided to put this into practice and here’s what I was able to discover with several of my family members who have tested.

First Cousins

I looked at the matches between my two maternal first cousins who are also first cousins with each other. They had 93 matching individuals with only six matching me. Because of my very close relationship to both of these women, I would venture to say that most if not all of these matches are my non DNA relatives. In comparing the 93 individuals with my mother, she matches all but 20 – these twenty relatives would never have been known to me had I not conducted this exercise.

Multiple Relationships/First Cousins Twice Removed

This is a fairly unique comparison as I am related to both individuals in two different ways; however, the lineages are the same for both. With subject “A,” I am her second cousin via her grandmother and her second cousin, once removed through her grandfather. To visualize this, our grandmothers were sisters. Her grandfather was my grandfather’s uncle. I share 50% of her ancestors and she shares 37.5% of mine in this unique relationship; together we share 5.34% of our DNA.

To Subject B, I am his second cousin, twice removed via his 2nd great grandmother and a third cousin, once removed through his 2nd great grandfather. I share 12.5% of his ancestors and together we share 2.68% of our DNA. Subjects A and B, who are first cousins, twice removed, share 12.5% of their ancestry and 3.64% of their DNA.

Confused yet? Good. This particular matching produced 19 matches, with only three of whom I share any DNA at a significant level. Determining which side these relationships originate from is difficult and can only be done by comparing to others who are related to me and these two cousins from one side of the family and not the other.

Fortunately, I have two individuals from each side of the family to do this comparison. Unfortunately, the two related to me through my paternal grandmother did not match these 19 people. The two subjects related to Subject A and B through my paternal grandfather’s mother each matched one individual. From the 19 matches, only two can be placed into a specific lineage. The others will require further research.

Half Cousin/Second Cousin

Since my father had half-sisters and no full siblings, I tested my half cousin who is also a second cousin to Subject A in the preceding example. Their match, which is completely New England Colonial, produced 17 individuals in common – seven of these match me.

Half Cousin/Second Cousin Once Removed

Using this same half cousin, I repeated the exercise with our common second cousin, once removed. His great-grandmother was the sister of our grandmother. By comparing the two individuals, I was able to determine that they had 12 individuals in common and four of these match me.

Second Cousins

Although we share no Colonial New England lines, we do have some Colonial New Jersey and Pennsylvania lineages, and my common paternal second cousins produced a sizable number of matches. The three of us share great-grandparents, and we descend from three of their five children. These two cousins had 28 matches in common – four of which I shared. I would have expected that we would have shared more, but we don’t for some reason.

Second Cousin/Second Cousin Once Removed

The next comparison was between my second cousin and my second cousin, once removed.  They are second cousins, once removed and share a Colonial New England ancestry.  Both descend from my grandmother's two sisters.  The two share 10 matches in common with only three that matched me.

Second Cousins/Fourth Cousins, Once Removed

My second cousins and I share a fourth cousin, once removed, and since one of the second cousins shares a larger than normal amount with her, I thought it might prove interesting if we see how many they match; some of these individuals could be my relatives as well.

Needless to say, the pair matched four individuals. One of these is a known fourth cousin. Because I do not share any autosomal DNA with this fourth cousin, once removed, I have no matching DNA with any of the four – including the other fourth cousin – who also matches her. Of the 16 folks in our family study based on my surname lineage, only these two women match this fourth cousin, once removed.


What I’ve learned from this exercise is that I can possibly discover non-DNA relatives in 23andMe’s database. While there is always a chance that the matches that my cousins match are not related to me personally, there are those who would be.

Because many of the matching segments are very small, I probably will not pursue some of these personally, as I’ve found that trying to determine relationships with the smaller matches has been an exercise in futility. While I will share with anyone who wants to share, I am currently only actively tracking individuals that share at least 0.20% of their genome with my family members.

This was a fun exercise and it renewed my interest in my 23andMe matches. I’ve become a little stagnant with my autosomal pursuits as of late. This effort has infused a bit more vigor in this regard. Try it and see if you can find possible non DNA relatives in the 23andMe database.

Wednesday, March 6, 2013

Who Do They Think I Am - A Look At Four Autosomal Analyses

Since a number of genetic genealogists have already participated in the exercise of analyzing their results from the various autosomal companies, I have decided to look at mine as well. To see what others have discovered, see the posts by CeCe Moore and Roberta Estes. In my analysis, I will only look at the results from four commercial entities that provide autosomal results: National Geographic’s Geno 2.0 Project, Family Tree DNA’s Population Finder, Ancestry’s AncestryDNA, and 23andMe’s Ancestry Composition.

Each of these four companies provided different results and I will compare these in light of what I know concerning my own ancestry from the last 500 years. Attempting to assign a person to a population is less difficult for someone who has a homogenous ancestry than it is for someone who is admixed from divergent populations. Some of the services will assign a primary population while others look at the constituent parts of one’s genetic background and provide an analysis of the segments.

Regional Populations

Another problem in comparing the results is that the various companies use different reference populations. In addition, regional populations are not consistent. For Europe (from which all of my known ancestors hail), 23andMe classifies four regional populations: Northern European, Southern European, Eastern European, and Ashkenazi.

FTDNA’s European regions are identified as Western European, Northeast European, Southeast European, and Southern European. AncestryDNA features more European regions and these include British Isles, Scandinavian, Central European, Eastern European, and Southern European.

The Geno 2.0 project assigns local results based on a mixture of a variety of world regional populations with only two that are predominantly European in origin: Mediterranean and Northern European; however, the Mediterranean segment classification is not limited to Europe. With only two regional populations assigned to Europe, it is difficult to compare the Geno 2.0 results with the others – but we will get to this later.

My Ancestry

To the best of my knowledge, the following chart illustrates the nature of my ancestry within the last several hundred years. While I can take some lines back to the 1500s and beyond, others can only be traced satisfactorily to the early 19th century.

Primarily, I am English (38.28%) and German (31.25%). Scottish, Welsh, and Swiss are represented by each constituting 6.25% of my ancestry. My Scots-Irish, Irish, and French ancestries each contribute 3.13% of my lineage. My French ties come from the former province of Dauphiné in southeast region of the country.

Finally, my least represented known ancestry is of Norman stock from the Isle of Jersey. Two New England families on my father’s side constitute this lineage. My Gustin (formerly known as Jean de la Tocq) line and associated families are from St. Ouen’s parish and my Gavitt/Gavey line and related families hail from St. Saviour’s parish.

While I do not have any personal knowledge of Dutch ancestry, there are a number of residents of the Netherlands that match my mother on 23andMe with percentages that are consistent with third and fourth cousins. The origin of these connections has not yet been determined, but probably will show as one of my lines previously believed to be German. In addition, it is thought that my Maneval line, which originated in Dauphiné, may have intermarried with Italians in Piedmont.

National Geographic’s Geno 2.0 Project

In the Geno 2.0 project, the various reference populations are viewed from their specific admixture. Since my ancestry is European, we’ll concentrate on those references for this discussion. There are 12 reference populations from Europe, which include the following ethnicities: British, Bulgarian, Danish, Finnish, German, Greek, Iberian, Romanian, Russian, Russian Tartars, Sardinian, and Tuscan.

All of the above populations have Northern European, Mediterranean, and Southwest Asian components. Certain populations (Bulgarian, Finnish, Romanian, Russian, and Russian Tartars) also carry segments from Northeast Asia. Depending upon the reference population’s geographic location, the majority of the segments were either Northern European or Mediterranean.

Mediterranean is also found as the majority component in the following non-European populations: Egyptian, Georgian, Iranian, Kuwaiti, Lebanese, Northern Caucasian, Puerto Rican, and Tunisian. Mexican-Americans also have a sizable Mediterranean component; however, Native American is their greatest percentage.

Other regional reference markers that are not found in the European reference populations are Southeast Asian, Native American, Oceanian, Subsaharan African, and South African. For an overview of the reference populations used in this study, go to

For my results, my Northern European component at 41% is less than the Northern European reference populations of Finnish (57%), Danish (53%), Russian (51%), British (50%), and German (46%). My Mediterranean component (39%) is greater than that which is found among German (36%), British (33%), Danish (30%), Russian (25%), Russian Tartar (21%), and Finnish (17%). It is also considerably less than more southerly European populations from Sardinia (67%), Tuscany (54%), Greece (54%), Iberia (48%), Bulgaria (47%), and Romania (43%).

European populations also have Southwest Asian genetic components, and my 19% is slightly higher than most of Geno 2.0’s European reference populations. It appears to be more closely aligned with Eastern Europeans such as Russians (18%), Romanians (19%), Bulgarians (20%), and Russian Tartars (21%); however, I do not have any Northeast Asian markers, which are characteristic of all of these populations.

I have included a chart of four reference populations compared to my results. Included in those four are the primary (German) and secondary (Tuscan) reference populations as determined by Geno 2.0. I have added two additional populations (British and Danish) for comparison purposes.

Geno 2.0 lists German as my primary reference population. I am in agreement with this as I have a large percentage of German ancestry and an even larger percentage of English. When one remembers that Saxon, Angle, Jute, Frisian, Viking, and Norman invasions occurred on British soil, Germanic segments would have contributed greatly to this ancestry.

According to Geno 2.0, “This reference population is based on samples collected from people native to Germany. The dominant 46% Northern European component likely reflects the earliest settlers in Europe, hunter-gatherers who arrived there more than 35,000 years ago. The 36% Mediterranean and 17% Southwest Asian percentages probably arrived later, with the spread of agriculture from the Fertile Crescent in the Middle East over the past 10,000 years. As these early farmers moved into Europe, they spread their genetic patterns as well. Today, northern and central European populations retain links to both the earliest Europeans and these later migrants from the Middle East.”

Geno 2.0’s secondary population for me is Tuscan. Even eyeballing the results tells me something is amiss. While I have a Mediterranean percentage that is larger than the Northern European references, it is not comparable with those from Tuscany. I have included British and Danish references in the above graphic and they appear to be more in line with secondary and tertiary populations.

If I were to score the populations based on the total percentage differences of the three categories of Northern European, Mediterranean, and Southwest Asian, the Tuscan reference is not as close as the British and Danish references. I have a total point difference of 10 with the German reference; however, the Tuscan population has a 30 point spread.

British, which is logical from what I know of my own ancestry, only has 16 points of difference, while Danish is further removed with 24 points of difference – still less than the Tuscan example. While I would be in agreement with the Germanic identity, I am not in agreement with the comparison to Tuscan populations.
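
The scoring I describe above is simply the sum of the absolute differences across the three components. Here is a minimal Python sketch using my percentages and the German reference figures quoted earlier; the function and variable names are my own labels, and only the German reference is shown because it is the only one whose three components are all listed in this post.

```python
COMPONENTS = ("Northern European", "Mediterranean", "Southwest Asian")


def point_difference(mine, reference):
    """Total of the absolute percentage-point differences per component."""
    return sum(abs(mine[name] - reference[name]) for name in COMPONENTS)


my_results = {"Northern European": 41, "Mediterranean": 39, "Southwest Asian": 19}
german = {"Northern European": 46, "Mediterranean": 36, "Southwest Asian": 17}

print(point_difference(my_results, german))
```

This prints 10, the total point difference with the German reference given above. The same function could be applied to the other reference populations once their three percentages are entered.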

Family Tree DNA’s Population Finder

When I first received my Population Finder results, I immediately dismissed these because of the inclusion of 8.42% Middle Eastern ancestry. My Western European ancestry was reckoned as being 91.58%. Knowing that my lineages were all European, I could not see where Middle Eastern segments could exist within the past 500 years; any Middle Eastern ancestry would certainly have been too far removed to show in my analysis. Since receiving the Geno 2.0 results and seeing how pervasive Mediterranean and Southwest Asian segments were across all European populations, I have rethought my original opinions on these results.

Since populations are more complex than I originally thought, I am more inclined to view the Middle Eastern segments as part of what Geno 2.0 identifies as either Mediterranean or Southwest Asian in origin. This remains to be seen, and since neither service provides a chromosome-by-chromosome analysis, it is impossible to see if there is a correlation.

Ancestry’s AncestryDNA

Ancestry’s analysis has me baffled, as they have assigned 21% of my ancestry to Eastern Europe. While I have a slight amount of my overall lineage traced back to Ukraine, it was 38 generations in the past and its overall impact on my autosomal results should be negligible.

Although not all of my ancestors are represented by the pins on the map shown below, the ones that are present show the predominance of my heritage coming from the British Isles and Central Europe. None are found in Eastern Europe. While I would love to lay claim to some recent Slavic ancestry, I cannot, and I question the results as reported by Ancestry. Like Geno 2.0 and FTDNA’s Population Finder, Ancestry does not plot the results by chromosome.

23andMe’s Ancestry Composition

Introduced in December 2012, 23andMe’s new feature is, I will have to admit, far above the competition in accuracy based on my known ancestry. In the previous incarnation called Ancestry Painting, 23andMe’s ancestral analysis was pretty Spartan. My results were, in a few words, pretty vanilla – or in the color schemes used at the time – completely blue.

The new Ancestry Composition feature fine tunes these results with additional global populations going beyond their original European, Asian, and African classifications to the expanded European, East Asian/Native American, Middle Eastern/North African, South Asian, Sub-Saharan African, and Oceanian regional populations. In addition, several sub-regional populations were also added.

23andMe also defaults to a standard estimate of your populations and allows you to determine if you want to be more speculative or more conservative in your population estimates. I’m ready to go for broke (read “reckless abandon” for me) and completely rely upon the speculative results as it gives me more options.

While these results remained 99.7% European, some additional colors were added to my ancestral spectrum. These were very small by comparison with 0.1% each for Native American, North African, and South Asian. The results could be regarded as noise or just very small segments of my ancestry.

With two of these populations occurring at the same segment as my mother, I have a tendency to believe that they may be accurate – but very persistent and fairly distant markers. She shares the Native American and North African segments. Therefore, the South Asian must come from my father. How they fit into my ancestry, I haven’t a clue. I would have thought that my father’s side had more of an opportunity to have Native American blood, as a majority of his ancestors were in the colonies over a hundred years prior to my mother’s first immigrants.

One thing that I believe is incorrect is the assignment for my X chromosome as being “British and Irish.” Having already phased my X as coming primarily from my maternal grandmother (see my previous post on this subject), I already know that her ancestry was 87.5% German and 12.5% French; however, the contributors to her X chromosome were all German.

Outside of this misidentification, I am pleased with how 23andMe assigned the various populations. My German ancestry is somewhat underreported; however, I am assuming that most of what came from my Teutonic predecessors is found under the “Nonspecific Northern European” category. Since I do not have any known Sardinian (0.3%) or Balkan (0.2%) ancestry, I checked my mother’s results and found that she shared only the Balkan markers. She also had a chromosome that was nearly all Italian which I did not inherit. This may indicate the supposed Italian ancestry from Piedmont.

The Sardinian must come from my father; however, his ancestry was primarily from the British Isles. There is one possibility though. My grandfather’s sister’s middle name was the Italian surname of Marcelli. Unfortunately, we have no clue why the second child of this family was named Essa Marcelli Owston. Was she named for a Sardinian or Italian ancestor or a friend of the family? Of my great grandparents’ five children, this is the only name that cannot be traced to a family member or a friend of the family. Alas, this is another mystery that hopefully can be solved at some time in the future.


Of the four autosomal services, I would have to say that 23andMe has the best analysis and it lines up closely with my known ancestry. It is the only service that drills down to the sub-regional populations and gives you the opportunity to speculate or be conservative about the analysis. It is also the only service that provides a chromosomal analysis. With the current price at $99, if you are looking for an inexpensive ancestral analysis, 23andMe is the route to consider.