Saturday, July 10, 2010

Ancestry Finder: It's a Small World After All

Well, this week 23andMe opened up the floodgates and provided its customer base complete access to its latest Ancestry Labs feature: Ancestry Finder (AF). It is amazing how people are reacting to this feature – some with joy and elation and some – well, their reactions were a little strange. In fact, some of these run the gamut of Elizabeth Kübler-Ross’ “Five Stages of Grief”:

Denial: “My ancestors could not have come from the country of Outer Mitochondria. It is simply not possible. I know my ancestry and this is absolutely wrong. I don’t care what 23andMe says – it is wrong.”

Anger: “Everyone who is an ethnic Chromosomian needs to be upset that their ancestors’ place of birth is used and not our Chromosomian heritage. This is unfair – just because they were born in Autosomia doesn’t make them Autosomians. They were Chromosomians and proud of it.”

Bargaining: “There are only a few matches that are from Mendeland. Please 23andMe, fix it – please fix it soon. If you do, I’ll be a loyal customer.”

Depression: “I am so upset because my ancestors appear to be from Centimorgana. This is going against what I’ve known from my grandparents and my research. I don’t even know who I am these days.”

Acceptance: “They say the DNA doesn’t lie. I’ll go ahead and acknowledge my new heritages. I’m not sure I understand it, but I guess that’s me.”

Well, some of the above responses are paraphrased from real customer comments and the others are fabricated; however, people are vocal about some of the ancestral implications. Most of this is based on a lack of understanding of the tool and how it works, a myopic view of one’s ancestral possibilities, and an overconfidence in one’s own knowledge of one’s heritage.

I will have to give kudos to 23andMe for developing this tool as it is interesting and fun to use. Since my last report during the beta test in which I participated, they have adjusted the default settings from 10cM to 7cM, which is the same as their Relative Finder feature.

They also changed their wording, from the “Did you ever wonder if you were a quarter Irish?” explanation to the more descriptive “Let the 23andMe Community help you discover what countries your ancestors and their relatives might have lived in. This lab is fueled by your responses to the "Where Are You From?" ancestry survey.” The keys to this description are “ancestors and their relatives” and “might have lived.”

I have decided to address some of the issues where people are having the most difficulty.

The Survey – Type of Data Used

A number of folks, especially Americans with colonial ancestries, feel that the survey doesn’t go deep enough. The survey has been altered from the original version in the last few months to include the ancestral origins of a person’s grandparents; however, this data is not currently being used. It seems to me that eventually you will be able to see this information, but it will probably not be what you initially see when you run your mouse over a segment. That will probably stay as the countries of birth of a person’s grandparents.

While a person’s grandparents’ nativity will not be of much value to others, it is a practical data collection solution. This is where it may be a stretch to understand the reasons for the structure - a person needs to remove the genealogist hat and put on a database administrator hat. While most don’t have a practical understanding of how databases that drive a tool like Ancestry Finder work, the developers appear to have designed the tool to be efficient and quick.

While family historians would desire more information, designing a tool to do this may be so cumbersome it would be slow and inefficient. When working with a relational database, you try and go for the lowest common denominator for the greatest success in returning data quickly. I think this is why 23andMe has chosen the current route. While not all can, most should be able to list the birthplace of their grandparents. It gets increasingly difficult to manage when you go back in time (where people may know even less about their forebears), as more unknowns will be present.

In addition, with each generation, the number of data points doubles. From a database perspective, four ancestors are easier to manage - both graphically in the Flash interface and informationally in the database columns. Adding great grandparents would be nice, but it would probably be a programmer's and database administrator's nightmare.

Some are displeased with the choice of nativity data and the use of the country name for the birth land. As far as the choice of current country name, you need a fixed data set from which to pull - otherwise, the information is chaotic and intended data will be missed. If this info were entered as free text, spelling errors on the part of the participant and the wide variety of responses would wreak havoc.

If historic borders were represented, which ones do you choose? I have a coworker whose ancestry is from the modern country of Slovakia. Does he use Austria-Hungary, as is represented on two of his grandparents' birth records, or does he use Czechoslovakia, where his other grandparents were listed as being born? The villages were in the same region that is now Slovakia - but were under three different country names in the last 100 years.

What about countries that have historical distinctions – such as the UK with England, Scotland, Wales, and Northern Ireland? The problem is that if everyone wanted representation, the list would constantly change, people would be upset if their region was missed, and the list would be prohibitively long and unmanageable. Where do you stop? If this were done for the US, you would have to include every Native American tribe to be fair, as they have tribal sovereignty.

By setting a standard to only current country names, 23andMe has made the database manageable – it is fixed and there are no competing variables. While that is of little benefit for genealogists or for adopted people seeking info on their ancestry, it is the easiest and most effective way to approach this from both a database administrator's and an Adobe Flash developer's perspective.
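To make the fixed-list argument concrete, here is a minimal Python sketch. It is purely illustrative: the tiny country list, the sample answers, and the function names are hypothetical and say nothing about how 23andMe actually stores its survey data. It simply replays the Slovakia example above.

```python
from collections import Counter

# Hypothetical fixed list of current country names (a tiny sample, for illustration only).
COUNTRIES = frozenset({"Slovakia", "Germany", "United Kingdom", "United States"})


def validate_survey(grandparents):
    """Accept exactly four birthplaces, each drawn from the fixed list."""
    if len(grandparents) != 4:
        raise ValueError("expected four grandparents")
    unknown = [c for c in grandparents if c not in COUNTRIES]
    if unknown:
        raise ValueError(f"not on the fixed list: {unknown}")
    return tuple(grandparents)


def all_four_same(grandparents):
    """True when all four grandparents were born in the same country."""
    return len(set(grandparents)) == 1


# Historic names taken from birth records for villages now in Slovakia.
free_text = ["Austria-Hungary", "Austria-Hungary", "Czechoslovakia", "Czechoslovakia"]
print(Counter(free_text))        # one region split across two labels
print(all_four_same(free_text))  # False

# The same family recorded against the fixed list of current names.
fixed = validate_survey(["Slovakia"] * 4)
print(all_four_same(fixed))      # True
```

With one value per grandparent from a closed list, the "all four grandparents from the same country" filter is a single comparison. With historic or free-form names, the same family no longer matches itself.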

As far as replacing country of birth with ethnicity goes, a queried database column needs to hold one and only one piece of data to be effective. My grandfather has 6 ethnic groups - which one do I use? If I were to use the major percentage, then the others would not be represented. And if he were just 50% English and 50% French - which one would I use?

Since they are already collecting ethnic and religious backgrounds on the grandparents - I feel that they will be incorporating this data into the system. Ethnicities and nationalities probably will not be searchable by virtue of how the data is collected as there are too many variables in the names used and no fixed method of reporting the data. How 23andMe will use this information remains to be seen.

Fallacies Concerning the Data

Fallacy 1: The information is absolutely correct for them

While the assumption is that the data is correct, it is self-reported, undocumented data. It could be wrong. It could be what we believe to be true, what we’ve been told incorrectly by others, or just an unproven assumption. For years my mother thought her maternal grandfather had been born in Germany and immigrated to the U.S. with his parents. She knew that the family returned to Germany to take possession of an inheritance and later returned to the States.

While the family did return to Germany, my great-grandfather was actually born in Johnstown, Pennsylvania in 1855. His mother had immigrated when she was a young girl and his father came when he was in his twenties. My great-grandfather, a brother, and two sisters were born in Johnstown. They returned to Germany when my great grandfather was about 8 years old. The family had two daughters and a son who were born in Germany.

Many folks don't know who their grandparents were, let alone where they were born. The answer may have been speculation and may have been wrong.

Fallacy 2: The information is absolutely correct for me

While the natural assumption is that a person's grandparents' nativity equates to their ethnicity, it is not a given. Just because a person was born in a particular country doesn’t mean that he or she was of that particular nationality. For years, my French ancestors lived in Germany. They retained their ethnic heritage and had married within a small enclave of families that had escaped religious persecution by traversing the Alps into Württemberg. Eventually, they married Germans and fully became German in all respects, but this took five generations.

Great migrations began with Europeans exploring the world – this brought different cultures in contact with one another. With the advent of colonialism and wars, European men came in contact with women from wide ethnic, racial, and religious backgrounds. Often children were produced from these relationships and the fathers who returned to their own country had no idea that their DNA would continue through the generations of individuals who live half-way across the world.

The extension of this thought is that having a cousin relationship with someone who has all four grandparents born in a single country equates to me having that nationality. A person's grandparent nativity proves nothing to me - only that I have a cousin who appears to have four generations living in a certain country.

Fallacy 3: The information is absolutely incorrect for me

Like sticker shock, I think people have been having several reactions to seeing a number of countries represented on their AF reports. The immediate reaction is "how can this be?" This is especially true when more exotic locations are represented that on the surface conflict with what we already know about our families. While nearly every European country is represented on my brothers’ and my lists, the more exotic locations include Venezuela, Uruguay, Trinidad and Tobago, Cuba, India, Saudi Arabia, St. Kitts and Nevis, and Algeria. Most of these represent locations where only one grandparent in four of our genetic cousins was listed as being born.

In the cases where there is a match with a genetic cousin who has four grandparents from the same country, the implication is that one of our ancestors must also have come from that country. This assumption, while certainly plausible, doesn’t necessarily have to be correct.

This seems to be an area that is a stumbling block to most everyone about AF and it may be an inherent flaw in how AF was first presented. Without enough information, one cannot successfully determine his or her ancestry based on the nativity location of someone else's grandparents. There are too many unknown variables to adequately do so.

Since the lowest reported share is 5cM, these can be fairly distant relationships at the 5th cousin level or higher. A 5cM share is roughly a genome share of .06%. If percentages were equal across all generations (which they are not), these numbers are in line with 5th cousins, with the most recent common ancestors being fourth great-grandparents. The average shared DNA and the corresponding cousin relationships are listed below.

Average Shared  Cousins        Half Cousins                      Cousins Removed
DNA Percentage
                First Cousins
                                                                 First Cousins, Once Removed
3.13%           Second         Half First Cousins, Once Removed  First Cousins, Twice Removed
                               Half Second                       Second Cousins, Once Removed
0.78%           Third          Half Second, Once Removed         Second Cousins, Twice Removed
                               Half Third Cousins                Third Cousins, Once Removed
0.20%           Fourth         Half Third, Once Removed          Third Cousins, Twice Removed
                               Half Fourth                       Fourth Cousins, Once Removed
0.05%           Fifth          Half Fourth, Once Removed         Fourth Cousins, Twice Removed
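
The percentages listed above follow a simple halving pattern, which a few lines of Python can sketch. This is only the usual textbook expectation (each extra generation halves the expected share, and a half relationship halves it again). The function name `expected_share` and its parameters are made up for illustration, and real cousins vary widely around these averages.

```python
def expected_share(degree, removed=0, half=False):
    """Expected fraction of autosomal DNA shared by two cousins.

    degree  -- 1 for first cousins, 2 for second cousins, and so on
    removed -- how many generations "removed" the cousins are
    half    -- True when they descend from one common ancestor, not a couple
    """
    # First cousins share 1/8 on average: three halvings (2 * 1 + 1).
    # Each further cousin degree adds two halvings, each removal adds one,
    # and a half relationship adds one more.
    halvings = 2 * degree + 1 + removed + (1 if half else 0)
    return 0.5 ** halvings


names = ["First", "Second", "Third", "Fourth", "Fifth"]
for degree, name in enumerate(names, start=1):
    print(f"{name} cousins: {expected_share(degree):.3%}")

# Relationships that sit on the same row share the same expectation.
print(expected_share(1, removed=2) == expected_share(2))             # True
print(expected_share(1, removed=1, half=True) == expected_share(2))  # True
```

Under this assumption first cousins come out at 12.5% and fifth cousins at just under 0.05%, which is why a .06% share lands in fifth cousin territory.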

While a share of 5cM is consistent with a fifth cousin, I am finding that some larger segments of my own are going back further in time to documented 8th and 9th cousins. With numerous variables that are not constant, it becomes difficult to accurately predict relationships. This is why Relative Finder often lists a relationship range. Not everyone will carry the average amount of shared DNA and not every cousin will be represented in a person’s genome.

Be that as it may, we will analyze the 4th great-grandparent/5th cousin scenario – which is the largest predicted group in most people’s Relative Finder results. If in your ancestry you have no pedigree collapse and all of your 4th great-grandparents are unique, you would have a total of 64 individuals at that level. The birth years for these individuals, depending on your year of birth, may fall between 1750 and 1800. My paternal fourth great-grandfather was born in 1733 – which is earlier than most.

While it is possible for someone to have traced all 64 fourth great-grandparents, if you are like me there are numerous unknowns on your family tree. I believe I have a pretty good knowledge of some of my fourth great-grandparents as I can identify 40 by name. Of the other 24, I know one’s first name and the rest are just a mystery. Since I’ve been tracing my family since the 1970s, I would think that I have a better than average grasp of my ancestry – but I don’t know all my ancestors at this level.

With the lack of knowledge we have of many of these 64 people, it is impossible to say what a person's ethnicity was. Migrations were occurring in 1800 and though we may reasonably guess that all of our ancestors from a given line are English, German, Polish, or whatever, unless we know who that person is for sure, one cannot say. This was also the time of great colonialism and England, Spain, Portugal, France, and Holland had previously instituted colonies on every inhabitable continent. They would later be joined by the Belgians and Germans who would set up their own colonies.

Since we cannot possibly know all of our 4th great-grandparents, we cannot know what all of their descendants did either. Taking the conservative approach that, in every lineage from a set of 4th great-grandparents, the couple and each of their descendants produced just 2 children, this adds up to a total of 8,128 descendants across all your ancestral lines. Although large, this is a very conservative number and could be exponentially larger. All it takes is for one person at least four generations back to relocate to another nation and have a child born there, and for that child to become someone's grandparent.
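
The 8,128 figure can be reproduced with a little arithmetic. The sketch below is one reading that arrives at the same number, not necessarily the exact method behind it: it assumes 32 couples among the 64 fourth great-grandparents, two children for every person, and descendants followed for seven generations (one past your own). The function names are hypothetical.

```python
def ancestors_at(generations_back):
    """Ancestor slots at a given depth: 2 parents, 4 grandparents, and so on."""
    return 2 ** generations_back


def descendants_per_couple(generations, children=2):
    """Descendants of one couple when every person has `children` children."""
    return sum(children ** g for g in range(1, generations + 1))


def total_descendants(generations_back, children=2):
    """Sum of descendants over every ancestral couple at that depth."""
    couples = ancestors_at(generations_back) // 2
    # Assumption: each couple is followed one generation past the reader's own.
    return couples * descendants_per_couple(generations_back + 1, children)


# Fourth great-grandparents sit six generations back.
print(ancestors_at(6))                   # 64
print(total_descendants(6))              # 8128
print(total_descendants(6, children=3))  # 104928
```

Note that the total is a sum over lines, so people who descend from more than one of the couples are counted more than once. Raising the family size from two children to three already pushes the total past 100,000.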

In addition - there are always non-paternal events (NPEs) that occur - a child may be born to an ancestor's descendant who was stationed in a particular part of the world during a conflict or as part of a colonial peacekeeping force. So you may share DNA with someone who has no idea of his/her English ancestry because his/her 2nd great-grandfather was a soldier stationed in Egypt, Manila, Saigon, or Seoul. Your relative may not have been aware that he had a child from a one-night stand. There was even one individual who adamantly stated there were no illegitimate children in his family. My question is, "How do you know for sure?"

Knowing that the percentage of DNA we share with a person could be representative of a deeper ancestry at even the 10th Cousin level, this only adds to the list of possibilities. If a person is a 10th cousin, our common ancestor is a 9th great-grandparent. Again using my paternal line as example, we have estimated that my 9th great-grandfather was probably born in the 1560s. Although his birth was not documented, he is listed in his father’s 1567 will and he was producing children from 1590 until his death in 1602.

At the 10th cousin level, we will have the possibility of 2,048 9th great-grandparents; however, that number may not be as high due to pedigree collapse and shared ancestry. Using the assumption that these ancestors and their descendants each produced two children, a conservative estimate of all the descendants from our 9th great-grandparents would number in the range of 8 million persons. With these numbers, it would be impossible for us to know where every descendant of a common ancestor lived, who they married, and what flag their descendants fly today.
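
For anyone who wants to check the 2,048 and "8 million" figures, here is a rough Python sketch. It rests on assumptions of my own choosing that happen to match the numbers quoted: n generations back there are 2^n ancestors forming 2^(n-1) couples, every person has two children, and each couple's descendants are followed for n+1 generations. Under those assumptions the total collapses to 2^(2n+1) - 2^n. The function name is hypothetical.

```python
def total_descendants(generations_back):
    """Closed form for the two-children model: 2^(2n+1) - 2^n.

    Derivation: 2^(n-1) couples, each with 2 + 4 + ... + 2^(n+1)
    = 2^(n+2) - 2 descendants over n + 1 generations.
    """
    n = generations_back
    return 2 ** (2 * n + 1) - 2 ** n


# Ninth great-grandparents sit eleven generations back.
print(2 ** 11)                         # 2048
print(f"{total_descendants(11):,}")    # 8,386,560
```

The same formula with n = 6 gives 8,128 for the fourth great-grandparent level, so the two estimates are consistent with each other. As before, this is a sum across lines, and pedigree collapse would lower it.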

Don't get hung up on where someone’s grandparents lived - accept that you have cousins who have a rich heritage that you don't share and be comfortable knowing that you share something with them - it just is probably not represented by the flag next to their grandparents' nativity data.

Implications of My Results

When removing all of the defaults, I had a total of 29 countries represented, one brother had 34, and my other brother had 31. When I controlled for all four grandparents coming from the same country, each of us had 13 countries, with 16 countries represented in total across the three of us.

Most of our ancestral nations are represented at the four grandparent level: US, UK, Germany, Canada, France, and Ireland. Switzerland and the Bailiwick of Jersey are missing from this list. Australia is also represented at this level, and I have known cousins on both my mother's and father’s sides descended from folks who made their home “Down Under.” Nine countries are represented for which I have no known connection.

Using one of my brothers’ results, I can draw some assumptions about how this could have happened.

In the above graphic, five countries from which we have no known ancestral ties are listed. I'll address these in a clockwise order starting with Poland.

Poland: While I cannot verify that I had any Polish ancestors, my brother is sharing .08% of his genome with someone who does. Although I am working largely in the realm of the unknown, a large portion of my ancestry is German.

German historical maps show that this country, created from the confederation of various Germanic states in 1871, had borders that extended far beyond the country’s current limits. Following both World Wars, large sections of former German territory were ceded to Poland. Some of my German relatives could have lived there, and their home regions now fall under Polish jurisdiction. While I have no proof, it makes sense.

The Netherlands: This is one of the more interesting points on AF that I have discovered. All three of us are sharing with those who are from the Low Countries. While we are currently not sharing at the 5cM level on 23andMe, I found a person on Leon Kull’s HIR Search of Flemish origin who has a surname that is remarkably similar to my mother’s maiden name. While numerous people have searched her particular lineage, none have been able to go beyond my 3rd great-grandfather who was born circa 1804 in Pennsylvania.

NOTE: The above hypothesis was not confirmed as I am actually related to the person's mother and not through his father's (surname) lineage.

Theories on the surname’s origin are varied. Huguenots have taken similar names to Lancashire, England. Being that the name is more Germanic than French, some have supposed that it is Alsatian in origin. This would make sense, as Alsace for most of its modern history was under French control and is largely German ethnically. The only period when Alsace was a German province was during the German Imperial period of 1871-1918 – one of the spoils of the Franco-Prussian War.

A theory I have worked on is that we are descended from a gentleman with a similar surname who lived in Berks, Northampton, and Carbon counties of Pennsylvania and who was from Prussia. While others have been successful in tracing their lineage to him, I only have circumstantial evidence that a person of a similar name to my 3rd great grandfather lived there in the 1820s.

The finding of the genetic connection via HIR Search and the uncanny similarities between the two surnames provide a new direction on which I may focus. It may be a red herring, but theories are often part of the genealogical unknown.

Sweden & Finland: I’ll deal with these two Nordic countries together (Norway is on the list too). My only theory here is that in my paternal lineage there were many relatives who were merchant seamen from Scarborough, England. One line of ancestral cousins actually settled in Norway – so ethnic Norwegians being cousins is a strong possibility.

As for Sweden and Finland, perhaps a relative shared some of our DNA when in port. As the master of a merchant vessel, my 3rd great-grandfather had sailed throughout this region and these countries are documented in his Royal Navy records as places where he had sailing experience.

Italy: My great-grandmother’s ancestry stretches back to Dauphiné, France. This Piedmont region straddles modern day France and Italy. This line has been documented as living for a brief period in Italy; however, no known descendants of this lineage were Italian. Hopefully I can find some known connection and prove this possibility.

The Important Thing

Ancestry Finder is a tool. The important thing is to use it with an open mind. Embrace the diversity of those who share genomes with you. Most of all, have fun with it. I know some folks have had difficulty accepting the relationships they are seeing; however, with the limitless number of variables that are present, the overall lack of knowledge we all have of portions of our lineage, and the number of years that have transpired since a common ancestor -- anything is possible.


  1. Excellent overview of the new 23andMe AF tool, thank you for sharing!

    Michael Gregory

  2. Thanks Michael. I appreciate it.

  3. Great explanation, thank you so much.
    My matches are UK and Ireland - I longed for exoticism but alas no!

  4. Great post, I enjoyed reading it. btw, I noticed we shared 406 SNP's on Chromosome #9.

    Thanks Leon Kull!

  5. Thanks Tabitha.

    Salabencher - I'll check out the match on 9 - cousin and see if I have anyone else there as well.


  6. very informative, i'd been getting alot of polish results on various dna tests including the new AF.
    After looking at the surnames of all my 'German' ancestors and even the English line,surname [Greatbatch - family folklore had said it was formerly grabich from the netherlands] and an actual dutch ancestor. I Found all these surnames at the earliest point [1600's] [using] were all from Prussia/Pruessen.Which was both Polish and German.

  7. Katie: The Polish/Prussian factor is probably skewing the results of many folk with a number of ethnic Germans having been born in what is now Poland.