Saturday, July 10, 2010

Ancestry Finder: It's a Small World After All

Well, this week 23andMe opened up the floodgates and provided its customer base complete access to its latest Ancestry Labs feature: Ancestry Finder (AF). It is amazing how people are reacting to this feature – some with joy and elation and some – well, their reactions were a little strange. In fact, some of these run the gamut of Elizabeth Kübler-Ross’ “Five Stages of Grief”:

Denial: “My ancestors could not have come from the country of Outer Mitochondria. It is simply not possible. I know my ancestry and this is absolutely wrong. I don’t care what 23andMe says – it is wrong.”

Anger: “Everyone who is an ethnic Chromosomian needs to be upset that their ancestors’ place of birth is used and not our Chromosomian heritage. This is unfair – just because they were born in Autosomia doesn’t make them Autosomians. They were Chromosomians and proud of it.”

Bargaining: “There are only a few matches that are from Mendeland. Please 23andMe, fix it – please fix it soon. If you do, I’ll be a loyal customer.”

Depression: “I am so upset because my ancestors appear to be from Centimorgana. This is going against what I’ve known from my grandparents and my research. I don’t even know who I am these days.”

Acceptance: “They say the DNA doesn’t lie. I’ll go ahead and acknowledge my new heritages. I’m not sure I understand it, but I guess that’s me.”

Well, some of the above responses are paraphrased from real customer comments and the others are fabricated; however, people are vocal about some of the ancestral implications. Most of this is based on a lack of understanding of the tool and how it works, a myopic view of one’s ancestral possibilities, and overconfidence in one’s own knowledge of one’s heritage.

I will have to give kudos to 23andMe for developing this tool as it is interesting and fun to use. Since my last report during the beta test in which I participated, they have adjusted the default settings from 10cM to 7cM, which is the same as their Relative Finder feature.

They also changed their wording, from the “Did you ever wonder if you were a quarter Irish?” explanation to the more descriptive “Let the 23andMe Community help you discover what countries your ancestors and their relatives might have lived in. This lab is fueled by your responses to the "Where Are You From?" ancestry survey.” The key to this description is “ancestors and their relatives” and “might have lived.”

I have decided to address some of the issues where people are having the most difficulty.

The Survey – Type of Data Used

A number of folks, especially Americans with colonial ancestry, feel that the survey doesn’t go deep enough. The survey has been altered from the original version in the last few months to include the ancestral origins of a person’s grandparents; however, this data is not currently being used. It seems to me that eventually you will be able to see this information; however, it is not currently available and will probably not be what you initially see when you run your mouse over a segment. That will probably stay as the countries of birth of a person’s grandparents.

While a person’s grandparents’ nativity will not be of much value to others, it is a practical data collection solution. This is where it may be a stretch to understand the reasons for the structure - a person needs to remove the genealogist hat and put on a database administrator hat. While most don’t have a practical understanding of how databases that drive a tool like Ancestry Finder work, the developers appear to have designed the tool to be efficient and quick.

While family historians would desire more information, designing a tool to do this may be so cumbersome it would be slow and inefficient. When working with a relational database, you try and go for the lowest common denominator for the greatest success in returning data quickly. I think this is why 23andMe has chosen the current route. While not all can, most should be able to list the birthplace of their grandparents. It gets increasingly difficult to manage when you go back in time (where people may know even less about their forebears), as more unknowns will be present.

In addition, with each generation, the number of data points doubles. From a database perspective, four ancestors are easier to manage - both graphically in the Flash interface and informationally in the database columns. Adding great grandparents would be nice, but it would probably be a programmer's and database administrator's nightmare.

Some are displeased with the choice of nativity data and the use of the country name for the birth land. As far as the choice of current country name, you need a fixed data set from which to pull - otherwise, the information is chaotic and intended data will be missed. If this info was self reported, spelling errors on the part of the participant and the wide variety of responses would wreak havoc.
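To make the fixed-list point concrete, here is a minimal sketch in Python. It assumes nothing about how 23andMe actually stores survey answers; the country list, the sample answers, and the function names are all hypothetical. It only shows why free-text answers scatter across many spellings and historic names, while a fixed list keeps every answer in one bucket.

```python
from collections import Counter

# Hypothetical, tiny sample of a fixed list of current country names.
ALLOWED_COUNTRIES = {"Slovakia", "Germany", "United Kingdom"}

# Hypothetical free-text answers that all describe the same region.
free_text_answers = [
    "Slovakia",
    "slovakia",
    "Slovak Republic",
    "Czechoslovakia",
    "Austria-Hungary",
]


def distinct_answers(answers):
    """Count answers exactly as typed, the way a naive query would see them."""
    return Counter(answers)


def validate_choice(choice):
    """Accept only a value from the fixed list; reject everything else."""
    if choice not in ALLOWED_COUNTRIES:
        raise ValueError(f"{choice!r} is not in the fixed country list")
    return choice


if __name__ == "__main__":
    # Five spellings and names become five separate buckets.
    print(len(distinct_answers(free_text_answers)), "distinct free-text values")
    # With a fixed list, the respondent can only pick the current name.
    print(validate_choice("Slovakia"))
```

A search for "Slovakia" against the free-text answers would find only one of the five respondents, which is the missed data described above.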

If historic borders were represented, which ones do you choose? I have a coworker whose ancestry is from the modern country of Slovakia. Does he use Austria-Hungary, as is represented on two of his grandparents' birth records, or does he use Czechoslovakia, where his other grandparents were listed as being born? The villages were in the same region that is now Slovakia - but were under three different country names in the last 100 years.

What about countries that have historical distinctions – such as the UK with England, Scotland, Wales, and Northern Ireland? The problem is if everyone would want and need representation, the list would constantly change, people would be upset if their region was missed, and the list would be prohibitively long and unmanageable. Where do you stop? If this was done for the US, you would have to include every Native American tribe to be fair as they have tribal sovereignty.

By setting a standard to only current country names, 23andMe has made the database manageable – it is fixed and there are no competing variables. While that is little benefit for genealogists or people seeking info on their ancestry if they are adopted, it is the easiest and most effective way to approach this from a database administrator's perspective and from an Adobe Flash developer's design functionality.

As far as replacing country of birth with ethnicity, to be effective - a queried database column needs to have one and only one piece of data. My grandfather has 6 ethnic groups - which one do I use? If I were to use the major percentage, then the others would not be represented. And if he were exactly 50% English and 50% French - which one would I use?

Since they are already collecting ethnic and religious backgrounds on the grandparents - I feel that they will be incorporating this data into the system. Ethnicities and nationalities probably will not be searchable by virtue of how the data is collected as there are too many variables in the names used and no fixed method of reporting the data. How 23andMe will use this information remains to be seen.

Fallacies Concerning the Data

Fallacy 1: The information is absolutely correct for them

While the assumption is that the data is correct, it is self-reported, undocumented data. It could be wrong. It could be what we believe to be true, what we’ve been told incorrectly by others, and it may just be an unproven assumption. For years my mother thought her maternal grandfather had been born in Germany and immigrated to the U.S. with his parents. She knew that the family returned to Germany to take possession of an inheritance and later returned to the States.

While the family did return to Germany, my great-grandfather was actually born in Johnstown, Pennsylvania in 1855. His mother had immigrated when she was a young girl and his father came when he was in his twenties. My great-grandfather, a brother, and two sisters were born in Johnstown. They returned to Germany when my great grandfather was about 8 years old. The family had two daughters and a son who were born in Germany.

Many folks don't know who their grandparents were let alone where they were born. The answer may have been speculation and may have been wrong.  

Fallacy 2: The information is absolutely correct for me

While the natural assumption is that a person's grandparents' nativity equates to their ethnicity, it is not a given. Just because a person was born in a particular country doesn’t mean that he or she was of that particular nationality. For years, my French ancestors lived in Germany. They retained their ethnic heritage and had married within a small enclave of families that had escaped religious persecution by traversing the Alps into Württemberg. Eventually, they married Germans and fully became German in all respects, but this took five generations.

Great migrations began with Europeans exploring the world – this brought different cultures in contact with one another. With the advent of colonialism and wars, European men came in contact with women from wide ethnic, racial, and religious backgrounds. Often children were produced from these relationships and the fathers who returned to their own country had no idea that their DNA would continue through the generations of individuals who live half-way across the world.

The extension of this thought is that having a cousin relationship with someone who has all four grandparents born in a single country equates to me having that nationality. A person's grandparent nativity proves nothing to me - only that I have a cousin who appears to have four generations living in a certain country.

Fallacy 3: The information is absolutely incorrect for me

Like sticker shock, I think people have been having several reactions to seeing a number of countries represented on their AF reports. The immediate reaction is "how can this be?" This is especially true when more exotic locations are represented that on the surface conflict with what we already know about our families. While nearly every European country is represented on my brothers’ and my lists, the more exotic locations include Venezuela, Uruguay, Trinidad and Tobago, Cuba, India, Saudi Arabia, St. Kitts and Nevis, and Algeria. Most of these represent locations where only one grandparent in four of our genetic cousins was listed as being born.

In the cases where there is a match with a genetic cousin who has four grandparents from the same country, the implication is that one of our ancestors must also have come from that country. This assumption, while it is certainly plausible, doesn’t necessarily have to be correct.

This seems to be an area that is a stumbling block to most everyone about AF and it may be an inherent flaw in how AF was first presented. Without enough information, one cannot successfully determine his or her ancestry based on the nativity location of someone else's grandparents. There are too many unknown variables to adequately do so.

Since the lowest reported share is 5cM, these can be fairly distant relationships at the 5th cousin level or higher. A 5cM share is roughly a genome share of .06%. If percentages were equal across all generations (which they are not), these numbers are in line with 5th cousins with the most recent common ancestor being fourth great grandparents. A list of the average shared DNA and the represented cousin relationships are listed below.

Average Shared DNA Percentage   Cousins          Half Cousins                        Cousins Removed
12.50%                          First Cousins    -                                   -
 6.25%                          -                Half First Cousins                  First Cousins, Once Removed
 3.13%                          Second Cousins   Half First Cousins, Once Removed    First Cousins, Twice Removed
 1.56%                          -                Half Second Cousins                 Second Cousins, Once Removed
 0.78%                          Third Cousins    Half Second Cousins, Once Removed   Second Cousins, Twice Removed
 0.39%                          -                Half Third Cousins                  Third Cousins, Once Removed
 0.20%                          Fourth Cousins   Half Third Cousins, Once Removed    Third Cousins, Twice Removed
 0.10%                          -                Half Fourth Cousins                 Fourth Cousins, Once Removed
 0.05%                          Fifth Cousins    Half Fourth Cousins, Once Removed   Fourth Cousins, Twice Removed


While a share of 5cM is consistent with a fifth cousin, I am finding that some larger segments of my own are going back further in time to documented 8th and 9th cousins. With numerous variables that are not constant, it becomes difficult to accurately predict relationships. This is why Relative Finder often lists a relationship range. Not everyone will carry the average amount of shared DNA and not every cousin will be represented in a person’s genome.
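The averages in the list of cousin relationships follow a simple halving rule, and a short Python sketch can reproduce them. It is an illustration only: the function name and parameters are my own, and it computes the textbook average, not the amount any real pair of cousins will share. Full first cousins average 1/8 (12.5%); each further cousin degree divides that by four, and each "removed" step or "half" relationship divides it by two.

```python
def expected_share(degree, removed=0, half=False):
    """Average fraction of DNA shared by cousins.

    degree  -- 1 for first cousins, 2 for second cousins, and so on
    removed -- number of generations "removed" (0, 1, 2, ...)
    half    -- True for half cousins (one shared ancestor, not a couple)
    """
    share = 2.0 ** -(2 * degree + 1)  # full nth cousins
    share /= 2 ** removed             # each removal halves the average
    if half:
        share /= 2                    # half cousins share half as much
    return share


if __name__ == "__main__":
    for degree in range(1, 6):
        print(f"cousin degree {degree}: {expected_share(degree) * 100:.3f}%")
```

For fifth cousins this gives a little under 0.05%, which is in the same neighborhood as the roughly .06% genome share quoted for a 5cM segment.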

Be that as it may, we will analyze the 4th great-grandparent/5th cousin scenario – which is the most common predicted relationship in most people’s Relative Finder results. If in your ancestry you have no pedigree collapse and all of your 4th great-grandparents are unique, you would have a total of 64 individuals at that spot. The birth year for these individuals, depending on your year of birth, may have occurred between 1750 and 1800. My paternal fourth great-grandfather was born in 1733 – which is earlier than most.

While it is possible for someone to have traced all 64 fourth great-grandparents, if you are like me there are numerous unknowns on your family tree. I believe I have a pretty good knowledge of some of my fourth great-grandparents as I can identify 40 by name. Of the other 24, I know one’s first name and the rest are just a mystery. Since I’ve been tracing my family since the 1970s, I would think that I have a better than average grasp of my ancestry – but I don’t know all my ancestors at this level.

With the lack of knowledge we have of many of these 64 people, it is impossible to say what a person's ethnicity was. Migrations were occurring in 1800 and though we may reasonably guess that all of our ancestors from a given line are English, German, Polish, or whatever, unless we know who that person is for sure, one cannot say. This was also the time of great colonialism and England, Spain, Portugal, France, and Holland had previously instituted colonies on every inhabitable continent. They would later be joined by the Belgians and Germans who would set up their own colonies.

Since we may not know all of our 4th great-grandparents, we cannot know what all of their descendants did either. Take the conservative assumption that each couple among your 4th great-grandparents, and every one of their descendants in turn, produced just 2 children: this adds up to a total of 8,128 descendants across all your ancestral lines. Although large, this is a very conservative number and the real figure could be many times larger. All it takes is for one person at least four generations back to relocate to another nation, have a child born in that nation, and for that child to become someone's grandparent.

In addition - there are always non-paternal events (NPEs) that occur - a child may be born to an ancestor's descendant who was stationed in a particular part of the world during a conflict or as part of a colonial peacekeeping force. So you may share DNA with someone who has no idea of his/her English ancestry because his/her 2nd great-grandfather was a soldier stationed in Egypt, Manila, Saigon, or Seoul. Your relative may not have been aware that he had a child from a one night stand. There was even one individual who adamantly stated there were no illegitimate children in his family. My question is, "How do you know for sure?"

Knowing that the percentage of DNA we share with a person could be representative of a deeper ancestry at even the 10th Cousin level, this only adds to the list of possibilities. If a person is a 10th cousin, our common ancestor is a 9th great-grandparent. Again using my paternal line as example, we have estimated that my 9th great-grandfather was probably born in the 1560s. Although his birth was not documented, he is listed in his father’s 1567 will and he was producing children from 1590 until his death in 1602.

At the 10th cousin level, we will have the possibility of 2,048 9th great-grandparents; however, that number may not be as high due to pedigree collapse and shared ancestry. Using the same assumption that these ancestors and their descendants each produced two children, a conservative estimate of all the descendants from our 9th great-grandparents would number in the range of 8 million persons. With these numbers, it would be impossible for us to know where every descendant of a common ancestor lived, who they married, and what flag their descendants fly today.
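The ancestor and descendant counts above can be checked with a few lines of Python. This is a sketch under stated assumptions, not the original calculation: the function names are mine, and the generation counts of 7 and 12 are simply the values that reproduce the 8,128 and roughly 8 million figures, which works out to counting down to one generation past your own. It also assumes no pedigree collapse and exactly two children per couple.

```python
def ancestors_at(generations_back):
    """Number of ancestor slots a given number of generations back.

    Parents are 1 generation back, grandparents 2, and nth
    great-grandparents are n + 2 generations back.
    """
    return 2 ** generations_back


def total_descendants(num_ancestors, generations, children_per_couple=2):
    """Descendants of all ancestor couples over a number of generations.

    Assumes every couple, and every descendant after them, has the same
    number of children (2 by default).
    """
    couples = num_ancestors // 2
    per_couple = sum(children_per_couple ** g for g in range(1, generations + 1))
    return couples * per_couple


if __name__ == "__main__":
    fourth_ggp = ancestors_at(4 + 2)  # 4th great-grandparents
    ninth_ggp = ancestors_at(9 + 2)   # 9th great-grandparents
    print(fourth_ggp, total_descendants(fourth_ggp, 7))
    print(ninth_ggp, total_descendants(ninth_ggp, 12))
```

Raising `children_per_couple` from 2 to 3 shows how quickly the totals grow, which is why the two-child figure is called conservative.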

Don't get hung up on where someone’s grandparents lived - accept that you have cousins that have a rich heritage that you don't share and be comfortable knowing that you share something with them - it just is probably not represented by the flag next to their grandparent's nativity data.

Implications of My Results

When removing all of the defaults, I had a total of 29 countries represented, one brother had 34, and my other brother had 31. When I restricted the results to matches with all four grandparents from the same country, each of us had 13 countries, and the total number of countries represented across the three of us was 16.

Most of our ancestral nations are represented at the four grandparent level: US, UK, Germany, Canada, France, and Ireland. Switzerland and the Bailiwick of Jersey are missing from this list. Australia is among these and I have known cousins on both my mother's and father’s side descended from folks who made their home “Down Under.” Nine countries are represented for which I have no known connection.

Using one of my brothers’ results, I can draw some assumptions about how this could have happened.



In the above graphic, five countries from which we have no known ancestral ties are listed. I'll address these in a clockwise order starting with Poland.

Poland: While I cannot verify I had any Polish ancestors, my brother is sharing .08% of his genome with someone who does. While working largely in the realm of the unknown, a large portion of my ancestry is German.

When looking at German historical maps, this country, created from the confederation of various Germanic states in 1871, had borders that extended far beyond the country’s current borders. Following both World Wars, large sections of former German territory were ceded to Poland – and some of my German relatives could have lived there – and hence their descendants are now living in Polish jurisdiction. While I have no proof, it makes sense.


The Netherlands: This is one of the more interesting points on AF that I have discovered. All three of us are sharing with those who are from the Low Countries. While we are currently not sharing at the 5cM level on 23andMe, I found a person on Leon Kull’s HIR Search of Flemish origin who has a surname that is remarkably similar to my mother’s maiden name. While numerous people have searched her particular lineage, none have been able to go beyond my 3rd great-grandfather who was born circa 1804 in Pennsylvania.

NOTE: The above hypothesis was not confirmed as I am actually related to the person's mother and not through his father's (surname) lineage.

Theories on the surname’s origin are varied. Huguenots have taken similar names to Lancashire, England. Being that the name is more Germanic than French, some have supposed that it is Alsatian in origin. This would make sense, as Alsace for most of its history was under French control and is largely German ethnically. The only period when Alsace was a German province was during the German Imperial period of 1871-1918 – one of the spoils of the Franco-Prussian War.

A theory I have worked on is that we are descended from a gentleman with a similar surname who lived in Berks, Northampton, and Carbon counties of Pennsylvania and who was from Prussia. While others have been successful in tracing their lineage to him, I only have circumstantial evidence that a person of a similar name to my 3rd great grandfather lived there in the 1820s.

The finding of the genetic connection via HIR Search and the uncanny similarities between the two surnames provide a new direction on which I may focus. It may be a red herring, but theories are often part of the genealogical unknown.

Sweden & Finland: I’ll deal with these two Scandinavian countries together (Norway is also on the list). My only theory here is that in my paternal lineage there were many relatives who were merchant seamen from Scarborough, England. One line of ancestral cousins actually settled in Norway – so ethnic Norwegians being cousins is a strong possibility.

As for Sweden and Finland, perhaps a relative shared some of our DNA when in port. As the master of a merchant vessel, my 3rd great-grandfather had sailed throughout this region and these countries are documented in his Royal Navy records as places where he had sailing experience.

Italy: My great-grandmother’s ancestry stretches back to Dauphiné, France. This Piedmont region straddles modern day France and Italy. This line has been documented as living for a brief period in Italy; however, no known descendants of this lineage were Italian. Hopefully I can find some known connection and prove this possibility.

The Important Thing

Ancestry Finder is a tool. The important thing is to use it with an open mind. Embrace the diversity of those who share genomes with you. Most of all, have fun with it. I know some folks have had difficulty accepting the relationships they are seeing; however, with the limitless number of variables that are present, the overall lack of knowledge we all have of portions of our lineage, and the number of years that have transpired since a common ancestor -- anything is possible.

Tuesday, July 6, 2010

New Tool Coming to 23andMe: Ancestry Finder

This weekend, I had the opportunity with several others to beta test a new feature from 23andMe. It is called Ancestry Finder. Although it is billed to help you find ancestral countries of origin, it may or may not be able to deliver on that promise.

For most individuals this will not be possible from the data that is presented; however, I find it very helpful especially with the lack of participation of others on 23andMe when asked to share genomes. This feature may help fill in holes there.

The data is self reported by 23andMe participants who have finished the survey, "Where are you from?"  This survey asks about the participants' grandparents and the data is used to determine the ancestral countries of the person whose match will show. Persons who do not finish the survey will not have their shares seen by others. 

Here are some of the highlights and some screen shots of the feature. Click on any of the images for larger versions that will be clearer.

The following image shows my shares with the default settings of 10cM of shared DNA segments, with the matches only with four grandparents from one country, and all colonial matches (USA, Canada, Australia, New Zealand, and South Africa) turned off.  I only have two matches that fit all three of the default criteria.  In this shot, I have placed my mouse over the segment on chromosome 10.  I am sharing 11.7cM (centimorgans) with a person whose four grandparents were born in Germany.



In this second image, still with the defaults set, I have placed my mouse on the share on chromosome five. This individual's four grandparents were born in Poland and I am sharing 15.3cM - which puts me at sharing about .20% of my DNA with this person and places us in a possible fourth cousin relationship. This is interesting as I have no known Polish ancestry; however, it may be an ethnic German who remained in what was part of East Prussia that was ceded to Poland.


Since I am managing my profile, as well as the profiles of my two brothers, here are our results.  The settings are wide open with shares as low as 5cM, the grandparents can be from different countries, and colonial countries are represented.

Brother Number One

Brother Number Two

My Genome Shares

The above three graphics indicate what DNA my brothers and I share with the same and different people.  Of the people we share DNA with, about 42% of these individuals are unique to one of us and do not share genomes with the other two.  To illustrate this, here is an enlarged version of the shares on chromosome 2. Notice the similarities and the differences.

This is a manipulated image and you will not see this in Ancestry Finder.

Here is my list of countries that match the above graphic. The countries represent places where the grandparents of the 23andMe participants (who match my DNA) were born. This is not to be construed as my ethnic background; however, it could indicate some nationalities I may have of which I was previously unaware.




Note: there are 26 countries represented.  Of these, I only have claimed 8 as ancestral lands or countries where known cousins have lived.  I will be interested in seeing how these other countries are connected to my family.

One of the features in Ancestry Finder is the ability to place your mouse over a country's name and see only the matches that match that particular location. Since I have no Italian ancestry to my knowledge, I picked Italy.


There are 9 segments representing Italy.  Four subjects had one grandparent born in Italy, while the remaining five had two grandparents who were born in that country. My ancestors had crossed into Italy and lived close to the Franco-Italian border during the 1600s.  It is possible that some of these individuals are descended from that particular ancestral line.

As you can see, there is much to explore in Ancestry Finder.  I am excited about the forthcoming tool and the opportunity to review it. In that spirit, here is my analysis of the ups and downs of Ancestry Finder. It won't be for everyone, but I see much value in using it. Here are the positives vs. the negatives.

Positives:

  1. It gives you a graphical representation of where matches are occurring on chromosomes.
  2. It shows matches of people who have filled out the “Where are you from?” survey – who may not be showing on Relative Finder (for a variety of reasons).
  3. By comparing segment lengths, it may (or may not) give some clues to the ancestry of a non-responsive person on Relative Finder – such as that elusive 3rd cousin with a .90% match who will not respond to your requests to share and you have used up your three Relative Finder contacts. You may be able to at least assign the match to a particular line (i.e., if your maternal grandmother was French and your other grandparents were English and the match has all grandparents as being French born – there is an indication that the relationship probably (not absolutely) comes from your maternal grandmother’s line).
  4. The threshold can be dropped to 5cM. The threshold on Relative Finder is 7cM. 5cM shares only show when contacted directly outside of Relative Finder.
  5. It can be adjusted so that only individuals with 1, 2, 3, or 4 grandparents from the same country can be returned (1 gives you everyone).
  6. The settings can be adjusted so the minimum threshold can be set between 5cM and 15cM.
  7. The country of birth is identified for each of the person’s specific grandparents (when known).
  8. A filter can be applied to eliminate seeing those with grandparents born in colonial countries (US, Canada, Australia, NZ, & South Africa). This is actually the default setting.
  9. Segment sizes are exact and not rounded as they are in Family Inheritance Advanced.
  10. It allows you to see various profiles under an account without physically switching to that profile.
  11. There is anonymity for those not wanting any contact from other researchers.
  12. The survey allows for ethnicity data to be entered (not currently used, but may later be used).
  13. Haplogroup information is missing from Ancestry Finder – this is a good thing as too many people are confused by haplogroups on Relative Finder anyway.
  14. A preselected list of countries reduces human error and forces survey participants to pick the current country name.
  15. It is very easy to use.
  16. A mouseover on the list of country names shows only those returns where a grandparent’s nativity matches that country.
  17. This is a nice extra that I was not expecting to get when I signed up for 23andMe. Thanks guys/gals.
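Items 5, 6, and 8 in the list above describe three adjustable filters, and their combined effect can be sketched in a few lines of Python. This is only my guess at the logic, not 23andMe's code: the `Match` class, the `passes` function, and the way the colonial filter is applied are all assumptions, and the 6.0cM Italian match is a made-up example. The default values are the beta defaults described earlier (10cM, four grandparents from one country, colonial countries off).

```python
from collections import Counter
from dataclasses import dataclass

# Countries the post describes as "colonial" for filtering purposes.
COLONIAL = {"United States", "Canada", "Australia", "New Zealand", "South Africa"}


@dataclass
class Match:
    """Hypothetical record for one shared segment."""
    segment_cm: float
    grandparent_countries: tuple  # four entries; None where unknown


def passes(match, min_cm=10.0, min_same_country=4, include_colonial=False):
    """Return True if the match survives all three filters."""
    if match.segment_cm < min_cm:
        return False
    counts = Counter(c for c in match.grandparent_countries if c)
    for country, n in counts.items():
        if n >= min_same_country and (include_colonial or country not in COLONIAL):
            return True
    return False


if __name__ == "__main__":
    german = Match(11.7, ("Germany",) * 4)
    # Hypothetical match: segment size and grandparents are invented.
    italian = Match(6.0, ("Italy", "United States", "United States", "United States"))
    print(passes(german))   # beta defaults
    print(passes(italian))  # beta defaults
    print(passes(italian, min_cm=5.0, min_same_country=1, include_colonial=True))
```

With the defaults only the four-grandparent German match gets through; opening the settings all the way lets the one-grandparent Italian match appear as well, which mirrors the jump from two matches to a chromosome view full of segments.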

Negatives:

  1. Grandparent nativity may not shed any light on a relationship as it is too general of a measure.
  2. Grandparent nativity does not speak to specific nationalities or ethnicities.
  3. Self reported data on grandparent nativity may be incorrect due to a mistake or lack of correct knowledge of the survey respondent.
  4. Produces a false sense of nationality when a person assumes that an ethnicity is present in his or her own line because a match has four grandparents born in a particular country. Just being born in a country does not constitute a particular ethnicity, only nativity.
  5. Not everyone on Relative Finder is represented – only those who finished the survey.
  6. The survey cannot be changed and mistakes cannot be corrected.
  7. The Flash based graphics are hard to see where there are overlaps of individuals matching.
  8. Currently, the ethnicity information in the survey is not being returned.
  9. Haplogroup information is missing from the returns – making identification with non-sharing Relative Finder cousins more difficult.
  10. Some colors representing countries are similar to others (or even the same) and may or may not cause some confusion. I doubt if this could be corrected as shades of colors often look the same on certain browsers. This is really not that big of a deal.
  11. A preselected list of countries does not take into account historic borders or former country names (this is good for efficiency, but perhaps bad for genealogists). I wouldn’t suggest doing it any differently.
  12. Mouseovers may be difficult (not impossible) to perform on contiguous segments.
  13. Genome segment starts and ends are not provided.
  14. The minimum threshold can only be set to a high of 15cM – may not be enough for those having 1000 matches.
  15. The size of the Flash interface cannot be increased or decreased by using CTRL+ or CTRL-; therefore, it may be difficult to see some of the shares on a busy chromosome in some higher screen resolutions.
  16. The listing arrangement of grandparents is backwards from how genealogists spatially arrange trees and relationships. Currently, Ancestry Finder lists the grandparents in the following order: Mom's mother, Mom's father, Dad's mother, and Dad's father. Typically, the male lineage is on the left as Dad's father, Dad's mother, Mom's father, and Mom's mother.
  17. It won't meet everyone's specific needs.

It is an exciting time at 23andMe and their discussion area was burning up this weekend regarding this forthcoming feature. I’d like to thank the powers that be for considering me to be a beta tester on this new tool that can be added to a genetic genealogist’s arsenal of weapons.