Sunday, May 26, 2013

Non Genetic Relatives In A DNA Database

Finding a relative that doesn’t match my DNA in a DNA database? That’s preposterous, isn’t it? One of the reasons many of us have chosen to test with 23andMe and other companies is to find unknown relatives that match our DNA. True, but we also have relatives with whom we have no matching DNA, and some of these should appear in 23andMe's database. If you have tested other relatives, you have an opportunity of finding some of these non-matching individuals. Indulge me while I explain.

While I was allowing my mind to wander concerning autosomal testing the other day, I came to the realization that I have relatives with whom I have no significant genetic connection, and that these individuals may be found in 23andMe’s database. I’ve really known this for a while, as I have tested a number of close family members and they have matches that I don’t have. No doubt I am related to some of these individuals. Because of recombination, the amount of DNA we receive from a given ancestor is diminished in each generation. Therefore, not everyone who is legitimately related to us shares DNA with us.

This is best understood with close relatives. I’ve tested my mother and two brothers, and approximately half of each one’s matches do not share any segments of DNA at 5cM or higher with me and therefore do not show in my “DNA Relatives” (formerly “Relative Finder”) list on 23andMe. In other words, for every 50 matches I share with my mother, theoretically she has 50 matches that I do not share. The same goes for each brother.

In addition, I’ve known since I began testing fourth cousins that some would match me and some would not. I am related to these individuals, but we simply do not share any DNA. For example, I have 6 fourth cousins and 1 fourth cousin, once removed who have tested. I share no significant amount of DNA with the fourth cousin, once removed and none with 4 of the 6 fourth cousins. Just because I do not match some of these cousins, it doesn’t mean we are not related. There is only a 45% chance that you will match a known fourth cousin.
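As a rough check on those numbers, the sketch below treats each fourth cousin as an independent 45% chance of matching. That independence is my simplifying assumption, not something 23andMe states, so take the result as a ballpark figure only.

```python
from math import comb

# Assumption: each tested fourth cousin matches independently,
# with the 45% chance quoted in the text.
P_MATCH = 0.45


def prob_exact_matches(n_cousins: int, n_matches: int, p: float = P_MATCH) -> float:
    """Binomial probability of matching exactly n_matches of n_cousins."""
    return comb(n_cousins, n_matches) * p**n_matches * (1 - p) ** (n_cousins - n_matches)


# Six fourth cousins tested; two of them match.
expected = 6 * P_MATCH
p_two_of_six = prob_exact_matches(6, 2)

print(f"Expected matches among 6 fourth cousins: {expected:.1f}")
print(f"Chance of matching exactly 2 of 6: {p_two_of_six:.1%}")
```

Under these assumptions, matching two of six fourth cousins is an unremarkable outcome and sits close to the average.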

I then realized that, although I was not matching these relatives, there were others they matched who were possibly related to me. So the first step in this process was to determine the percentage of possible relatives a particular individual might share with me and then calculate the mean number of that person’s matches with which I might share DNA.

While the percentage of total ancestors I would share with a person would be absolute, the percentage of a person’s matches I would share with an individual would be an approximation. Since there is randomness in recombination, I might share more or less than an average amount with a person.

For example, a sibling will share an average of 50% of his or her DNA with other siblings. I share 49.6% of my genome with my oldest brother, which is fairly close to the average of 50%; however, my other brother and I share only 41% of our DNA – which is considerably less than average. Therefore, I could expect to have fewer DNA matches in common with this brother than I do with our oldest brother.

The second factor is due to the nature of 23andMe’s database. There will be fewer potential matches from ancestral populations that are not well represented. For example, I have about 30% German ancestry from my mother’s family; however, I have fewer German matches than I do through my father’s New England Colonial family. Therefore, predicting the number of actual non-matching relatives we have in the database is entirely dependent upon 23andMe’s customer base.

Therefore, no calculation can be exact, given the random nature of recombination and the limitations of the database.

The following chart explains the percentage of a relative’s ancestors we have in common and the average amount of DNA we should expect to share.

Relationship                  Shared Ancestry   Average Shared DNA
Identical Twin                100.00%           100.00%
Parent                        100.00%           50.00%
Sibling                       100.00%           50.00%
Grandparent                   100.00%           25.00%
Aunt/Uncle                    100.00%           25.00%
Double Cousin                 100.00%           25.00%
Great Grandparent             100.00%           12.50%
Great Aunt/Uncle              100.00%           12.50%
Half Sibling                  50.00%            25.00%
Half Aunt/Uncle               50.00%            12.50%
1st Cousin                    50.00%            12.50%
Half Cousin                   25.00%            6.25%
1st Cousin, Once Removed      25.00%            6.25%
2nd Cousin                    25.00%            3.13%
Half Cousin, Once Removed     12.50%            3.13%
2nd Cousin, Once Removed      12.50%            1.56%
3rd Cousin                    12.50%            0.78%
4th Cousin                    6.25%             0.20%

In the above table, the numbers in the second column represent the absolute percentage of that person’s ancestors we would actually share. For example, a first cousin, once removed seems high at 25%; however, if we consider that we share 100% of the ancestors of our parents’ siblings, and we do, then an aunt/uncle’s child (a first cousin) is half that amount at 50%. Therefore, our first cousin’s child shares 25% of his or her ancestors with us. We would be related to 25% of our first cousin, once removed’s ancestors and approximately 25% of that person’s potential relatives.

The exact number of relatives that we share will vary due to the number of actual relatives that person has. For example, if we have 20 first cousins on our father’s side and only two on our mother’s side, we will have more paternal relatives than maternal, so the numbers only work in theory.

The third column represents the average amount of DNA we share with our relatives at each level. Using our first cousin, once removed again as an example, we only share 6.25% of our DNA with this relative – therefore, we should match about 6.25% of that cousin’s 23andMe matches. Again, if all things were equal (and they are not), we might expect to be related to the remaining 18.75% of that person’s matches (25% less 6.25%) but not share any DNA with these individuals. Again, these percentages are not absolutes, but approximations.
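The same subtraction can be run for other rows of the chart. This is only a sketch of the arithmetic described above; the figures are copied from the chart, and the function name is my own.

```python
# (shared ancestry %, average shared DNA %) copied from the chart above.
CHART = {
    "1st Cousin": (50.00, 12.50),
    "1st Cousin, Once Removed": (25.00, 6.25),
    "2nd Cousin": (25.00, 3.13),
    "3rd Cousin": (12.50, 0.78),
    "4th Cousin": (6.25, 0.20),
}


def related_but_not_matching(relationship: str) -> float:
    """Theoretical share of a relative's matches we are related to but do not match."""
    shared_ancestry, shared_dna = CHART[relationship]
    return round(shared_ancestry - shared_dna, 2)


for relationship in CHART:
    print(f"{relationship}: {related_but_not_matching(relationship):.2f}%")
```

As with the chart itself, these are theoretical averages and not predictions for any one relative.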

With those with whom we share 100% of their ancestors, determining who is related to us in their DNA Relative list is easy – it is everyone; however, it is possible to determine who may be related to us in the lists of relatives that share a fraction of their ancestors with us.

To do this, we must compare two relatives that are mutually related to us and each other and find out who they match and determine who matches us and who does not. The relationship of the two individuals must not be any closer than our relationship with either one. For example, I have two fourth cousins who have tested and who are siblings. Their relationship as siblings will produce matches from ancestries for which I have no relationship. Likewise, two other fourth cousins who are second cousins to each other cannot be compared either without producing matches to individuals with whom I could not possibly be related.
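One way to sanity-check a pair before comparing them is sketched below. It uses the average shared DNA from the chart as a stand-in for closeness and requires that the two relatives be no closer to each other than the closer of the two is to me. Both the stand-in and that reading of the rule are my assumptions, and real relatives can fall well above or below the averages.

```python
# Average shared DNA (%) from the chart, used as a rough proxy for closeness.
AVERAGE_SHARED_DNA = {
    "Sibling": 50.00,
    "1st Cousin": 12.50,
    "2nd Cousin": 3.13,
    "4th Cousin": 0.20,
}


def pair_is_usable(me_to_a: str, me_to_b: str, a_to_b: str) -> bool:
    """True if A and B are no closer to each other than my closer relative is to me."""
    closest_to_me = max(AVERAGE_SHARED_DNA[me_to_a], AVERAGE_SHARED_DNA[me_to_b])
    return AVERAGE_SHARED_DNA[a_to_b] <= closest_to_me


# Two fourth cousins of mine who are siblings to each other: not usable.
print(pair_is_usable("4th Cousin", "4th Cousin", "Sibling"))      # False
# Two fourth cousins of mine who are second cousins to each other: not usable.
print(pair_is_usable("4th Cousin", "4th Cousin", "2nd Cousin"))   # False
# Two first cousins of mine who are first cousins to each other: usable.
print(pair_is_usable("1st Cousin", "1st Cousin", "1st Cousin"))   # True
```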

THE PROCEDURE

The procedure for doing this is to download the DNA Relatives files and compare them in Excel. Find two relatives who are at least as distantly related to each other as the closer of the two is to you. In one of the files, create a column, give it a name like “ID,” and mark every row in it the same way; I used an “x.” Add the same column to the second file but leave it blank. Copy the rows of the second file and paste them below those of the first.

 
Once done, highlight the header row (Row 1). Click the “Data” tab and select “Filter” (or use Ctrl+Shift+L).  Highlight the "Name" column, select the drop down arrow and select “Sort A to Z.”  The names across both data sets will be in alphabetical order.



Next, highlight the “Name” column. Click the “Home” tab. Click “Conditional Formatting” and then “Highlight Cells Rules.” Select “Duplicate Values.” Some of your cells in this column should become highlighted in the default color – generally pink.



Open a new Excel file, copy the header row into it, and keep the file ready. Go back to the original file’s “Name” column and click the down arrow on the header. Select “Sort by Color” and select the color of the duplicates. This groups all of the duplicates together in alphabetical order. Copy all of the duplicates (highlighted in pink) and paste these into the new file. Then, in the original file, delete all of the rows that have something in the “Name” field, so that only the rows without a name remain.

Repeat the process with the “Family Surnames” column and then the “Family Locations” column. You can work through the remaining columns as well, but unless you are very familiar with Excel, I would suggest not doing this, as you could accidentally introduce non-matches into your final spreadsheet. In the new spreadsheet, go to the “ID” column, select the down arrow and “Sort A to Z.” Delete all of the rows with no “x” in this column, which eliminates all of the duplicate rows in the spreadsheet.

What remains is a portion of the people whom the two relatives share and who may be related to you. You can paste your own data into this spreadsheet and repeat the duplicate-values step, as you did with “Name,” for “Family Surnames” and “Family Locations.” If there are any duplicates, these are the individuals that you and your two cousins share. If it is a short list, the quickest method is to open your DNA Relatives page on 23andMe and use the search feature there. Good luck.
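For anyone who would rather script the comparison than work through it in Excel, here is a minimal sketch of the same idea using only Python's standard library. It assumes the downloaded files are CSV text containing the “Name,” “Family Surnames,” and “Family Locations” columns. The sample rows and all function names are invented for illustration, and, like the Excel steps, it only recognizes a person when one of those three columns is identical in both lists.

```python
import csv
import io

# Columns used to decide that two rows describe the same person, in order.
KEY_COLUMNS = ("Name", "Family Surnames", "Family Locations")


def read_rows(csv_text: str) -> list[dict]:
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def identity_key(row: dict):
    """Return (column, value) for the first non-blank key column, or None."""
    for column in KEY_COLUMNS:
        value = (row.get(column) or "").strip().lower()
        if value:
            return (column, value)
    return None  # nothing to identify this row by


def shared_matches(rows_a: list[dict], rows_b: list[dict]) -> list[dict]:
    """Rows from the first list that also appear in the second."""
    keys_b = {identity_key(row) for row in rows_b} - {None}
    return [row for row in rows_a if identity_key(row) in keys_b]


def split_by_my_matches(shared: list[dict], my_rows: list[dict]):
    """Separate shared matches into those on my list and those not on it."""
    my_keys = {identity_key(row) for row in my_rows} - {None}
    matching = [row for row in shared if identity_key(row) in my_keys]
    non_matching = [row for row in shared if identity_key(row) not in my_keys]
    return matching, non_matching


# Hypothetical sample data standing in for three downloaded files.
COUSIN_A = """Name,Family Surnames,Family Locations
Pat Example,"Smith, Jones",Ohio
,"Miller, Schmidt",Pennsylvania
Lee Sample,,
"""
COUSIN_B = """Name,Family Surnames,Family Locations
pat example,"Smith, Jones",Ohio
,"Miller, Schmidt",Pennsylvania
Kim Other,,New Jersey
"""
MINE = """Name,Family Surnames,Family Locations
Pat Example,"Smith, Jones",Ohio
"""

shared = shared_matches(read_rows(COUSIN_A), read_rows(COUSIN_B))
matching, non_matching = split_by_my_matches(shared, read_rows(MINE))
print(len(shared), len(matching), len(non_matching))
```

Rows with a name are compared by name only, and rows without one fall back to surnames and then locations, which mirrors the order of the Excel steps. Rows with none of the three filled in are skipped.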

THE CAVEATS

There is no guarantee, however, that these individuals are actually related to you or related to you in the same lineage as those with whom you are comparing. A case in point: my half cousin (through my father’s mother) is related to my fourth cousin (through my father’s father) with 14cM shared; however, they are related in an unknown manner and from different lines than I am related to either one of them.

My sister-in-law also shares 9cM of DNA with this same fourth cousin, albeit via a completely different lineage. Both of these relatives share more with my documented 4th cousin than I do (5cM). Therefore, just because your two relatives have a common DNA cousin, this individual may not be related to you, or if they are related to you, they may be related along another lineage.

There will be qualifying individuals for whom you cannot tell whether they match both subjects. I have determined these matches based on three criteria: name, family surnames, and family locations. These are the only three columns that can conclusively show that the individual on both lists is the same person. Just because two individuals have the same haplogroups does not mean they are one and the same. Therefore, we will miss some of our matching relatives for lack of information.

Finally, we may match the same individuals; however, we may match on different segments or even different chromosomes than the other two individuals. The closer the relationships, the more likely we will have matches on identical segments or portions of identical segments.

PUTTING IT INTO PRACTICE

I decided to put this into practice and here’s what I was able to discover with several of my family members who have tested.

First Cousins

I looked at the matches between my two maternal first cousins who are also first cousins with each other. They had 93 matching individuals with only six matching me. Because of my very close relationship to both of these women, I would venture to say that most if not all of these matches are my non DNA relatives. In comparing the 93 individuals with my mother, she matches all but 20 – these twenty relatives would never have been known to me had I not conducted this exercise.

Multiple Relationships/First Cousins Twice Removed

This is a fairly unique comparison as I am related to both individuals in two different ways; however, the lineages are the same for both. With subject “A,” I am her second cousin via her grandmother and her second cousin, once removed through her grandfather. To visualize this, our grandmothers were sisters. Her grandfather was my grandfather’s uncle. I share 50% of her ancestors and she shares 37.5% of mine in this unique relationship; together we share 5.34% of our DNA.

To Subject B, I am his second cousin, twice removed via his 2nd great grandmother and a third cousin, once removed through his 2nd great grandfather. I share 12.5% of his ancestors and together we share 2.68% of our DNA. Subjects A and B, who are first cousins, twice removed, share 12.5% of their ancestry and 3.64% of their DNA.

Confused yet? Good. This particular comparison produced 19 matches, with only three of whom I share any DNA at a significant level. Determining which side of the family these relationships come from is the great difficulty; it can only be done by comparing against others who are related to me and these two cousins through one side of the family and not the other.

Fortunately, I have two individuals from each side of the family to do this comparison. Unfortunately, the two related to me through my paternal grandmother did not match these 19 people. The two subjects related to Subject A and B through my paternal grandfather’s mother each matched one individual. From the 19 matches, only two can be placed into a specific lineage. The others will require further research.

Half Cousin/Second Cousin

Since my father had half-sisters and no full siblings, I tested my half cousin who is also a second cousin to Subject A in the preceding example. Their match, which is completely New England Colonial, produced 17 individuals in common – seven of these match me.

Half Cousin/Second Cousin Once Removed

Using this same half cousin, I repeated the exercise with our common second cousin, once removed. His great-grandmother was the sister of our grandmother. By comparing the two individuals, I was able to determine that they had 12 individuals in common and four of these match me.

Second Cousins

Although we share no Colonial New England lines, we do have some Colonial New Jersey and Pennsylvania lineages, and my paternal second cousins produced a sizable number of matches in common. The three of us share great-grandparents, and we descend from three of their five children. These two cousins had 28 matches in common – four of which I shared. I would have expected that we would have shared more, but we don’t for some reason.

Second Cousin/Second Cousin Once Removed

The next comparison was between my second cousin and my second cousin, once removed. They are second cousins, once removed to each other and share a Colonial New England ancestry. Each descends from one of my grandmother's two sisters. The two share 10 matches in common, with only three that matched me.

Second Cousins/Fourth Cousins, Once Removed

My second cousins and I share a fourth cousin, once removed, and since one of the second cousins shares a larger than normal amount with her, I thought it might prove interesting to see how many they match; some of these individuals could be my relatives as well.

Needless to say, the pair matched four individuals. One of these is a known fourth cousin. Because I do not share any autosomal DNA with this fourth cousin, once removed, I have no matching DNA with any of the four – including the other fourth cousin – who also matches her. Of the 16 folks in our family study based on my surname lineage, only these two women match this fourth cousin, once removed.
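To see the comparisons side by side, the counts reported above can be tallied in a few lines. This is just a sketch for summarizing the results; the numbers come from the text and the labels are my own shorthand. The same person may turn up in more than one comparison, so the rows should not be added together.

```python
# (comparison, matches the pair has in common, how many of those match me)
COMPARISONS = [
    ("First cousins", 93, 6),
    ("First cousins, twice removed (Subjects A and B)", 19, 3),
    ("Half cousin / second cousin", 17, 7),
    ("Half cousin / second cousin, once removed", 12, 4),
    ("Second cousins", 28, 4),
    ("Second cousin / second cousin, once removed", 10, 3),
    ("Second cousin / fourth cousin, once removed", 4, 0),
]


def summarize(comparisons):
    """Return (label, in common, match me, do not match me, % matching me) rows."""
    return [
        (label, common, mine, common - mine, round(100 * mine / common, 1))
        for label, common, mine in comparisons
    ]


for label, common, mine, not_mine, pct in summarize(COMPARISONS):
    print(f"{label}: {common} in common, {mine} match me ({pct}%), {not_mine} do not")
```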

THE POSSIBILITIES

What I’ve learned from this exercise is that I can possibly discover non-DNA relatives in 23andMe’s database. While there is always a chance that the people my cousins match are not related to me personally, there are those who would be.

Because many of the matching segments are very small, I probably will not pursue some of these personally, as I’ve found that trying to determine relationships with the smaller matches has been an exercise in futility. While I will share with anyone who wants to share, I am currently only actively tracking individuals that share at least 0.20% of their genome with my family members.

This was a fun exercise and it renewed my interest in my 23andMe matches. I’ve become a little stagnant with my autosomal pursuits as of late. This effort has infused a bit more vigor in this regard. Try it and see if you can find possible non DNA relatives in the 23andMe database.

6 comments:

  1. Very interesting and informative treatise. Since your name shows up on my cousin list on both 23andme (5th) and familytreedna (4th), I wonder how many connections we will be able to confirm. I wish I could get as many of my relatives involved as you did. Keep up the good work.

  2. Excellent ...Once again! When considering the X Chromosome Matches, is it valid to believe that one long strand of say 30 to 40 cMs represents fewer recombinations than 40 or 50 cMs broken into 10 or 12 strands and therefore, depending on the Autosomal Matches, a closer time to MRCA?
    Warren Coleman

  3. Thank you Jim for these excellent insights. Now a question ... or two ... This could mean that the Common Matches (ICW matches) of my Common Matches include some of my genealogical but not genetic cousins. I've tried creating a spreadsheet of the Common Matches of my Common Matches (and one can keep doing this ad infinitum) but the problem is knowing which ones are from the same SINGLE Common Ancestor. The next step might be an arduous process of tracking overlapping segments between the list of people thus generated to minimise (but not eradicate) the possibility of 2 or more shared Common Ancestors among the group ... but the more rounds of matching one does, the more it may skew the results. Any thoughts or recommendations? Thanks again for your penetrating perspicacity!

  4. Hi Jim, I mentioned your blog post at my presentation at I4GG in Washington this weekend. Hopefully it will generate some interesting debate. Best, Maurice

    Replies
    1. Maurice: Thanks so much. I was in the process of starting a new job that week and missed this comment. I appreciate your efforts. Thanks again.
