Saturday, December 1, 2018

Understanding FTDNA's New Big Y-500 Differences Column


This past week, Family Tree DNA added a column called “Big Y-500 STR Differences” to the Y-DNA Matches feature. Much has been said about this column, and there is a great deal of confusion about what it means. I’ve seen a few people argue a point different from the one I espouse. Hopefully, by the end of this post, we can agree on what this new data represents.

Background

The Owston/Ouston DNA project has a total of 33 Y-DNA participants, 16 of whom have taken a Big Y-500 test. Relationships among the Big Y-500 participants range from a pair of second cousins to a pair estimated to be 13th cousins, twice removed. Most relationships fall between eighth cousins and ninth cousins, once removed. Our family charts can be found at http://www.owston.com/family/owston/Owston_Family_Charts.pdf.

A truncated version of my personal Y-111 report with Big Y-500 data appears below and shows the genetic distance at 111 markers. I have removed all duplicate information and personal identifiers.


The STR Differences Column

The Big Y-500 STR Differences column is spurring all the recent interest. The larger number is the number of markers beyond the first 111 that can be compared, that is, the markers where neither participant has a no-call. It is the smaller number, however, that is generating some disagreement.

Some believe this smaller number is the genetic distance for the markers beyond 111; it is not. Looking at the raw data for all matches in a project makes it clear that this number is not the genetic distance.

The confusion arises because the number equals the genetic distance whenever each mismatched marker differs by only a single step. In that case, summing the step differences and simply counting the mismatched markers (which is how the infinite alleles model scores every mismatch) give the same result. It may look like a duck and waddle like a duck, but it might be a goose.
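To see why the two can coincide, here is a minimal Python sketch that scores the same pair of kits both ways. It assumes that genetic distance beyond 111 is scored stepwise (the sum of repeat differences); the marker names come from this post, but the allele values are hypothetical, and this is not FTDNA's code.

```python
# Two ways of scoring the same pair of kits.
# Allele values are hypothetical; None stands for a no-call.

def shared_markers(a, b):
    """Markers that have a value (not a no-call) in both kits."""
    return [m for m in a if m in b and a[m] is not None and b[m] is not None]

def stepwise_gd(a, b):
    """Each mismatch scores the number of repeat steps between the two values."""
    return sum(abs(a[m] - b[m]) for m in shared_markers(a, b))

def infinite_alleles_gd(a, b):
    """Each mismatch scores 1, however many steps apart the values are."""
    return sum(1 for m in shared_markers(a, b) if a[m] != b[m])

kit_a = {"DYS631": 10, "FTY510": 9, "DYS489": 12}
one_step = {"DYS631": 11, "FTY510": 10, "DYS489": 12}  # two one-step mismatches
two_step = {"DYS631": 12, "FTY510": 10, "DYS489": 12}  # one mismatch is two steps

print(stepwise_gd(kit_a, one_step), infinite_alleles_gd(kit_a, one_step))  # 2 2
print(stepwise_gd(kit_a, two_step), infinite_alleles_gd(kit_a, two_step))  # 3 2
```

With only one-step mismatches the two scores agree; as soon as one marker differs by two steps, the stepwise distance pulls ahead of the simple mismatch count.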

What is it, then? The column simply shows how many markers can be compared (the larger number) and how many of those markers differ (the smaller number).
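As a minimal sketch of what the column reports, assuming each kit is stored as a Python dictionary of marker names to repeat counts with None for a no-call (a hypothetical representation, not FTDNA's data format or implementation):

```python
from typing import Optional

# Hypothetical representation: marker name -> repeat count, None = no-call.
Kit = dict[str, Optional[int]]

def str_differences(kit: Kit, match: Kit) -> tuple[int, int]:
    """Return (differences, compared) over markers both kits have called."""
    compared = [m for m in kit
                if m in match and kit[m] is not None and match[m] is not None]
    differences = sum(1 for m in compared if kit[m] != match[m])
    return differences, len(compared)

def column_text(kit: Kit, match: Kit) -> str:
    """Format the pair the way the column displays it, e.g. '2 of 395'."""
    d, n = str_differences(kit, match)
    return f"{d} of {n}"

# "hypothetical_marker" is a placeholder; its no-call keeps it out of the count.
kit = {"DYS631": 11, "FTY510": 9, "DYS489": 13, "hypothetical_marker": None}
match = {"DYS631": 10, "FTY510": 9, "DYS489": 12, "hypothetical_marker": 14}
print(column_text(kit, match))  # 2 of 3
```

Note that the size of each difference plays no part: a two-step mismatch and a one-step mismatch each add exactly one to the smaller number.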

When I compare the actual genetic distance with the number in the Big Y-500 STR Differences column for all 120 relationships, only 52 show the same value for the genetic distance beyond Y-111 and for the Big Y-500 STR Differences count. In the remaining 68 (56.7%), the genetic distance is larger.
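The 120 relationships presumably correspond to every possible pairing of the 16 Big Y-500 testers, since 16 choose 2 is 120. A quick Python check of that count and of the percentage quoted above:

```python
from math import comb

participants = 16
pairs = comb(participants, 2)  # every tester compared with every other tester
same = 52                      # pairs where the two numbers agree
larger = pairs - same          # pairs where the genetic distance is larger

print(pairs, larger, round(100 * larger / pairs, 1))  # 120 68 56.7
```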

I have provided the data in a PDF file on my website. The rows in lavender are those where the post-111-marker genetic distance and the STR differences column do not match.

What about Genetic Distance?

When combining the genetic distance from both sets of markers (Y-111 and markers 112-561), the results are all over the road. I’ve seen the same at 37, 67, and 111 markers. The greatest combined GD occurs for a pair of seventh cousins and a pair of eighth cousins; both pairs exhibit a GD of 21. Pairs with a GD of 8 range from second cousins, once removed to thirteenth cousins, once removed, with everything in between. Genetic distance is a poor indicator of relationship, as mutations occur randomly.

Compared Numbers

As for the compared markers (the larger number in this column), our project ranges from 364 (ninth cousins) to 444 for two pairs (8C1R and an estimated 13C1R), out of a possible 450 additional markers. The mean number of usable markers is 418, the median is 427, and the mode is 435.
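These summary figures are ordinary descriptive statistics over the “compared” values for all pairs. As a sketch, Python's standard statistics module computes them directly; the list below is hypothetical and merely stands in for a project's real values.

```python
from statistics import mean, median, mode

def summarize(compared_counts: list[int]) -> dict[str, float]:
    """Range and central tendency of the 'compared' numbers across all pairs."""
    return {
        "min": min(compared_counts),
        "max": max(compared_counts),
        "mean": mean(compared_counts),
        "median": median(compared_counts),
        "mode": mode(compared_counts),
    }

# Hypothetical values, NOT the Owston/Ouston project's data.
example = [364, 418, 427, 435, 435, 444]
print(summarize(example))
```

With an even number of values, median averages the two middle ones, so it need not be a value that actually occurs in the list.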

The Overall Importance of this Data

How important is this data? That remains for you to discover in your own family project. For me, the additional STRs have not provided much additional detail for our family. Of the 450 additional markers, only three are line specific.

DYS631=11 (modal 10) is indicative of the Cobourg line; however, so are DYS643=11 (modal 10), found in the first 111 markers, and the A10921 and A10923 SNPs.

FTY510=10 (modal 9) is a signature marker for the Thornholme family, but so are DYS481=25 (modal 26), found in the first 67 markers, and the I-A15739 and I-A15740 SNPs.

DYS489=13 (modal 12) is probably the most valuable of the three, as it is a defining STR marker for the Ganton Branch. While the first 111 markers include two line-specific markers for the Rillington Builders Line, there are no Ganton Branch specific STRs other than DYS489. The Ganton Branch is also identified by the I-A10208 SNP.

Of the 450 markers, 147 have at least one no-call, and there are 260 no-calls in total across our project. Twenty-four markers show at least one mutation, and on sixteen of those, only one person carries the mutation.
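Tallies like these can be reproduced from a project's raw results. The sketch below assumes each kit is a dictionary of marker values with None for a no-call, and it treats any value that differs from the project's most common value on that marker as a mutation; that rule is an assumption for illustration, not necessarily how the modal values in this post were chosen. The kits and marker names M1 to M4 are hypothetical.

```python
from collections import Counter
from typing import Optional

# kit id -> marker name -> repeat count (None = no-call)
Project = dict[str, dict[str, Optional[int]]]

def tally(project: Project, markers: list[str]) -> dict[str, int]:
    """Count no-calls and deviations from the modal value, marker by marker."""
    result = {"markers_with_no_calls": 0, "total_no_calls": 0,
              "markers_with_mutation": 0, "markers_with_one_mutation": 0}
    for marker in markers:
        # A marker missing from a kit is treated the same as a no-call.
        values = [kit.get(marker) for kit in project.values()]
        no_calls = sum(1 for v in values if v is None)
        result["total_no_calls"] += no_calls
        if no_calls:
            result["markers_with_no_calls"] += 1
        called = [v for v in values if v is not None]
        if not called:
            continue
        # Assumption: the most common called value is the modal value;
        # ties are broken arbitrarily.
        modal = Counter(called).most_common(1)[0][0]
        deviants = sum(1 for v in called if v != modal)
        if deviants:
            result["markers_with_mutation"] += 1
        if deviants == 1:
            result["markers_with_one_mutation"] += 1
    return result

project = {
    "kit1": {"M1": 10, "M2": 9, "M3": None, "M4": 12},
    "kit2": {"M1": 10, "M2": 10, "M3": 12, "M4": 12},
    "kit3": {"M1": 11, "M2": 10, "M3": None, "M4": 13},
    "kit4": {"M1": 10, "M2": 10, "M3": 12, "M4": 14},
}
print(tally(project, ["M1", "M2", "M3", "M4"]))
```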

The Real Value of the Big Y-500

As I said earlier in the year, the greater value of the Big Y-500 lies in the SNPs. For our family, the Big Y-500 cleared up three issues:

  • It provided additional evidence that a spurious male was descended from a specific progenitor.
  • It allowed us to determine which of two men with the same name was the ancestral father for a line of descent.
  • It helped correct a mistake made in our own genealogical research thirty years ago. It prompted us to revisit the documentation for the family in question, and that documentation gave the same answer as four matching SNPs.

My experience may differ from yours, and I hope that you will find the additional STRs helpful. Remember, the Big Y-500 STR Differences column is not a record of genetic distance but rather a count of the markers where a mismatch occurs.

 

Addendum


A reader alerted me that Family Tree DNA had already posted an explanation of this column. Their explanation, which agrees with the above, follows:


"In the matches section, the Big Y-500 STR Differences column is now displayed between Genetic Distance and Name columns.
Understanding the Big Y-500 STR Differences Column
This column displays the mismatch number and the number of comparable Big Y-500 STR markers between the kit and a match.
Let us say that for a match 2 of 395 is displayed in this column:
• 395 is the number of comparable markers between the kit and the match. In other words, both the kit and the match have STR values on 395 of the same Big Y-500 STRs. Note: On the CSV file, this value is displayed in the Big Y-500 STRs Compared column.
• 2 is the mismatch number. In other words, out of the 395 Big Y-500 STRs on which the kit and the match have values, there are 2 markers for which the kit and the match has a different value. Note: On the CSV file, this value is displayed in the Big Y-500 STR Differences column."
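FTDNA's note names the two CSV columns behind the display. As a sketch, Python's csv module can rebuild the “2 of 395” text from those columns; only the two column names come from FTDNA's explanation, and the sample rows (using the 2 of 395 example above and the 7 of 432 case from the comments) are illustrative rather than a real export.

```python
import csv
import io

# Column names as given in FTDNA's explanation; the rest of the export's
# layout is an assumption here.
COMPARED = "Big Y-500 STRs Compared"
DIFFERENCES = "Big Y-500 STR Differences"

def column_values(csv_text: str) -> list[str]:
    """Rebuild the 'differences of compared' text for each match row."""
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        compared = (row.get(COMPARED) or "").strip()
        differences = (row.get(DIFFERENCES) or "").strip()
        if compared and differences:  # skip matches without Big Y-500 data
            out.append(f"{int(differences)} of {int(compared)}")
    return out

sample = (
    "Big Y-500 STRs Compared,Big Y-500 STR Differences\n"
    "395,2\n"
    "432,7\n"
    ",\n"
)
print(column_values(sample))  # ['2 of 395', '7 of 432']
```

For a downloaded file, read its text with open(path, newline="") and pass it in.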



4 comments:

  1. Thank you for your work, and for presenting it so well that even a relative novice can understand it.

    1. Glad to be of help and thanks for your positive comments, Mike.

      Jim

  2. I have a match with 7 differences out of 432 STR markers. What does that mean, if anything, about the genetic closeness between myself and that individual? At 67 markers we show a GD of 5.

    1. I wouldn't worry about it. I don't find a lot of use for the newer markers. What was your match at 111?

      Delete